Machine Learning
Neural Networks
Human Brain
Neurons
Input-Output Transformation
(Figure: input spikes arriving at a neuron produce excitatory post-synaptic potentials that combine into an output spike. A spike is a brief pulse.)
Human Learning
• Number of neurons: ~10^10
• Connections per neuron: ~10^4 to 10^5
• Neuron switching time: ~0.001 second
• Scene recognition time: ~0.1 second

That allows only about 100 sequential inference steps, which doesn't seem like much.
Machine Learning Abstraction
(Figure: the biological-to-ML mapping. Dendrites collect inputs across synapses, whose excitatory (+) and inhibitory (-) strengths become weights; neurons become nodes; the axon carries the output.)
Artificial Neural Networks
• Typically, machine learning ANNs are very artificial, ignoring:
– Time
– Space
– Biological learning processes
• More realistic neural models exist
– Hodgkin & Huxley (1952) won a Nobel prize for theirs (in 1963)
• Nonetheless, very artificial ANNs have been useful in many ML applications
Perceptrons
• The “first wave” in neural networks
• Big in the 1960’s
– McCulloch & Pitts (1943), Widrow & Hoff (1960), Rosenblatt (1962)
A single perceptron
A single unit computes

$o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ 0 & \text{otherwise} \end{cases}$

(Figure: inputs $x_1, \ldots, x_5$ enter with weights $w_1, \ldots, w_5$, plus a bias input with weight $w_0$. Bias: $x_0 = 1$, always.)
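A minimal sketch of this unit in Python (the function name and example weights are mine, not from the slides):

```python
# A threshold unit: fires (outputs 1) iff the weighted input sum exceeds 0.
def perceptron(weights, inputs):
    # weights[0] is the bias weight w0; a constant x0 = 1 is prepended.
    total = sum(w * x for w, x in zip(weights, [1] + list(inputs)))
    return 1 if total > 0 else 0

# Five inputs x1..x5 with weights w1..w5, plus the bias weight w0.
print(perceptron([-0.5, 0.2, 0.2, 0.2, 0.2, 0.2], [1, 1, 1, 0, 0]))  # -> 1
```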
Logical Operators
All three gates use the same threshold unit, $o = 1$ if $\sum_{i=0}^{n} w_i x_i > 0$, else $0$, with bias input $x_0 = 1$:

| Gate | $w_0$ (bias) | $w_1$ | $w_2$ |
|------|------|------|------|
| AND  | -0.8 | 0.5  | 0.5  |
| OR   | -0.3 | 0.5  | 0.5  |
| NOT  | 0.0  | -1.0 | —    |
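A quick check of these weights in Python. One caveat: the slide's NOT weights ($w_0 = 0.0$, $w_1 = -1.0$) only produce NOT if the unit fires on a sum of at least 0; with the strict $> 0$ threshold, a bias such as $w_0 = 0.5$ would be needed instead.

```python
# The slide's AND/OR/NOT weights plugged into a threshold unit.
def fires(weights, inputs, strict=True):
    total = sum(w * x for w, x in zip(weights, [1] + list(inputs)))
    return 1 if (total > 0 if strict else total >= 0) else 0

AND_W = [-0.8, 0.5, 0.5]
OR_W  = [-0.3, 0.5, 0.5]
NOT_W = [0.0, -1.0]   # works only with a fire-at->=0 convention (see above)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", fires(AND_W, [x1, x2]), "OR:", fires(OR_W, [x1, x2]))
print("NOT:", [fires(NOT_W, [x], strict=False) for x in (0, 1)])
```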
Perceptron Hypothesis Space
• What’s the hypothesis space for a perceptron on n inputs?
$H = \{\, \vec{w} \mid \vec{w} \in \mathbb{R}^{n+1} \,\}$
Learning Weights
• Perceptron Training Rule
• Gradient Descent
• (other approaches: Genetic Algorithms)

(Figure: a threshold unit whose weights on $x_0, x_1, x_2$ are the unknown "?"s we must learn.)
Perceptron Training Rule
• Weights modified for each example
• Update rule: $w_i \leftarrow w_i + \Delta w_i$, where $\Delta w_i = \eta\,(t - o)\,x_i$
– $\eta$: learning rate
– $t$: target value
– $o$: perceptron output
– $x_i$: input value
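A minimal sketch of the rule in Python, learning OR (the data encoding, learning rate, and epoch count are illustrative choices of mine):

```python
# Perceptron training rule learning OR; each x includes the bias input x0 = 1.
def output(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = [0.0, 0.0, 0.0]   # w0 (bias), w1, w2
eta = 0.1             # learning rate

for epoch in range(20):
    for x, t in data:
        o = output(w, x)
        # Update rule: w_i <- w_i + eta * (t - o) * x_i
        w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]

print(w, [output(w, x) for x, _ in data])  # learned weights now compute OR
```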
What weights make XOR?
• No combination of weights works
• Perceptrons can only represent linearly separable functions

(Figure: a threshold unit, $o = 1$ if $\sum_{i=0}^{n} w_i x_i > 0$ else $0$, with unknown weights on $x_0, x_1, x_2$.)
Linear Separability
(Plot: OR in the (x1, x2) plane; a single line separates the positive from the negative examples.)
Linear Separability
(Plot: AND in the (x1, x2) plane; again a single line suffices.)
Linear Separability
(Plot: XOR in the (x1, x2) plane; no single line can separate the classes.)
Perceptron Training Rule
• Converges to the correct classification IF:
– Cases are linearly separable
– Learning rate is small enough
• Minsky and Papert's 1969 analysis of these limits killed widespread interest in perceptrons till the '80s
XOR
(Figure: a two-layer network computing XOR. The weights between inputs and hidden units are 0.6, 0.6, -0.6, -0.6; the output weights are 1 and 1; all bias weights are 0; each unit fires, outputting 1, when $\sum_{i=0}^{n} w_i x_i > 0$, else 0. One consistent reading: hidden unit 1 has weights (0.6, -0.6) and fires when $x_1 > x_2$, hidden unit 2 has weights (-0.6, 0.6) and fires when $x_2 > x_1$, and the output unit ORs them.)
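A short check in Python that this reading really computes XOR (unit and variable names are mine):

```python
# Two-layer threshold network for XOR, as described above.
def step(w, x):  # threshold unit: 1 if w . x > 0 else 0 (x0 = 1 bias prepended)
    return 1 if sum(wi * xi for wi, xi in zip(w, [1] + x)) > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        h1 = step([0.0,  0.6, -0.6], [x1, x2])  # fires iff x1 > x2
        h2 = step([0.0, -0.6,  0.6], [x1, x2])  # fires iff x2 > x1
        out = step([0.0, 1.0, 1.0], [h1, h2])   # OR of the hidden units
        print(x1, x2, out)  # prints the XOR truth table
```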
What’s wrong with perceptrons?
• You can always plug multiple perceptrons together to compute any Boolean function.
• BUT… who decides what the weights are?
– Assignment of error to upstream ("parental") inputs becomes a problem…
– This is because of the threshold…
• Who contributed the error?
Problem: Assignment of error
(Figure: a threshold unit with unknown weights on $x_0, x_1, x_2$; the perceptron threshold is a step function.)

• Hard to tell from the output who contributed what
• Stymies multi-layer weight learning
Solution: Differentiable Function
(Figure: a simple linear unit, $o = \sum_{i=0}^{n} w_i x_i$, with unknown weights on $x_0, x_1, x_2$.)

• Varying any input a little creates a perceptible change in the output
• This lets us propagate error to prior nodes
Measuring error for linear units
• Output function: $o(\vec{x}) = \vec{w} \cdot \vec{x}$
• Error measure: $E[\vec{w}] = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$

where $D$ is the training data, $t_d$ the target value, and $o_d$ the linear unit's output on example $d$.
Gradient Descent
• Gradient: $\nabla E[\vec{w}] = \left[ \frac{\partial E}{\partial w_0}, \ldots, \frac{\partial E}{\partial w_n} \right]$
• Training rule: $\Delta \vec{w} = -\eta\, \nabla E[\vec{w}]$
Gradient Descent Rule
$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i}\, \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 = \sum_{d \in D} (t_d - o_d)(-x_{i,d})$

Update rule: $w_i \leftarrow w_i + \eta \sum_{d \in D} (t_d - o_d)\, x_{i,d}$
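A sketch of this batch update for a single linear unit (the data and learning rate are illustrative):

```python
# Batch gradient descent for one linear unit.
data = [([1.0, 0.0], 0.2), ([1.0, 1.0], 0.9)]   # (x with bias x0 = 1, target t)
w = [0.0, 0.0]
eta = 0.1

for epoch in range(500):
    grad = [0.0, 0.0]
    for x, t in data:
        o = sum(wi * xi for wi, xi in zip(w, x))   # linear output o_d
        for i, xi in enumerate(x):
            grad[i] += (t - o) * xi                # accumulate (t_d - o_d) x_{i,d}
    w = [wi + eta * g for wi, g in zip(w, grad)]   # w_i += eta * sum over D

print(w)  # approaches w0 = 0.2, w1 = 0.7 for this data
```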
Back to XOR
(Figure: the two-layer XOR network again, each unit computing $o = 1$ if $\sum_{i=0}^{n} w_i x_i > 0$ else $0$; this time the weights are unknown.)
Gradient Descent for Multiple Layers
(Figure: the same two-layer XOR network built from linear units, $\sum_{i=0}^{n} w_i x_i$, with weights labeled $w_{ij}$.)

We can compute: $\frac{\partial E}{\partial w_{ij}}$
Gradient Descent vs. Perceptrons
• Perceptron rule & threshold units
– Learner converges on an answer ONLY IF data is linearly separable
– Can't assign proper error to parent nodes
• Gradient descent
– Minimizes error even if examples are not linearly separable
– Works for multi-layer networks
• But… linear units only make linear decision surfaces (can't learn XOR even with many layers)
– And the step function isn't differentiable…
A compromise function

• Perceptron: $\text{output} = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ 0 & \text{otherwise} \end{cases}$

• Linear: $\text{output} = \text{net} = \sum_{i=0}^{n} w_i x_i$

• Sigmoid (logistic): $\text{output} = \sigma(\text{net}) = \dfrac{1}{1 + e^{-\text{net}}}$
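A side-by-side look at the three output functions on the same net input (a sketch; the sample values are mine):

```python
import math

# The three candidate output functions applied to the same net input.
def perceptron_out(net): return 1 if net > 0 else 0
def linear_out(net):     return net
def sigmoid_out(net):    return 1.0 / (1.0 + math.exp(-net))

for net in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(net, perceptron_out(net), linear_out(net), round(sigmoid_out(net), 3))
```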
The sigmoid (logistic) unit
• Has a differentiable output function
– Allows gradient descent
• Can be used to learn non-linear functions

(Figure: a unit with unknown weights on $x_1, x_2$ computing $\dfrac{1}{1 + e^{-\sum_{i=0}^{n} w_i x_i}}$.)
Logistic function
(Figure: a single logistic unit. The inputs, the independent variables, are Age = 34, Gender = 1, Stage = 4; the coefficients shown are .5, .8, .4; the output, the prediction, is 0.6, read as the "probability of being alive". The unit computes $\frac{1}{1 + e^{-\sum_{i=0}^{n} w_i x_i}}$.)
Neural Network Model
(Figure: a neural network model. The independent variables Age = 34, Gender = 2, Stage = 4 feed a hidden layer through one set of weights (.6, .5, .8, .2, .1, .3, .7, .2); a second set of weights (.4, .2) connects the hidden layer to the output, the dependent variable: the prediction 0.6, the "probability of being alive".)
Getting an answer from a NN
(Figure: the same network, highlighting the first hidden unit: its incoming weights .6, .5, .8 and the connections .1, .7; the output is 0.6.)
Getting an answer from a NN

(Figure: the same network, now highlighting the second hidden unit and its weights .5, .8, .2, .3, .2; the output is again 0.6.)
Getting an answer from a NN
(Figure: the complete forward pass. All weights (.6, .5, .8, .2, .1, .3, .7, .2) carry the inputs Age = 34, Gender = 1, Stage = 4 through the hidden layer to the prediction 0.6.)
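A sketch of this forward pass in Python. The exact wiring of the figure's weights onto units is my assumption; the numbers are only illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

inputs = [34.0, 1.0, 4.0]                   # Age, Gender, Stage

hidden_w = [[0.6, 0.5, 0.8],                # assumed weights into hidden unit 1
            [0.2, 0.1, 0.3]]                # assumed weights into hidden unit 2
output_w = [0.7, 0.2]                       # assumed hidden -> output weights

# Forward pass: each hidden unit squashes its weighted sum; the output
# unit squashes the weighted sum of the hidden activations.
hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in hidden_w]
prediction = sigmoid(sum(w * h for w, h in zip(output_w, hidden)))
print(prediction)  # the network's "probability of being alive"
```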
Minimizing the Error
(Figure: an error surface over a weight $w$. Training starts at $w_{\text{initial}}$ with the initial error; because the derivative there is negative, the weight takes a positive change, descending until it reaches $w_{\text{trained}}$, a local minimum with the final error.)
Differentiability is key!
• Sigmoid is easy to differentiate: $\dfrac{d\,\sigma(y)}{dy} = \sigma(y)\,(1 - \sigma(y))$
• For gradient descent on multiple layers, a little dynamic programming can help:
– Compute errors at each output node
– Use these to compute errors at each hidden node
– Use these to compute errors at each input node
The Backpropagation Algorithm
For each training example $\langle \vec{x}, \vec{t} \rangle$:

1. Input the instance $\vec{x}$ to the network and compute the output $o_u$ of every unit $u$ in the network.
2. For each output unit $k$, calculate its error term: $\delta_k \leftarrow o_k (1 - o_k)(t_k - o_k)$
3. For each hidden unit $h$, calculate its error term: $\delta_h \leftarrow o_h (1 - o_h) \sum_{k \in \text{outputs}} w_{kh}\, \delta_k$
4. Update each network weight: $w_{ji} \leftarrow w_{ji} + \Delta w_{ji}$, where $\Delta w_{ji} = \eta\, \delta_j\, x_{ji}$
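A sketch of the algorithm applied to a 2-2-1 sigmoid network learning XOR. The architecture, learning rate, initialization, and epoch count are my choices; sigmoid nets can occasionally stall in a local minimum for unlucky initializations:

```python
import math, random

random.seed(0)
eta = 0.5
# 2 hidden units, each with a bias weight plus weights for x1, x2.
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
# Output unit: bias weight plus one weight per hidden unit.
w_o = [random.uniform(-1, 1) for _ in range(3)]
sig = lambda z: 1.0 / (1.0 + math.exp(-z))

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

for epoch in range(10000):
    for (x1, x2), t in data:
        x = [1.0, x1, x2]                                    # 1. forward pass
        h = [sig(sum(w * v for w, v in zip(ws, x))) for ws in w_h]
        hb = [1.0] + h
        o = sig(sum(w * v for w, v in zip(w_o, hb)))
        d_o = o * (1 - o) * (t - o)                          # 2. output error term
        d_h = [h[j] * (1 - h[j]) * w_o[j + 1] * d_o          # 3. hidden error terms
               for j in range(2)]
        w_o = [w + eta * d_o * v for w, v in zip(w_o, hb)]   # 4. weight updates
        w_h = [[w + eta * d_h[j] * v for w, v in zip(w_h[j], x)]
               for j in range(2)]

for (x1, x2), t in data:
    h = [sig(sum(w * v for w, v in zip(ws, [1.0, x1, x2]))) for ws in w_h]
    o = sig(sum(w * v for w, v in zip(w_o, [1.0] + h)))
    print(x1, x2, round(o, 2), "target", t)
```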
Getting an answer from a NN
(Figure: the forward-pass example from before, repeated.)
Expressive Power of ANNs
• Universal function approximator:
– Given enough hidden units, it can approximate any continuous function $f$
• Need 2+ hidden units to learn XOR
• Why not use millions of hidden units?
– Efficiency (neural network training is slow)
– Overfitting
Overfitting
(Figure: an overfitted model versus the real distribution.)
Combating Overfitting in Neural Nets
• Many techniques
• Two popular ones:
– Early stopping
• Use "a lot" of hidden units
• Just don't over-train
– Cross-validation
• Choose the "right" number of hidden units
Early Stopping
(Figure: error versus training epochs. Error on the training set (b) keeps decreasing, while error on the validation set (a) reaches a minimum (min error) and then climbs as the model overfits. The stopping criterion: stop at the validation-error minimum.)
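A sketch of the criterion on a toy one-weight model (the data, model, and patience threshold are all illustrative):

```python
import random

# Early stopping: fit y = w*x by gradient descent on a training set, and
# stop when error on a held-out validation set stops improving.
random.seed(1)
train = [(x, 2 * x + random.gauss(0, 0.5)) for x in range(20)]
valid = [(x, 2 * x + random.gauss(0, 0.5)) for x in range(20, 30)]

def mse(w, data):
    return sum((t - w * x) ** 2 for x, t in data) / len(data)

w, eta = 0.0, 0.0005
best_w, best_err, patience = w, float("inf"), 0
for epoch in range(200):
    grad = sum(-2 * x * (t - w * x) for x, t in train) / len(train)
    w -= eta * grad
    v = mse(w, valid)
    if v < best_err:
        best_w, best_err, patience = w, v, 0   # new validation minimum
    else:
        patience += 1
        if patience >= 5:   # validation error stopped improving: stop here
            break
print(best_w, best_err)     # the model from the validation-error minimum
```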
Cross-validation
• Cross-validation: a general-purpose technique for model selection
– E.g., "how many hidden units should I use?"
• A more extensive version of the validation-set approach.
Cross-validation
• Break the training set into $k$ sets
• For each model $M$:
– For $i = 1 \ldots k$:
• Train $M$ on all but set $i$
• Test on set $i$
• Output the $M$ with the highest average test score, trained on the full training set
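A sketch of the procedure. The two candidate "models" here, a constant predictor and a proportional one, are illustrative stand-ins for networks with different numbers of hidden units:

```python
import random

random.seed(0)
data = [(x / 10, 0.5 * (x / 10) + random.gauss(0, 0.05)) for x in range(30)]
k = 5
folds = [data[i::k] for i in range(k)]           # break the set into k parts

def train_const(pts):                            # model 1: best constant
    c = sum(t for _, t in pts) / len(pts)
    return lambda x: c

def train_linear(pts):                           # model 2: best y = w*x
    w = sum(x * t for x, t in pts) / sum(x * x for x, _ in pts)
    return lambda x: w * x

def mse(model, pts):
    return sum((t - model(x)) ** 2 for x, t in pts) / len(pts)

for name, train in [("constant", train_const), ("linear", train_linear)]:
    scores = []
    for i in range(k):                           # hold out fold i, train on rest
        rest = [p for j, f in enumerate(folds) if j != i for p in f]
        scores.append(mse(train(rest), folds[i]))
    print(name, sum(scores) / k)                 # pick the lowest average error
```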
Summary of Neural Networks
A non-linear regression technique trained with gradient descent.
Question: How important is the biological metaphor?
Summary of Neural Networks
When are neural networks useful?
– Instances are represented by attribute-value pairs
• Particularly when attributes are real-valued
– The target function is:
• Discrete-valued
• Real-valued
• Vector-valued
– Training examples may contain errors
– Fast evaluation times are necessary

When not?
– Fast training times are necessary
– Understandability of the learned function is required
Advanced Topics in Neural Nets
• Batch Mode vs. Incremental
• Hidden Layer Representations
• Hopfield Nets
• Neural Networks on Silicon
• Neural Network Language Models
Incremental vs. Batch Mode
• In batch mode we minimize: $E_D[\vec{w}]$
• Same as computing: $\Delta \vec{w}_D = \sum_{d \in D} \Delta \vec{w}_d$
• Then setting: $\vec{w} \leftarrow \vec{w} + \Delta \vec{w}_D$
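A sketch contrasting one epoch of each mode for a linear unit (the data and learning rate are illustrative):

```python
data = [([1.0, 0.0], 0.2), ([1.0, 1.0], 0.9)]   # (input with bias, target)
eta = 0.1

def delta(w, x, t):                              # per-example gradient step
    o = sum(wi * xi for wi, xi in zip(w, x))
    return [eta * (t - o) * xi for xi in x]

# Batch mode: accumulate Delta-w over all of D, then update once.
w = [0.0, 0.0]
dw_D = [0.0, 0.0]
for x, t in data:
    dw_D = [a + b for a, b in zip(dw_D, delta(w, x, t))]
w = [wi + d for wi, d in zip(w, dw_D)]

# Incremental mode: update immediately after each example.
v = [0.0, 0.0]
for x, t in data:
    v = [vi + d for vi, d in zip(v, delta(v, x, t))]

print(w, v)   # after one epoch the two modes differ slightly
```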
Advanced Topics in Neural Nets
• Batch Mode vs. Incremental
• Hidden Layer Representations
• Hopfield Nets
• Neural Networks on Silicon
• Neural Network Language Models
Hidden Layer Representations
• Input → hidden layer mapping:
– a representation of input vectors tailored to the task
• Can also be exploited for dimensionality reduction, as sketched below:
– a form of unsupervised learning in which we output a "more compact" representation of input vectors
– $\langle x_1, \ldots, x_n \rangle \rightarrow \langle x'_1, \ldots, x'_m \rangle$ where $m < n$
– useful for visualization, problem simplification, data compression, etc.
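A sketch of reading a compressed representation off a hidden layer. The weights here are made up; a real system would learn them by training the network to reproduce its input:

```python
import math

sig = lambda z: 1.0 / (1.0 + math.exp(-z))

def encode(x, w_hidden):
    # <x1..xn> -> <x'1..x'm>, m < n: one sigmoid activation per hidden unit.
    return [sig(sum(w * v for w, v in zip(ws, x))) for ws in w_hidden]

w_hidden = [[0.5, -0.3, 0.8, 0.1],    # 2 hidden units compress n = 4 inputs
            [-0.2, 0.7, 0.4, -0.6]]   # down to an m = 2 representation
print(encode([1.0, 0.0, 1.0, 0.0], w_hidden))
```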
Dimensionality Reduction
(Figures: the model, and the function to learn.)
Dimensionality Reduction: Example

(Figures: a worked example, spread over four slides.)
Advanced Topics in Neural Nets
• Batch Mode vs. Incremental
• Hidden Layer Representations
• Hopfield Nets
• Neural Networks on Silicon
• Neural Network Language Models
![Page 60: Machine Learning Neural Networks Human Brain Neurons](https://reader038.vdocuments.mx/reader038/viewer/2022102907/56649d555503460f94a3239d/html5/thumbnails/60.jpg)
Neural Networks on Silicon
• Currently we stack: continuous device physics (voltage) → a digital computational model (thresholding) → a simulation of continuous device physics (neural networks)
• Why not skip the middle step?
Example: Silicon Retina
• Simulates the function of a biological retina
• Single-transistor synapses adapt to luminance and temporal contrast
• Modeling the retina directly on the chip requires ~100x less power!
Example: Silicon Retina
• Synapses modeled with single transistors
Luminance Adaptation
Comparison with Mammal Data
(Figures: response of a real mammalian retina versus the artificial silicon retina.)
• Graphics and results taken from:
General NN learning in silicon?
• Seems less in vogue than in the late '90s
• Interest has turned somewhat to implementing Bayesian techniques in analog silicon
Advanced Topics in Neural Nets
• Batch Mode vs. Incremental
• Hidden Layer Representations
• Hopfield Nets
• Neural Networks on Silicon
• Neural Network Language Models
Neural Network Language Models
• Statistical language modeling:
– Predict the probability of the next word in a sequence

I was headed to Madrid, ____

P(___ = "Spain") = 0.5, P(___ = "but") = 0.2, etc.

• Used in speech recognition, machine translation, and (recently) information extraction
Formally

• Estimate: $P(w_j \mid w_{j-1}, w_{j-2}, \ldots, w_{j-n+1})$
• Abbreviated: $P(w_j \mid h_j)$
Optimizations
• Key idea: learn simultaneously
– vector representations for each word (50 dimensions)
– a predictor of the next word
• Short-lists
– Much of the complexity is in the hidden → output layer
• The number of possible next words is large
– Only predict a subset of words
• Use a standard probabilistic model for the rest

A sketch of the key idea follows.
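This sketch shows learned word vectors feeding a next-word predictor. The tiny vocabulary, vector dimension, and random parameters are all made up; a real model learns them from data:

```python
import math, random

random.seed(0)
vocab = ["I", "was", "headed", "to", "Madrid", ",", "Spain", "but"]
dim = 4                                            # slides mention 50-120
# One learned vector per word (here: random stand-ins).
emb = {w: [random.gauss(0, 1) for _ in range(dim)] for w in vocab}

def predict_next(history, weights):
    """P(next word | history) via a softmax over per-word scores."""
    h = [v for w in history for v in emb[w]]       # concatenate history vectors
    scores = [sum(a * b for a, b in zip(row, h)) for row in weights]
    z = [math.exp(s) for s in scores]
    return {w: p / sum(z) for w, p in zip(vocab, z)}

# One weight row per vocabulary word, over a 3-word history.
weights = [[random.gauss(0, 0.1) for _ in range(3 * dim)] for _ in vocab]
probs = predict_next(["headed", "to", "Madrid"], weights)
print(max(probs, key=probs.get), probs["Spain"])
```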
Design Decisions (1)
• Number of hidden units
• Almost no difference…
Design Decisions (2)
• Word representation (# of dimensions)
• They chose 120
Comparison vs. state of the art