From Neurons to Neural Networks
Jeff Knisley, East Tennessee State University
Mathematics of Molecular and Cellular Biology Seminar
Institute for Mathematics and its Applications, April 2, 2008
Outline of the Talk
- Brief Description of the Neuron
- A "Hot-Spot" Dendritic Model
- Classical Hodgkin-Huxley (HH) Model
- A Recent Approach to HH Nonlinearity
- Artificial Neural Nets (ANNs)
  - 1957 – 1969: Perceptron Models
  - 1980s – soon: MLPs and Others
  - 1990s – : Neuromimetic (Spiking) Neurons
Components of a Neuron
- Dendrites
- Soma (containing the nucleus)
- Axon
- Myelin Sheaths
- Synaptic Terminals
Pre-Synaptic to Post-Synaptic
If the threshold is exceeded, then the neuron "fires," sending a signal along its axon.
Signal Propagation along Axon
- Signal is electrical
  - Membrane depolarization from resting −70 mV
  - Myelin acts as an insulator
- Propagation is electro-chemical
  - Sodium channels open at breaks in the myelin
  - Much higher external sodium ion concentrations
  - Potassium ions "work against" sodium
  - Chloride and other influences are also very important
- Rapid depolarization at these breaks
  - Signal travels faster than if it were only electrical
Signal Propagation along Axon

[Figure: polarity along the axon (+++/−−−) reverses at successive breaks in the myelin as the signal propagates]
Action Potentials
- Sodium ion channels open and close, which causes potassium ion channels to open and close

[Figure: model "spike" compared to an actual spike train]
Post-Synaptic Signals May Be Subthreshold
- Signals decay at the soma if below a certain threshold
- Models begin with a section of a dendrite
Derivation of the Model

Some assumptions:
- Assume the neuron separates $\mathbb{R}^3$ into 3 regions: interior ($i$), exterior ($e$), and the boundary membrane surface ($m$)
- Assume $E_l$ is the electric field and $B_l$ is the magnetic flux density, where $l = e, i$
- Maxwell's equations give $\nabla \times E_l = -\dfrac{\partial B_l}{\partial t}$
- Assume magnetic induction is negligible, so that $\nabla \times E_l \approx 0$
- Hence $E_e = -\nabla V_e$ and $E_i = -\nabla V_i$ for potentials $V_l$, $l = i, e$
Current Densities $j_i$ and $j_e$
- Let $\sigma_l$ be the conductivity 2-tensor, $l = i, e$
  - Intracellular: homogeneous; small radius
  - Extracellular: ion populations!
- Ohm's Law (local): $j_l = \sigma_l E_l$
- $\nabla \cdot j_i = 0$, so that $\nabla^2 V_i = 0$
Charges (ions) collect on the outside of the boundary surface (especially Na⁺), so that $\nabla \cdot j_e = I_m$, where $I_m$ denotes the membrane currents. Thus

$$\sigma_e \nabla^2 V_e = -I_m$$
Assume circular cross-sections. Let $V = V_i - V_e - V_{rest}$ be the membrane potential difference, and let $R_m$, $R_i$, $C$ be the membrane resistance, intracellular resistance, and membrane capacitance, respectively. Let $I_{syn}$ be a "catch all" for ion channel activity. The membrane current is

$$I_m = C\frac{\partial V}{\partial t} + \frac{V}{R_m} + I_{ion}$$

which leads to (Lord Kelvin: the Cable Equation)

$$\frac{\partial}{\partial x}\left(\frac{d}{4R_i}\frac{\partial V}{\partial x}\right) = C\frac{\partial V}{\partial t} + \frac{V}{R_m} + I_{syn}$$
Dimensionless Cables

Let $X = \sqrt{\dfrac{4R_i}{R_m d}}\,x$ and let $\tau_m = R_m C$ (constant). Then

$$\frac{\partial^2 V}{\partial X^2} = \tau_m\frac{\partial V}{\partial t} + V + R_m I_{syn}$$

Tapered cylinders: use $Z$ instead of $X$ and a taper constant $K$:

$$\frac{\partial^2 V}{\partial Z^2} + K\frac{\partial V}{\partial Z} = \tau_m\frac{\partial V}{\partial t} + V + R_m I_{syn}$$

(Here $I_{syn}$ includes the ionic currents $I_{ion}$.)
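To make the dimensionless cable concrete, here is a minimal Python sketch that integrates $V_{XX} = \tau_m V_t + V + R_m I_{syn}$ by explicit finite differences. The grid, time constant, current amplitude, and sealed-end boundary conditions are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Minimal sketch: explicit finite differences for the dimensionless cable
#   V_XX = tau_m * V_t + V + Rm * I_syn
# with sealed ends (V_X = 0).  All parameter values are illustrative.
L_cable, nx = 2.0, 41               # electrotonic length, grid points
dX = L_cable / (nx - 1)
tau_m, Rm = 1.0, 1.0
dt = 0.4 * dX**2                    # satisfies the explicit stability bound
V = np.zeros(nx)                    # potential relative to rest
I_syn = np.zeros(nx)
I_syn[nx // 2] = -5.0               # steady inward current at the midpoint

for _ in range(4000):               # integrate toward steady state
    V_XX = (np.roll(V, -1) - 2 * V + np.roll(V, 1)) / dX**2
    V_XX[0] = 2 * (V[1] - V[0]) / dX**2        # sealed-end boundaries
    V_XX[-1] = 2 * (V[-2] - V[-1]) / dX**2
    V += (dt / tau_m) * (V_XX - V - Rm * I_syn)

print(f"V at X=0 (soma end): {V[0]:.3f}; V at injection site: {V[nx//2]:.3f}")
```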
Rall's Theorem for Untapered Cylinders

If at each branching the parent diameter and the daughter cylinder diameters satisfy

$$d_{parent}^{3/2} = \sum_{j \in \text{daughters}} d_j^{3/2}$$

then the dendritic tree can be reduced to a single equivalent cylinder.
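A quick sketch of the 3/2-power check at a single branch point; the diameters and tolerance here are made-up illustrative values:

```python
# Quick check of Rall's 3/2-power rule at a single branch point.
# A tree collapses to an equivalent cylinder only if, at every branch,
# d_parent^(3/2) equals the sum of the daughters' d^(3/2).

def satisfies_rall(d_parent, d_daughters, tol=1e-6):
    """Return True if the branch point satisfies the 3/2-power rule."""
    return abs(d_parent**1.5 - sum(d**1.5 for d in d_daughters)) < tol

# Two equal daughters of diameter 1.0 require a parent of 2^(2/3) ~ 1.5874.
print(satisfies_rall(2 ** (2 / 3), [1.0, 1.0]))   # True
print(satisfies_rall(2.0, [1.0, 1.0]))            # False
```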
Dendritic Models

[Figure: full arbor model collapsed to a tapered equivalent cylinder attached to the soma]

- Rall's theorem (modified for taper) allows us to collapse the arbor to an equivalent cylinder
- Assume "hot spots" at $x_0, x_1, \ldots, x_m$ along the cylinder $0 \le x \le l$, with the soma at $x = 0$
Ion Channel Hot Spots

- (Poznanski) $I_j$ is due to the ion channel(s) at the $j$th hot spot
- The Green's function $G(x, x_j, t)$ is the solution of the hot-spot equation with $I_j$ as a point source and the others $= 0$, plus boundary conditions and initial conditions; the Green's function is a solution of the Equivalent Cylinder model

$$\frac{R_m d}{4R_i}\frac{\partial^2 V}{\partial x^2} = R_m C\frac{\partial V}{\partial t} + V + R_m\sum_{j} I_j(t)\,\delta(x - x_j)$$
Equivalent Cylinder Model ($I_{ion} = 0$)

$$\frac{\partial^2 V}{\partial X^2} = \tau_m\frac{\partial V}{\partial t} + V, \qquad 0 < X < L$$

with
- $\dfrac{\partial V}{\partial X}(L, t) = 0$ (no current through the end)
- $V(0, t) = V_s(t)$ at the soma
- $V(X, 0)$ = the steady state from a constant current

Soma: $V(0, t) = V_{clamp}$ (voltage clamp)
For the Tapered Equivalent Cylinder Model, the equation is of the form

$$\frac{\partial^2 V}{\partial Z^2} + F(Z)\frac{\partial V}{\partial Z} = \tau_m\frac{\partial V}{\partial t} + V$$
Properties
- The spectrum consists solely of non-negative eigenvalues
- Eigenvectors are orthogonal under voltage clamp; eigenvectors are not orthogonal in the original problem
- Solutions are multi-exponential decays:

$$V(X, t) = \sum_{k=1}^{\infty} C_k(X)\, e^{-t/\tau_k}$$

- Linear models are useful for subthreshold activation, assuming the nonlinearities ($I_{ion}$) are not arbitrarily close to the soma (and there are no electric field (ephaptic) effects)
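The multi-exponential form is why somatic recordings can be "peeled" exponentially: at late times only the slowest term survives, so $\log V$ becomes linear in $t$. A small sketch with assumed amplitudes $C_k$ and time constants $\tau_k$:

```python
import numpy as np

# Sketch: a subthreshold somatic response as a multi-exponential decay,
#   V(t) = sum_k C_k * exp(-t / tau_k),
# with made-up amplitudes C_k and time constants tau_k (ms).
tau = np.array([10.0, 3.0, 1.0])     # tau_0 is the membrane time constant
C = np.array([1.0, 0.5, 0.25])

t = np.linspace(0, 30, 301)          # ms
V = (C[:, None] * np.exp(-t[None, :] / tau[:, None])).sum(axis=0)

# At large t only the slowest term survives, so log V is linear in t
# with slope -1/tau_0 -- the basis of exponential "peeling" of recordings.
slope = np.polyfit(t[200:], np.log(V[200:]), 1)[0]
print(f"late-time slope: {slope:.4f} (expect ~{-1 / tau[0]:.4f})")
```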
Somatic Voltage Recording

[Figure: somatic voltage over 0–10 ms, showing an initial experimental artifact, multi-exponential decay, ionic channel effects, and saturation to steady state]
Hodgkin-Huxley: Ionic Currents
- 1963 Nobel Prize in Medicine
- Cable equation plus ionic currents ($I_{syn}$)
- From numerous voltage clamp experiments with the squid giant axon (0.5–1.0 mm in diameter)
- Produces action potentials
- Ionic channels:
  - $n$ = potassium activation variable
  - $m$ = sodium activation variable
  - $h$ = sodium inactivation variable
Hodgkin-Huxley Equations

$$\frac{d}{4R_i}\frac{\partial^2 V}{\partial x^2} = C\frac{\partial V}{\partial t} + g_l(V - V_l) + \bar{g}_K\, n^4 (V - V_K) + \bar{g}_{Na}\, m^3 h\,(V - V_{Na})$$

$$\frac{\partial n}{\partial t} = \alpha_n(1 - n) - \beta_n n, \qquad \frac{\partial m}{\partial t} = \alpha_m(1 - m) - \beta_m m, \qquad \frac{\partial h}{\partial t} = \alpha_h(1 - h) - \beta_h h$$

where any $V$ with a subscript is constant, any $g$ with a bar is constant, and each of the $\alpha$'s and $\beta$'s is of a similar form:

$$\alpha_n = \frac{10 - V}{100\left(e^{(10 - V)/10} - 1\right)}, \qquad \beta_n = \frac{1}{8}\, e^{-V/80}$$
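For orientation, here is a minimal space-clamped sketch of these equations (no cable term) using the classical squid-axon constants; forward Euler, the step size, and the brief stimulus pulse are illustrative choices:

```python
import numpy as np

# Space-clamped Hodgkin-Huxley sketch (no cable term), forward Euler.
# Classical squid-axon constants; V is the displacement from rest in mV.
C, gNa, gK, gl = 1.0, 120.0, 36.0, 0.3
VNa, VK, Vl = 115.0, -12.0, 10.6

def an(V): return 0.01 * (10 - V) / (np.exp((10 - V) / 10) - 1)
def bn(V): return 0.125 * np.exp(-V / 80)
def am(V): return 0.1 * (25 - V) / (np.exp((25 - V) / 10) - 1)
def bm(V): return 4.0 * np.exp(-V / 18)
def ah(V): return 0.07 * np.exp(-V / 20)
def bh(V): return 1.0 / (np.exp((30 - V) / 10) + 1)

dt, T = 0.01, 20.0                        # ms
V, n, m, h = 0.0, 0.32, 0.05, 0.6         # approximate resting gate values
Vpeak = 0.0
for step in range(int(T / dt)):
    I_app = 10.0 if step * dt < 1.0 else 0.0   # brief suprathreshold pulse
    I_ion = gK * n**4 * (V - VK) + gNa * m**3 * h * (V - VNa) + gl * (V - Vl)
    V += dt / C * (I_app - I_ion)
    n += dt * (an(V) * (1 - n) - bn(V) * n)
    m += dt * (am(V) * (1 - m) - bm(V) * m)
    h += dt * (ah(V) * (1 - h) - bh(V) * h)
    Vpeak = max(Vpeak, V)

print(f"peak depolarization: {Vpeak:.1f} mV above rest (an action potential)")
```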
HH Combined with "Hot Spots"

The solution to the equivalent cylinder with hot spots is

$$V(x, t) = V_{initial} + \sum_{j=0}^{n}\int_0^t G(x, x_j, t - \tau)\, I_j(\tau)\, d\tau$$

where $I_j$ is the restriction of $V$ to the $j$th "hot spot". At a hot spot, $V$ satisfies an ODE of the form

$$C\frac{\partial V}{\partial t} = -g_l(V - V_l) - \bar{g}_K\, n^4(V - V_K) - \bar{g}_{Na}\, m^3 h\,(V - V_{Na})$$

where $m$, $n$, and $h$ are functions of $V$.
Brief Description of an Approach to HH Ion Channel Nonlinearities

Goal: accessible approximations that still produce action potentials.
- Can be addressed using linear embedding, which is closely related to the method of turning variables. It maps a finite-degree polynomially nonlinear dynamical system into an infinite-degree linear system.
- The result is an infinite-dimensional linear system which is as unmanageable as the original nonlinear equation:
  - Non-normal operators with continua of eigenvalues
  - Difficult to project back to the nonlinear system (convergence and stability are thorny)
- But the approach still has some value (action potentials).
The Hot-Spot Model "Qualitatively"

$$V(0, t) = \sum_{j=0}^{n}\int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau$$

Inputs come from other neurons and from ion channels. Key features: summation of synaptic inputs, and if $V(0, t)$ is large, an action potential travels down the axon. From subthreshold behavior (Rall equivalent cylinder or full arbor).
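A qualitative sketch of this convolution picture, with each Green's function approximated by a single assumed exponential and made-up pulse inputs, showing how near-coincident synaptic inputs summate at the soma:

```python
import numpy as np

# Qualitative sketch of V(0, t) = sum_j int_0^t G(0, x_j, t - u) I_j(u) du,
# approximating each Green's function by a single decaying exponential whose
# amplitude and time constant depend on the distance x_j (assumed values).
dt, T = 0.01, 20.0
t = np.arange(0, T, dt)
x = [0.2, 0.5, 0.9]                              # hot-spot locations
G = [np.exp(-x_j) * np.exp(-t / (1.0 + x_j)) for x_j in x]

# Each hot spot receives a brief current pulse at a slightly different time.
I = [np.where((t > t0) & (t < t0 + 0.5), 1.0, 0.0) for t0 in (1.0, 1.5, 2.0)]

# A discrete convolution approximates each integral; V(0,t) sums the results.
V0 = sum(np.convolve(Gj, Ij)[: len(t)] * dt for Gj, Ij in zip(G, I))
print(f"peak somatic potential: {V0.max():.3f} (near-coincident inputs summate)")
```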
Artificial Neural Network (ANN)
- Made of artificial neurons, each of which:
  - Sums inputs $x_i$ from other neurons
  - Compares the sum to a threshold
  - Sends a signal to other neurons if above threshold
- Synapses have weights, which:
  - Model relative ion collections
  - Model the efficacy (strength) of the synapse
Artificial Neuron
- $w_{ij}$ = synaptic weight between the $i$th and $j$th neurons
- $\sigma$ = nonlinear "firing" function that maps state to output
- $\theta_j$ = threshold of the $j$th neuron
- State and output: $s_i = \sum_j w_{ij}\, x_j$, $\quad x_i = \sigma(s_i)$

[Figure: inputs $x_1, x_2, x_3, \ldots, x_n$ with weights $w_{i1}, w_{i2}, w_{i3}, \ldots, w_{in}$ feeding the firing function]
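A one-function rendering of this neuron in Python; the logistic firing function and the particular weights and threshold are illustrative assumptions:

```python
import numpy as np

# One artificial neuron: state s_i = sum_j w_ij * x_j - theta_i, then sigma(s_i).
# The logistic firing function and all values below are illustrative choices.
def neuron_output(w, x, theta, firing=lambda s: 1 / (1 + np.exp(-s))):
    s = np.dot(w, x) - theta        # weighted sum of inputs minus threshold
    return firing(s)

x = np.array([0.5, -1.0, 2.0])      # inputs from three other neurons
w = np.array([0.8, 0.1, 0.4])       # synaptic weights
print(f"output: {neuron_output(w, x, theta=0.5):.3f}")
```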
First Generation: 1957 – 1969
- Best understood in terms of classifiers
- Partition a data space into regions containing data points of the same classification
- The regions are predictions of the classification of new data points
Simple Perceptron Model
- Given 2 classes: Reference and Sample
- The firing function (activation function) has only two values, 0 or 1:

$$\text{Output} = \begin{cases} 1 & \text{if from sample} \\ 0 & \text{if from reference} \end{cases}$$

- "Learning" is by incremental updating of the weights $w_1, w_2, \ldots, w_n$ using a linear learning rule
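A sketch of the perceptron with its 0/1 firing function and an incremental linear learning rule; the AND function stands in for a linearly separable reference/sample split, and the learning rate and epoch count are illustrative:

```python
import numpy as np

# Perceptron sketch: 0/1 firing function and the incremental linear update
#   w <- w + eta * (target - output) * x   (bias absorbed as a fixed 1 input).
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])  # last column = bias
targets = np.array([0, 0, 0, 1])    # class 1 = "sample", class 0 = "reference"
w, eta = np.zeros(3), 0.5

for _ in range(20):                 # a few passes over the training set
    for x, target in zip(X, targets):
        output = 1 if np.dot(w, x) > 0 else 0
        w += eta * (target - output) * x

print("learned weights:", w)
print("predictions:", [1 if np.dot(w, x) > 0 else 0 for x in X])
```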
Perceptron Limitations
- Cannot do XOR (1969, Minsky and Papert); data must be linearly separable
- 1970s: ANNs' "wilderness experience" – only a handful working, and very "un-neuron-like"
- Support Vector Machine: a perceptron on a feature space
  - Data is projected into a high-dimensional feature space and separated with a hyperplane
  - The choice of feature space (kernel) is key
  - Predictions are based on location relative to the hyperplane
Second Generation: 1981 – Soon
- Big ideas from other fields:
  - J. J. Hopfield compares neural networks to Ising spin glass models, and uses statistical mechanics to prove that ANNs minimize a total energy functional
  - Cognitive psychology provides new insights into how neural networks learn
- Big ideas from math: Kolmogorov's Theorem
- And firing functions are sigmoidal:

$$\sigma(s_j) = \frac{1}{1 + e^{-(s_j - \theta_j)}}$$
3 Layer Neural Network

[Figure: input layer, hidden layer (usually much larger), and output layer; the output layer may consist of a single neuron]
Multilayer Network

[Figure: inputs $x_1, x_2, x_3, \ldots, x_n$ feed $N$ hidden neurons with states $w_1^T x - \theta_1, \ldots, w_N^T x - \theta_N$, combined linearly at the output]

$$\text{out} = \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(w_j^T x - \theta_j\right)$$
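The output formula is easy to render directly; this sketch uses random illustrative parameters for a one-hidden-layer network:

```python
import numpy as np

# Forward pass of a one-hidden-layer network,
#   out = sum_j alpha_j * sigma(w_j . x - theta_j),
# with random illustrative parameters (n inputs, N hidden neurons).
rng = np.random.default_rng(0)
n, N = 3, 8
W = rng.normal(size=(N, n))        # rows are the hidden weight vectors w_j
theta = rng.normal(size=N)         # hidden thresholds theta_j
alpha = rng.normal(size=N)         # output weights alpha_j

def mlp(x):
    hidden = 1 / (1 + np.exp(-(W @ x - theta)))   # sigmoidal firing functions
    return alpha @ hidden                          # single linear output neuron

print(f"out = {mlp(np.array([0.1, -0.4, 0.7])):.4f}")
```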
Hilbert’s Thirteenth Problem
Original: "Are there continuous functions of 3 variables that are not representable by superpositions and compositions of functions of 2 variables?"
Modern: Can a continuous function of n variables on a bounded domain of n-space be written as sums of compositions of functions of 1 variable?
Kolmogorov's Theorem

Modified version: any continuous function $f$ of $n$ variables can be written

$$f(s_1, \ldots, s_n) = \sum_{i=1}^{2n+1} h\!\left(\sum_{j=1}^{n} w_{ij}\, g_{ij}(s_j)\right)$$

where only $h$ and the $w$'s depend on $f$ (that is, the $g$'s are fixed).
Cybenko (1989)

Let $\sigma$ be any continuous sigmoidal function, and let $x = (x_1, \ldots, x_n)$. If $f$ is absolutely integrable over the $n$-dimensional unit cube, then for all $\varepsilon > 0$, there exists a (possibly very large) integer $N$ and vectors $w_1, \ldots, w_N$ such that

$$\left| f(x) - \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(w_j^T x - \theta_j\right) \right| < \varepsilon$$

where $\alpha_1, \ldots, \alpha_N$ and $\theta_1, \ldots, \theta_N$ are fixed parameters.
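A small numerical illustration of the flavor of this result in one dimension: fix $N$ random sigmoidal features and solve for the $\alpha_j$ by least squares. The target function, parameter distributions, and sizes are all illustrative assumptions:

```python
import numpy as np

# Sketch of Cybenko-style approximation in one dimension: fix N random
# sigmoidal features sigma(w_j * x - theta_j) and solve for the alpha_j
# by least squares.  The target function and parameters are illustrative.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 200)
f = np.sin(2 * np.pi * x)                       # target on the unit interval

for N in (4, 16, 64):
    w = rng.normal(scale=10, size=N)            # random hidden parameters
    theta = rng.normal(scale=5, size=N)
    Phi = 1 / (1 + np.exp(-(np.outer(x, w) - theta)))   # feature matrix
    alpha, *_ = np.linalg.lstsq(Phi, f, rcond=None)     # best alpha_j
    err = np.max(np.abs(Phi @ alpha - f))
    print(f"N = {N:3d}: max |f - sum| = {err:.4f}")
```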
Multilayer Network (MLPs)

[Figure: the same one-hidden-layer network; Cybenko's sum is exactly its output]

$$\text{out} = \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(w_j^T x - \theta_j\right)$$
ANN as a Universal Classifier
- Designs a function f : Data → Classes
- Example: f(Red) = 1, f(Blue) = 0; the support of f defines the regions
- Data is used to train (i.e., design) the function f

[Figure: supp(f) shaded in the data space]
Example – Predicting Trees that Are or Are Not RNA-like

D         d-t       d-a       d-L       d-D       Lamb-2   E-ratio  Randics
0.333333  0.666667  0.666667  0.5       0.666667  0.2679   0.8      2.914214
0.333333  0.5       0.5       0.5       0.666667  0.3249   1        2.770056
0.5       0.5       0.5       0.5       0.5       0.382    1        2.80806
0.166667  0.333333  0.5       0.833333  0.833333  1        2        2.236068
0.333333  0.333333  0.333333  0.666667  0.666667  0.4384   1.2      2.642734
0.333333  0.333333  0.333333  0.666667  0.666667  0.4859   1.4      2.56066

(Rows are labeled RNA-like or not RNA-like.)
- Construct graphical invariants
- Train the ANN using known RNA trees
- Predict the others
2nd Generation: Phenomenal Success
- Data mining of micro-array data
- Stock and commodities trading: ANNs are an important part of "computerized trading"
- Post office mail sorting
- "This tiny 3-dimensional artificial neural network, modeled after neural networks in the human brain, is helping machines better visualize their surroundings."
- The Mars Rovers: an ANN decides between "rough" and "smooth"
  - "rough" and "smooth" are ambiguous
  - Learning via many "examples"
- And a neural network can lose up to 10% of its neurons without significant loss in performance!
ANN Limitations
- Overfitting: e.g., if the training set is "unbalanced"
- Mislabeled data can lead to slow (or no) convergence or incorrect results
- Hard margins: no "fuzzing" of the boundary
- Overfitting may produce isolated regions
Problems on the Horizon
- Limitations are becoming very limiting
- Trained networks often are poor learners (and self-learners are hard to train)
- In real neural networks, more neurons imply better networks (not so in ANNs)
- Temporal data is problematic – ANNs have no concept, or a poor concept, of time
- "Hybridized ANNs" are becoming the rule
  - SVMs are probably the tool of choice at present
  - SOFMs, fuzzy ANNs, connectionism
Third Generation: 1997 –
- Back to biology: Spiking Neural Networks (SNNs)
- Asynchronous, action-potential driven ANNs have been around for some time
- SNNs show "promise," but results beyond current ANNs have been elusive
  - Simulating the actual HH equations (neuromimetic) has to date not been enough
  - Time is both a promise and a curse
- A possible approach: use current dendritic models to modify existing ANNs
ANNs with Multiple Time Scales

An SNN that reduces to an ANN and preserves the Kolmogorov theorem. The solution to the equivalent cylinder with hot spots is

$$V(0, t) = V_{initial} + \sum_{j=0}^{n}\int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau$$

where $I_j$ is the restriction of $V$ to the $j$th "hot spot". Equivalent artificial neuron:

$$s_i(t) = \sum_j \int_0^t G(0, x_j, t - u)\, x_j(u)\, du$$
Incorporating MultiExponentials

$G(0, x, t)$ is often a multi-exponential decay. In terms of the time constants $\tau_k$,

$$s_i = \sum_{j=1}^{n}\sum_{k} w_{jk}\int_0^t e^{-(t - u)/\tau_k}\, x_j(u)\, du$$

- The $w_{jk}$ are synaptic "weights"
- The $\tau_k$ come from electrotonic and morphometric data: rate of taper, length of dendrites, branching, capacitance, resistance
Approximation and Simplification

If $x_j(u) \approx 1$ or $x_j(u) \approx 0$, then

$$s_i = \sum_{j=1}^{n}\sum_{k} w_{jk}\,\tau_k\!\left(1 - e^{-t/\tau_k}\right) x_j(t)$$

A special case ($k$ is a constant):

$$s_i = \sum_{j=1}^{n}\left(w_j + p_j\left(1 - e^{-kt}\right)\right) x_j$$

- $t = 0$ yields the standard neural net model
- Standard neural net as the initial steady state
- Modified with a time-dependent transient
Artificial Neuron
- $\sigma$ = nonlinear "firing" function that maps state to output: $x_i = \sigma(s_i)$
- $\theta_j$ = threshold of the $j$th neuron
- $w_{ij}$, $p_{ij}$ = synaptic weights

$$s_i = \sum_{j=1}^{n}\left(w_{ij} + p_{ij}\left(1 - e^{-kt}\right)\right) x_j$$

[Figure: inputs $x_1, x_2, x_3, \ldots, x_n$ with weight pairs $(w_{i1}, p_{i1}), \ldots, (w_{in}, p_{in})$]
Steady State and Transient
- Sensitivity and soft margins:
  - $t = 0$ is a perceptron with weights $w_{ij}$
  - $t = \infty$ is a perceptron with weights $w_{ij} + p_{ij}$
  - For all $t$ in $(0, \infty)$, a traditional ANN with weights between $w_{ij}$ and $w_{ij} + p_{ij}$
  - The transient is a perturbation scheme
  - Many predictions over time (soft margins)
- Algorithm: partition the training set into subsets; train at $t = 0$ on the initial subset; train at $t > 0$ values on the other subsets
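A sketch of the interpolating family of classifiers: with assumed weights $w$, transients $p$, and rate $k$, the effective weights slide from $w$ at $t = 0$ to $w + p$ as $t \to \infty$, and the predicted class can change along the way:

```python
import numpy as np

# Sketch of the transient-weight neuron: the effective weight
# w + p * (1 - exp(-k*t)) slides from the t = 0 perceptron (weights w)
# to the t -> infinity perceptron (weights w + p).  Values are made up.
w = np.array([0.5, -0.3])
p = np.array([-1.0, 0.2])
k, theta = 1.0, 0.1
x = np.array([1.0, 0.4])

for t in (0.0, 0.5, 2.0, 10.0):
    s = np.dot(w + p * (1 - np.exp(-k * t)), x) - theta
    print(f"t = {t:5.1f}: state = {s:+.3f}, class = {int(s > 0)}")
```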
Training the Network

Define an energy function

$$E = \frac{1}{2}\sum_{i=1}^{n}\left\|\xi_i - y_i\right\|^2$$

where the vectors $\xi_i$ are the information to be "learned" and the $y_i$ are the network outputs. Neural networks minimize energy; the "information" in the network is equivalent to the minima of the total squared energy function.
Back Propagation
- Minimize energy: choose the $w_j$ and $\theta_j$ so that

$$\frac{\partial E}{\partial w_{ij}} = 0, \qquad \frac{\partial E}{\partial \theta_j} = 0$$

- In practice, this is hard
- Back propagation with a continuous sigmoidal: feed forward, calculate $E$, modify the weights; repeat until $E$ is sufficiently close to 0, using updates of the form

$$\delta_j = y_j(1 - y_j)\left(\xi_j - y_j\right) \ \text{(output layer)}, \qquad \delta_j = y_j(1 - y_j)\sum_k \delta_k\, w_{kj} \ \text{(hidden layers)}$$

$$w_j^{new} = w_j + \varepsilon\,\delta_j\, x_j$$
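A compact sketch of this procedure on a one-hidden-layer network; the XOR data, layer sizes, learning rate, and iteration count are illustrative choices (convergence can depend on the random seed):

```python
import numpy as np

# Minimal one-hidden-layer back propagation sketch minimizing
# E = 1/2 * sum ||xi_i - y_i||^2 with continuous sigmoidal firing functions.
rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
xi = np.array([[0], [1], [1], [0]], dtype=float)       # targets to be "learned"

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)          # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)          # output layer
sigma = lambda s: 1 / (1 + np.exp(-s))
eta = 1.0

for _ in range(10000):
    # Feed forward.
    hidden = sigma(X @ W1 + b1)
    y = sigma(hidden @ W2 + b2)
    # Back propagate: delta = y(1 - y) * error, chained back to the hidden layer.
    delta_out = (xi - y) * y * (1 - y)
    delta_hid = (delta_out @ W2.T) * hidden * (1 - hidden)
    W2 += eta * hidden.T @ delta_out;  b2 += eta * delta_out.sum(axis=0)
    W1 += eta * X.T @ delta_hid;       b1 += eta * delta_hid.sum(axis=0)

print("E =", 0.5 * float(((xi - y) ** 2).sum()))
print("outputs:", y.ravel().round(2))
```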
Back Propagation with Transient
- Train the network initially (choose the $w_j$ and $\theta_j$)
- Each "synapse" is given a transient weight $p_{ij}$, trained with the same back propagation rules but with the transient factor attached, via updates of the form

$$p_j^{new} = p_j + \varepsilon\,\delta_j\left(1 - e^{-kt}\right) x_j$$

for the output layer, and similarly for the hidden layers.
- Algorithm addressing over-fitting/sensitivity:
  - The weights must be given random initial values
  - The weights $p_{ij}$ are also given random initial values
  - Separate training of the $w_j$ and $\theta_j$ and the $p_{ij}$ ameliorates over-fitting during the training sequence
Observations/Results
- Spiking does occur, but only if the network is properly "initiated"; spikes only resemble action potentials
- This is one approach to SNNs; not likely to be the final word
- Other real-neuron features may be necessary (e.g., tapering axons can limit the frequency of action potentials; also, branching!)
- This approach does show promise in handling temporal information
Any Questions?
Thank you!