Analysis of Microarray Data using Monte Carlo Neural Networks
Jeff Knisley, Lloyd Lee Glenn, Karl Joplin, and Patricia Carey
The Institute for Quantitative Biology, East Tennessee State University
Outline of Talk
Microarray Data
Neural Networks
A Simple Perceptron Example
Neural Networks for Data Mining
A Monte Carlo Approach
Incorporating Known Genes
Models of the Neuron and Neural Networks
Microarray Data
Goal: Identify genes that are up- or down-regulated when an organism is in a certain state
Examples:
What genes cause certain insects to enter diapause (similar to hibernation)?
In Cystic Fibrosis, what non-CFTR genes are up- or down-regulated?
cDNA Microarrays
Obtain mRNA from a population/tissue in the given state (sample) and from a population/tissue not in the given state (reference)
Synthesize cDNAs from the mRNAs in each cell
cDNA is long (500 – 2,000 bases), but not necessarily the entire gene
Reference is labeled green, sample is labeled red
Hybridize onto "spots": each spot is often (but not necessarily) a gene
cDNAs bind to each spot in proportion to their concentrations
cDNA Microarray Data
$R_i$, $G_i$ = intensities of the $i$th spot
Absolute intensities often cannot be compared
The same reference may be used for all samples
There are many sources of bias: significant spot-to-spot intensity variations may have nothing to do with the biology
Normalize so that $R_i = G_i$ on average
Most genes are unchanged, all else equal (but rarely is "all else equal")
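The normalization step can be sketched numerically. This is a minimal illustration assuming global median-centering of the log-ratios, one common way to make $R_i = G_i$ hold on average under the assumption that most genes are unchanged; the intensity values below are made up.

```python
import numpy as np

# Hypothetical raw intensities for 6 spots on one array
R = np.array([1200.0, 340.0, 890.0, 150.0, 2100.0, 560.0])  # sample (red)
G = np.array([1000.0, 400.0, 700.0, 180.0, 1500.0, 640.0])  # reference (green)

# Log-ratios before normalization
M = np.log2(R / G)

# Global median-centering: shift the log-ratios so their median is 0,
# i.e. R and G agree "on average" if most genes are unchanged.
c = np.median(M)
M_norm = M - c

print(np.median(M_norm))  # 0 by construction
```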
Microarray Data
Several samples (and references):
A time series of microarrays
A comparison of several different samples
Data is in the form of a table
The $j$th microarray's intensities are $R_{j,i}$, $G_{j,i}$
Background intensity has often been subtracted
Question: How can we use $R_{j,i}$, $G_{j,i}$ for n samples to predict which genes are up- or down-regulated for a given condition?
Microarray Data
We do not use $M_{j,i} = \log_2(R_{j,i}/G_{j,i})$ by itself
A large $|M_{j,i}|$ (in comparison to the other $|M_{j,i}|$) means obvious up- or down-regulation, but it must be large across all n microarrays
Otherwise, it is hard to draw conclusions from $M_{j,i}$
It is often difficult to manage the average $\frac{1}{n}\sum_{j} M_{j,i}$
Microarray Data
[MA scatter plot: $M_i = \log_2(R_i/G_i)$ versus $\log_2(R_i G_i)$]
Microarray Analysis
Is a classification problem
Clustering: classify genes into a few identifiable groups
Principal Component Analysis: choose directions (i.e., axes, the principal components) that reveal the greatest variation in the data, then find clusters
Neural Nets and Support Vector Machines: trained with positive and negative examples; classify unknowns as positive or negative
Artificial Neural Network (ANN)
Made of artificial neurons, each of which:
Sums inputs from other neurons
Compares the sum to a threshold
Sends a signal to other neurons if above threshold
Synapses have weights:
Model relative ion collections
Model the efficacy (strength) of the synapse
Artificial Neuron
[Diagram: inputs $x_1, x_2, x_3, \dots, x_n$ feed the $i$th neuron through weights $w_{i1}, w_{i2}, w_{i3}, \dots, w_{in}$]
$w_{ij}$ = synaptic weight between the $i$th and $j$th neurons
$\tau_j$ = threshold of the $j$th neuron
$\sigma$ = "firing" function that maps state to output
$x_i = \sigma(s_i)$, where $s_i = \sum_j w_{ij} x_j$
Nonlinear firing function
Firing Functions are Sigmoidal
Possible Firing Functions
Discrete:
$$\sigma(s_j) = \begin{cases} 0 & \text{if } s_j < \tau_j \\ 1 & \text{if } s_j \ge \tau_j \end{cases}$$
Continuous:
$$\sigma(s_j) = \frac{1}{1 + e^{-(s_j - \tau_j)}}$$
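The two firing functions can be sketched directly. This is a minimal illustration; the vectorized NumPy form is our choice, not from the slides.

```python
import numpy as np

def discrete_fire(s, tau):
    """Discrete firing: output 1 exactly when the state s reaches tau."""
    return np.where(s >= tau, 1.0, 0.0)

def sigmoid_fire(s, tau):
    """Continuous sigmoidal firing: a smooth version of the step at tau."""
    return 1.0 / (1.0 + np.exp(-(s - tau)))

s = np.linspace(-5, 5, 11)
print(discrete_fire(s, 0.0))
print(sigmoid_fire(0.0, 0.0))  # 0.5 exactly at the threshold
```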
3 Layer Neural Network
Output
Hidden (is usually much larger)
Input
The output layer may consist of a single neuron
ANN as Classifiers
Each neuron acts as a "linear classifier"
Competition among neurons via the nonlinear firing function = "local linear classifying"
Method for genes:
Train the network until it can classify between references and samples
Eliminating weights sufficiently close to 0 does not change the local classification scheme
Multilayer Network
[Diagram: inputs $x_1, x_2, x_3, \dots, x_n$ feed hidden nodes $1, 2, \dots, N$, which compute $\sigma(\mathbf{w}_1^t\mathbf{x} - \tau_1), \dots, \sigma(\mathbf{w}_N^t\mathbf{x} - \tau_N)$]
$$\text{out} = \sum_{j=1}^{N} \alpha_j \, \sigma(\mathbf{w}_j^t \mathbf{x} - \tau_j)$$
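The network output can be sketched as a forward pass. This is a minimal illustration with random weights; the sizes n and N below are arbitrary choices.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, W, tau, alpha):
    """out = sum_j alpha_j * sigma(w_j . x - tau_j).

    W holds one row w_j per hidden neuron; tau and alpha hold the
    thresholds and the output weights.
    """
    hidden = sigmoid(W @ x - tau)
    return alpha @ hidden

rng = np.random.default_rng(0)
n, N = 4, 8                       # n inputs (genes), N hidden nodes
x = rng.normal(size=n)
W = rng.normal(size=(N, n))
tau = rng.normal(size=N)
alpha = rng.normal(size=N)
print(forward(x, W, tau, alpha))
```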
How do we select the w's?
Define an energy function:
$$E = \frac{1}{2} \sum_{i=1}^{n} \left( t_i - \text{out}_i \right)^2$$
The t vectors are the information to be "learned"
Neural networks minimize energy
The "information" in the network is equivalent to the minima of the total squared energy function
Back Propagation
Minimize the energy function: choose $\mathbf{w}_j$ and $\tau_j$ so that
$$\frac{\partial E}{\partial w_{ij}} = 0, \qquad \frac{\partial E}{\partial \tau_j} = 0$$
In practice, this is hard
Back propagation with a continuous sigmoidal:
Feed forward and calculate E
Modify the weights using a rule of the form
$$\mathbf{w}_j^{new} = \mathbf{w}_j - \varepsilon \frac{\partial E}{\partial \mathbf{w}_j}, \qquad \tau_j^{new} = \tau_j - \varepsilon \frac{\partial E}{\partial \tau_j}$$
Repeat until E is sufficiently close to 0
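The feed-forward / modify-weights loop can be sketched for the single-output network. This is a minimal illustration on a made-up OR-style toy problem; the learning rate, hidden-layer size, and iteration count are arbitrary choices, and the gradients follow from the squared-error energy E.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(1)
N = 6                                                  # hidden nodes
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)  # toy inputs
t = np.array([0.0, 1.0, 1.0, 1.0])                     # toy targets (logical OR)

W = rng.normal(scale=0.5, size=(N, 2))
tau = np.zeros(N)
alpha = rng.normal(scale=0.5, size=N)
eps = 0.05                                             # learning rate

def energy():
    """Feed forward and compute E = (1/2) sum_i (t_i - out_i)^2."""
    H = sigmoid(X @ W.T - tau)        # hidden outputs, one row per pattern
    out = H @ alpha
    return 0.5 * np.sum((t - out) ** 2), H, out

E0, _, _ = energy()
for _ in range(5000):
    _, H, out = energy()
    err = t - out
    # Gradient-descent updates: w_new = w - eps * dE/dw, etc.
    d_hidden = (err[:, None] * alpha) * H * (1 - H)
    alpha += eps * (err @ H)
    W += eps * (d_hidden.T @ X)
    tau -= eps * d_hidden.sum(axis=0)
E1, _, _ = energy()
print(E0, "->", E1)
```

The energy decreases toward 0 as the rule is repeated, which is the stopping criterion named on the slide.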
ANN as Classifier
1. Remove a percentage of the genes whose synaptic weights are close to 0
2. Create an ANN classifier on the reduced arrays
3. Repeat 1 and 2 until only the genes that most influence the classifier problem remain
The remaining genes are the most important in classifying references versus samples
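Steps 1-3 can be sketched as an iterative train-and-prune loop. This is a minimal illustration on synthetic data, with a single sigmoidal unit trained by the delta rule standing in for the full ANN; the quantile threshold and data sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic expression data: 50 "genes", 40 arrays; only genes 0-4
# actually separate samples (y = 1) from references (y = 0).
n_genes, n_arrays = 50, 40
X = rng.normal(size=(n_arrays, n_genes))
y = (X[:, :5].sum(axis=1) > 0).astype(float)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_weights(X, y, iters=500, eps=0.1):
    """Delta-rule training of a single sigmoidal unit."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        out = sigmoid(X @ w)
        w += eps * X.T @ ((y - out) * out * (1 - out))
    return w

genes = np.arange(n_genes)
while len(genes) > 5:
    w = train_weights(X[:, genes], y)
    keep = np.abs(w) > np.quantile(np.abs(w), 0.5)  # drop weights near 0
    genes = genes[keep]

print(sorted(genes))
```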
Simple Perceptron Model
Input: Gene 1, Gene 2, …, Gene m, with weights $w_1, w_2, \dots, w_m$
Output = 1 if from a sample, 0 if from a reference
The $w_i$ can be interpreted as measures of how important the $i$th gene is in determining the output
Simple Perceptron Model
Features:
The $w_i$ can be used in place of the $M_{j,i}$
Detects genes across n samples & references
Ref: Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data, A. Narayanan, et al., 2004.
Drawbacks:
The perceptron is a linear classifier (i.e., it only classifies linearly separable data)
How to incorporate known genes?
Linearly Separable Data
Separation using Hyperplanes
Data that Cannot be separated Linearly
Functional Viewpoint
An ANN is a mapping f: Rn → R
Can we train a perceptron so that f(x1,…,xn) = 1 if the x vector is from a sample and f(x1,…,xn) = 0 if x is from a reference?
Answer: Yes if the data can be linearly separated, but no otherwise
So then can we design such a mapping for a more general ANN?
Hilbert’s Thirteenth Problem
Original: "Are there continuous functions of 3 variables that are not representable by superpositions and compositions of functions of 2 variables?"
Modern: Can any continuous function of n variables on a bounded domain of n-space be written as sums of compositions of functions of 1 variable?
Kolmogorov’s Theorem
Modified version: Any continuous function f of n variables can be written
$$f(s_1, \dots, s_n) = \sum_{i=1}^{2n+1} h\!\left( \sum_{j=1}^{n} g_{ij}(s_j) \right)$$
where only h depends on f
Cybenko (1989)
Let $\sigma$ be any continuous sigmoidal function, and let $\mathbf{x} = (x_1, \dots, x_n)$. If f is absolutely integrable over the n-dimensional unit cube, then for all $\varepsilon > 0$ there exists a (possibly very large) integer N and vectors $\mathbf{w}_1, \dots, \mathbf{w}_N$ such that
$$\left| f(\mathbf{x}) - \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(\mathbf{w}_j^T \mathbf{x} - \tau_j\right) \right| < \varepsilon$$
where $\alpha_1, \dots, \alpha_N$ and $\tau_1, \dots, \tau_N$ are fixed parameters.
Recall: Multilayer Network
[Diagram: inputs $x_1, x_2, x_3, \dots, x_n$ feed hidden nodes $1, 2, \dots, N$, which compute $\sigma(\mathbf{w}_1^t\mathbf{x} - \tau_1), \dots, \sigma(\mathbf{w}_N^t\mathbf{x} - \tau_N)$]
$$\text{out} = \sum_{j=1}^{N} \alpha_j \, \sigma(\mathbf{w}_j^t \mathbf{x} - \tau_j)$$
ANN as Classifier
Answer (Cybenko): for any $\varepsilon > 0$, the function f(x1,…,xn) = 1 if the x vector is from a sample and f(x1,…,xn) = 0 if x is from a reference can be approximated to within $\varepsilon$ by a multilayer neural network.
But the weights no longer have the one-to-one correspondence to genes.
ANN and Monte Carlo Methods
Monte Carlo methods have been a big success story with ANNs:
Error estimates come with the network predictions
ANNs are very fast in the forward direction
Example: ANN + MC implement and outperform Kalman filters (recursive linear filters used in navigation and elsewhere) (De Freitas, J. F. G., et al., 2000)
Recall: Multilayer Network
[Diagram: N genes feed an N-node hidden layer]
$$\text{out} = \sum_{j=1}^{N} \alpha_j \, \sigma(\mathbf{w}_j^t \mathbf{x} - \tau_j)$$
The $\alpha_j$ correspond to genes, but do not directly depend on a single gene.
Naïve Monte Carlo ANN Method
1. Randomly choose subset S of genes
2. Train using Back Propagation
3. Prune based on values of wj (or j , or both)
4. Repeat 2-3 until a small subset of S remains
5. Increase “count” of genes in small subset
6. Repeat 1-5 until each gene has a 95% probability of having appeared at least some minimum number of times in a subset
7. The most frequent genes are the predicted up- or down-regulated genes
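Steps 1-7 can be sketched on synthetic data. This is a minimal illustration: a single sigmoidal unit plus a keep-the-largest-|w| rule stands in for the full train-and-prune inner loop (steps 2-4), and a fixed number of rounds replaces the 95%-coverage stopping rule of step 6.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: genes 0-4 carry the signal (a stand-in for real arrays)
n_genes, n_arrays = 60, 40
X = rng.normal(size=(n_arrays, n_genes))
y = (X[:, :5].sum(axis=1) > 0).astype(float)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def surviving_genes(subset, keep=5, iters=300, eps=0.1):
    """Steps 2-4: train on the subset, keep the genes with largest |w|."""
    w = np.zeros(len(subset))
    Xs = X[:, subset]
    for _ in range(iters):
        out = sigmoid(Xs @ w)
        w += eps * Xs.T @ ((y - out) * out * (1 - out))
    return subset[np.argsort(-np.abs(w))[:keep]]

counts = np.zeros(n_genes)
for _ in range(200):                        # steps 1 and 6
    subset = rng.choice(n_genes, size=20, replace=False)
    counts[surviving_genes(subset)] += 1    # step 5: increase the counts

top = np.argsort(-counts)[:5]               # step 7: most frequent genes
print(sorted(top))
```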
Additional Considerations
If a gene is up-regulated or down-regulated for a certain condition, then put it into a subset in step 1 with probability 1.
This is a simple-minded Bayesian method. Bayesian analysis can make it much better.
The algorithm distributes naturally across a multi-processor cluster or machine:
Choose the subsets first
Distribute the subsets to different machines
Tabulate the results from all the machines
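The choose-first, distribute, tabulate pattern can be sketched with a thread pool standing in for a cluster. The `survivors` function below is a hypothetical placeholder for the per-machine train-and-prune step.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(4)
n_genes = 60

# Choose all the random subsets up front...
subsets = [rng.choice(n_genes, size=20, replace=False) for _ in range(8)]

def survivors(subset):
    """Stand-in for train-and-prune on one machine: here it simply
    returns the first 5 genes of the subset (hypothetical placeholder)."""
    return subset[:5]

# ...then distribute them to workers and tabulate all the results.
counts = np.zeros(n_genes)
with ThreadPoolExecutor(max_workers=4) as pool:
    for kept in pool.map(survivors, subsets):
        counts[kept] += 1

print(int(counts.sum()))  # 8 subsets * 5 survivors = 40
```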
What Next…
Cybenko is not the "final answer":
Real neurons are much more complicated
ANNs abstract only a few of their features
We are only at the beginning of learning how to separate noise and bias from the classification problem
Many are now looking at the neurons themselves for answers
Components of a Neuron
Dendrites
Soma
nucleus
Axon
Myelin Sheaths
Synaptic Terminals
Signals Propagate to Soma
Signals Decay at Soma if below a Certain threshold
Signals May Arrive Close Together
If threshold exceeded,then neuron “fires,” sending a signal along its axon.
Signal Propagation along Axon
The signal is electrical:
Membrane depolarization from resting −70 mV
Myelin acts as an insulator
Propagation is electro-chemical:
Sodium channels open at breaks in the myelin
Rapid depolarization occurs at these breaks
The signal travels faster than if it were only electrical
Neurons send “spike trains” from one to another.
Hodgkin-Huxley Model
1963 Nobel Prize in Medicine
Cable equation plus ionic currents (Isyn)
Can only be solved numerically
Produces action potentials
Ionic channels:
n = potassium activation variable
m = sodium activation variable
h = sodium inactivation variable
Hodgkin-Huxley Equations
$$\frac{d}{4R}\frac{\partial^2 V}{\partial x^2} = C\frac{\partial V}{\partial t} + g_l(V - V_l) + \bar{g}_K\, n^4 (V - V_K) + \bar{g}_{Na}\, m^3 h\, (V - V_{Na})$$
$$\frac{\partial n}{\partial t} = \alpha_n (1 - n) - \beta_n n, \qquad \frac{\partial m}{\partial t} = \alpha_m (1 - m) - \beta_m m, \qquad \frac{\partial h}{\partial t} = \alpha_h (1 - h) - \beta_h h$$
where any V with a subscript is constant, any g with a bar is constant, and each of the $\alpha$'s and $\beta$'s is of a similar form:
$$\alpha_n = \frac{10 - V}{100\left(e^{(10 - V)/10} - 1\right)}, \qquad \beta_n = \frac{1}{8}\, e^{-V/80}$$
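"Can only be solved numerically" can be illustrated for the space-clamped equations (dropping the cable term) with forward Euler. The rate constants below are the classic Hodgkin-Huxley values in the same 1952 convention as the $\alpha_n$, $\beta_n$ above; the injected current and step size are arbitrary choices.

```python
import numpy as np

# Classic Hodgkin-Huxley constants: V is the depolarization from rest in mV
C, g_Na, g_K, g_l = 1.0, 120.0, 36.0, 0.3
V_Na, V_K, V_l = 115.0, -12.0, 10.6

def a_n(V): return 0.01 * (10 - V) / (np.exp((10 - V) / 10) - 1)
def b_n(V): return 0.125 * np.exp(-V / 80)
def a_m(V): return 0.1 * (25 - V) / (np.exp((25 - V) / 10) - 1)
def b_m(V): return 4.0 * np.exp(-V / 18)
def a_h(V): return 0.07 * np.exp(-V / 20)
def b_h(V): return 1.0 / (np.exp((30 - V) / 10) + 1)

# Forward-Euler integration with a constant injected current I
dt, T, I = 0.01, 50.0, 10.0
V, n, m, h = 0.0, 0.3177, 0.0529, 0.5961   # resting steady state
V_trace = []
for _ in range(int(T / dt)):
    dV = (I - g_K * n**4 * (V - V_K)
            - g_Na * m**3 * h * (V - V_Na)
            - g_l * (V - V_l)) / C
    n += dt * (a_n(V) * (1 - n) - b_n(V) * n)
    m += dt * (a_m(V) * (1 - m) - b_m(V) * m)
    h += dt * (a_h(V) * (1 - h) - b_h(V) * h)
    V += dt * dV
    V_trace.append(V)

print(max(V_trace))  # action potentials overshoot far above rest
```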
Hodgkin-Huxley nearly intractable
So researchers began developing artificial models to better understand what neurons are all about
A New Approach
Poznanski (2001): Synaptic effects are isolated into hot spots
[Diagram: soma with a synapse on a tapered equivalent cylinder]
Rall's theorem (modified for taper) allows us to collapse the dendritic tree to an equivalent cylinder
[Diagram: soma with tapered equivalent cylinder]
Assume “hot spots” at x0, x1, …, xm
[Diagram: soma at 0, hot spots at $x_0, x_1, \dots, x_m$ along a cylinder of length l]
Ion Channel Hot Spots
$I_j$ is the ionic current at the $j$th hot spot
The Green's function $G(x, x_j, t)$ is the solution of the hot-spot equation with $I_j$ as a point source and the other currents set to 0 (plus boundary conditions):
$$\frac{R_m d}{4 R_i}\frac{\partial^2 V}{\partial x^2} = R_m C_m \frac{\partial V}{\partial t} + V - R_m \sum_{j=0}^{m} I_j(t)\,\delta(x - x_j)$$
Convolution Theorem
The solution to the original equation is of the form
$$V(x, t) = V_{initial} + \sum_{j=0}^{m} \int_0^t G(x, x_j, t - \tau)\, I_j(\tau)\, d\tau$$
The voltage at the soma is
$$V(0, t) = V_{initial} + \sum_{j=0}^{m} \int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau$$
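The soma-voltage formula can be sketched with a discrete convolution. The Green's function and hot-spot currents below are illustrative stand-ins, not solutions of the actual cable equation.

```python
import numpy as np

dt = 0.01
t = np.arange(0, 20, dt)
x_spots = [0.2, 0.5, 0.8]                  # hot-spot locations (made up)

def G(x_j, t):
    """Assumed kernel: farther hot spots contribute weaker, slower responses."""
    return np.exp(-t / (1 + x_j)) / (1 + x_j)

def I(x_j, t):
    """Assumed current: a brief pulse at each hot spot."""
    return np.where((t > 1) & (t < 2), 1.0, 0.0)

# V(0, t) = V_initial + sum_j integral_0^t G(0, x_j, t - s) I_j(s) ds,
# evaluated here as a discrete convolution.
V0 = 0.0
V = V0 + sum(np.convolve(G(xj, t), I(xj, t))[:len(t)] * dt
             for xj in x_spots)
print(V.max())
```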
Ion Channel Currents
At a hot spot, the "voltage" V satisfies an ODE of the form
$$C\frac{\partial V}{\partial t} = -g_l(V - V_l) - \bar{g}_K\, n^4 (V - V_K) - \bar{g}_{Na}\, m^3 h\, (V - V_{Na})$$
Assume that the $\alpha$'s and $\beta$'s are large-degree polynomials
Introduce a new family of functions $U_{p,q,r} = n^p m^q h^r$
"Embed" the original equation into a system of ODEs for the $U_{p,q,r}$
Linear Embedding: Simple Example
To embed
$$\frac{dV}{dt} = A_0 + A_1 V + \dots + A_n V^n$$
let $U_j = V^j$. Then
$$\frac{dU_j}{dt} = j V^{j-1}\frac{dV}{dt} = j A_0 V^{j-1} + \dots + j A_n V^{n+j-1}$$
Linear Embedding: Simple Example
The result is
$$\frac{dU_j}{dt} = j A_0 U_{j-1} + \dots + j A_n U_{n+j-1}$$
The result is an infinite dimensional linear system which is often as unmanageable as the original nonlinear equation.
However, linear embeddings do often produce good numerical approximations.
Moreover, linear embedding implies that each $I_j$ is given by a linear transformation of the vector of U's.
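The embedding can be checked numerically on a small example. This is a minimal illustration with a quadratic right-hand side; the truncation order J and the closure assumption $U_{J+1} \approx 0$ are our choices, and they work here because |V| stays below 1.

```python
import numpy as np

# Embed dV/dt = A0 + A1*V + A2*V^2 via U_j = V^j, which satisfies
# dU_j/dt = j*(A0*U_{j-1} + A1*U_j + A2*U_{j+1}).  The infinite linear
# system is truncated at J terms (assuming U_{J+1} ~ 0).
A0, A1, A2 = 0.1, -1.0, 0.2
V0, dt, steps = 0.5, 0.001, 2000

# Direct Euler integration of the nonlinear scalar equation
V = V0
for _ in range(steps):
    V += dt * (A0 + A1 * V + A2 * V * V)

# Truncated linear embedding: state (U_0, ..., U_J) with U_0 = V^0 = 1
J = 12
U = V0 ** np.arange(J + 1)
M = np.zeros((J + 1, J + 1))
for j in range(1, J + 1):
    M[j, j - 1] += j * A0
    M[j, j] += j * A1
    if j + 1 <= J:
        M[j, j + 1] += j * A2
for _ in range(steps):
    U = U + dt * (M @ U)

print(V, U[1])  # U_1 tracks V
```

The component $U_1$ of the linear system closely matches the directly integrated V, as the slides suggest.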
The Hot-Spot Model “Qualitatively”
$$V(0, t) = \sum_{j=0}^{m} \int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau$$
This is a sum of convolutions of weighted sums of functions of one variable, which mirrors Kolmogorov's Theorem (given that convolutions are related to composition).
Any Questions?
References
Cybenko, G. Approximation by Superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, 2(4),1989, p. 303-314.
De Freitas, J. F. G., et al. Sequential Monte Carlo Methods to Train Neural Network Models. Neural Computation, 12(4), April 2000, pp. 955-993.
L. Glenn and J. Knisley, Solutions for Transients in Arbitrarily Branching and Tapering Cables, Modeling in the Neurosciences: From Biological Systems to Neuromimetic Robotics, ed. Lindsay, R., R. Poznanski, G.N.Reeke, J.R. Rosenberg, and O.Sporns, CRC Press, London, 2004.
A. Narayanan, et al. Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data. Neurocomputing, 2004.