Analysis of Microarray Data using Monte Carlo Neural Networks
Jeff Knisley, Lloyd Lee Glenn, Karl Joplin, and Patricia Carey
The Institute for Quantitative Biology, East Tennessee State University
Outline of Talk
Microarray Data
Neural Networks
A Simple Perceptron Example
Neural Networks for Data Mining
A Monte Carlo Approach
Incorporating Known Genes
Models of the Neuron and Neural Networks
Microarray Data
Goal: Identify genes that are up- or down-regulated when an organism is in a certain state
Examples:
What genes cause certain insects to enter diapause (similar to hibernation)?
In Cystic Fibrosis, what non-CFTR genes are up- or down-regulated?
cDNA Microarrays
Obtain mRNA from a population/tissue in the given state (sample) and from a population/tissue not in the given state (reference)
Synthesize cDNAs from the mRNAs in each cell
cDNA is long (500 – 2,000 bases), but not necessarily the entire gene
Reference is labeled green, sample is labeled red
Hybridize onto "spots": each spot is often (but not necessarily) a gene
cDNAs bind to each spot in proportion to their concentrations
cDNA Microarray Data
$R_i$, $G_i$ = intensities of the $i$th spot
Absolute intensities often cannot be compared
The same reference may be used for all samples
There are many sources of bias: significant spot-to-spot intensity variations may have nothing to do with the biology
Normalize so that $R_i = G_i$ on average
Most genes are unchanged, all else equal (but rarely is "all else equal")
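The normalization step can be sketched numerically. This is a minimal illustration assuming global median-centering of the log-ratios, one common way to make $R_i = G_i$ hold on average under the assumption that most genes are unchanged; the intensity values below are made up.

```python
import numpy as np

# Hypothetical raw intensities for 6 spots on one array
R = np.array([1200.0, 340.0, 890.0, 150.0, 2100.0, 560.0])  # sample (red)
G = np.array([1000.0, 400.0, 700.0, 180.0, 1500.0, 640.0])  # reference (green)

# Log-ratios before normalization
M = np.log2(R / G)

# Global median-centering: shift the log-ratios so their median is 0,
# i.e. R and G agree "on average" if most genes are unchanged.
c = np.median(M)
M_norm = M - c

print(np.median(M_norm))  # 0 by construction
```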
Microarray Data
Several samples (and references):
A time series of microarrays
A comparison of several different samples
Data is in the form of a table
The $j$th microarray's intensities are $R_{j,i}$, $G_{j,i}$
Background intensity has often been subtracted
Question: How can we use $R_{j,i}$, $G_{j,i}$ for n samples to predict which genes are up- or down-regulated for a given condition?
Microarray Data
We do not use $M_{j,i} = \log_2(R_{j,i}/G_{j,i})$ by itself
A large $|M_{j,i}|$ (in comparison to the other $|M_{j,i}|$) means obvious up- or down-regulation, but it must be large across all n microarrays
Otherwise, it is hard to draw conclusions from $M_{j,i}$
It is often difficult to manage the average $\frac{1}{n}\sum_{j} M_{j,i}$
Microarray Data
[MA scatter plot: $M_i = \log_2(R_i/G_i)$ versus $\log_2(R_i G_i)$]
Microarray Analysis
Is a classification problem
Clustering: classify genes into a few identifiable groups
Principal Component Analysis: choose directions (i.e., axes, the principal components) that reveal the greatest variation in the data, then find clusters
Neural Nets and Support Vector Machines: trained with positive and negative examples; classify unknowns as positive or negative
Artificial Neural Network (ANN)
Made of artificial neurons, each of which:
Sums inputs from other neurons
Compares the sum to a threshold
Sends a signal to other neurons if above threshold
Synapses have weights:
Model relative ion collections
Model the efficacy (strength) of the synapse
Artificial Neuron
[Diagram: inputs $x_1, x_2, x_3, \dots, x_n$ feed the $i$th neuron through weights $w_{i1}, w_{i2}, w_{i3}, \dots, w_{in}$]
$w_{ij}$ = synaptic weight between the $i$th and $j$th neurons
$\tau_j$ = threshold of the $j$th neuron
$\sigma$ = "firing" function that maps state to output
$x_i = \sigma(s_i)$, where $s_i = \sum_j w_{ij} x_j$
Nonlinear firing function
Firing Functions are Sigmoidal
Possible Firing Functions
Discrete:
$$\sigma(s_j) = \begin{cases} 0 & \text{if } s_j < \tau_j \\ 1 & \text{if } s_j \ge \tau_j \end{cases}$$
Continuous:
$$\sigma(s_j) = \frac{1}{1 + e^{-(s_j - \tau_j)}}$$
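The two firing functions can be sketched directly. This is a minimal illustration; the vectorized NumPy form is our choice, not from the slides.

```python
import numpy as np

def discrete_fire(s, tau):
    """Discrete firing: output 1 exactly when the state s reaches tau."""
    return np.where(s >= tau, 1.0, 0.0)

def sigmoid_fire(s, tau):
    """Continuous sigmoidal firing: a smooth version of the step at tau."""
    return 1.0 / (1.0 + np.exp(-(s - tau)))

s = np.linspace(-5, 5, 11)
print(discrete_fire(s, 0.0))
print(sigmoid_fire(0.0, 0.0))  # 0.5 exactly at the threshold
```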
3 Layer Neural Network
Output
Hidden (is usually much larger)
Input
The output layer may consist of a single neuron
ANN as Classifiers
Each neuron acts as a "linear classifier"
Competition among neurons via the nonlinear firing function = "local linear classifying"
Method for genes:
Train the network until it can classify between references and samples
Eliminating weights sufficiently close to 0 does not change the local classification scheme
Multilayer Network
[Diagram: inputs $x_1, x_2, x_3, \dots, x_n$ feed hidden nodes $1, 2, \dots, N$, which compute $\sigma(\mathbf{w}_1^t\mathbf{x} - \tau_1), \dots, \sigma(\mathbf{w}_N^t\mathbf{x} - \tau_N)$]
$$\text{out} = \sum_{j=1}^{N} \alpha_j \, \sigma(\mathbf{w}_j^t \mathbf{x} - \tau_j)$$
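The network output can be sketched as a forward pass. This is a minimal illustration with random weights; the sizes n and N below are arbitrary choices.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, W, tau, alpha):
    """out = sum_j alpha_j * sigma(w_j . x - tau_j).

    W holds one row w_j per hidden neuron; tau and alpha hold the
    thresholds and the output weights.
    """
    hidden = sigmoid(W @ x - tau)
    return alpha @ hidden

rng = np.random.default_rng(0)
n, N = 4, 8                       # n inputs (genes), N hidden nodes
x = rng.normal(size=n)
W = rng.normal(size=(N, n))
tau = rng.normal(size=N)
alpha = rng.normal(size=N)
print(forward(x, W, tau, alpha))
```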
How do we select the w's?
Define an energy function:
$$E = \frac{1}{2} \sum_{i=1}^{n} \left( t_i - \text{out}_i \right)^2$$
The t vectors are the information to be "learned"
Neural networks minimize energy
The "information" in the network is equivalent to the minima of the total squared energy function
Back Propagation
Minimize the energy function: choose $\mathbf{w}_j$ and $\tau_j$ so that
$$\frac{\partial E}{\partial w_{ij}} = 0, \qquad \frac{\partial E}{\partial \tau_j} = 0$$
In practice, this is hard
Back propagation with a continuous sigmoidal:
Feed forward and calculate E
Modify the weights using a rule of the form
$$\mathbf{w}_j^{new} = \mathbf{w}_j - \varepsilon \frac{\partial E}{\partial \mathbf{w}_j}, \qquad \tau_j^{new} = \tau_j - \varepsilon \frac{\partial E}{\partial \tau_j}$$
Repeat until E is sufficiently close to 0
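The feed-forward / modify-weights loop can be sketched for the single-output network. This is a minimal illustration on a made-up OR-style toy problem; the learning rate, hidden-layer size, and iteration count are arbitrary choices, and the gradients follow from the squared-error energy E.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(1)
N = 6                                                  # hidden nodes
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)  # toy inputs
t = np.array([0.0, 1.0, 1.0, 1.0])                     # toy targets (logical OR)

W = rng.normal(scale=0.5, size=(N, 2))
tau = np.zeros(N)
alpha = rng.normal(scale=0.5, size=N)
eps = 0.05                                             # learning rate

def energy():
    """Feed forward and compute E = (1/2) sum_i (t_i - out_i)^2."""
    H = sigmoid(X @ W.T - tau)        # hidden outputs, one row per pattern
    out = H @ alpha
    return 0.5 * np.sum((t - out) ** 2), H, out

E0, _, _ = energy()
for _ in range(5000):
    _, H, out = energy()
    err = t - out
    # Gradient-descent updates: w_new = w - eps * dE/dw, etc.
    d_hidden = (err[:, None] * alpha) * H * (1 - H)
    alpha += eps * (err @ H)
    W += eps * (d_hidden.T @ X)
    tau -= eps * d_hidden.sum(axis=0)
E1, _, _ = energy()
print(E0, "->", E1)
```

The energy decreases toward 0 as the rule is repeated, which is the stopping criterion named on the slide.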
ANN as Classifier
1. Remove a percentage of the genes whose synaptic weights are close to 0
2. Create an ANN classifier on the reduced arrays
3. Repeat 1 and 2 until only the genes that most influence the classifier problem remain
The remaining genes are the most important in classifying references versus samples
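Steps 1-3 can be sketched as an iterative train-and-prune loop. This is a minimal illustration on synthetic data, with a single sigmoidal unit trained by the delta rule standing in for the full ANN; the quantile threshold and data sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic expression data: 50 "genes", 40 arrays; only genes 0-4
# actually separate samples (y = 1) from references (y = 0).
n_genes, n_arrays = 50, 40
X = rng.normal(size=(n_arrays, n_genes))
y = (X[:, :5].sum(axis=1) > 0).astype(float)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_weights(X, y, iters=500, eps=0.1):
    """Delta-rule training of a single sigmoidal unit."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        out = sigmoid(X @ w)
        w += eps * X.T @ ((y - out) * out * (1 - out))
    return w

genes = np.arange(n_genes)
while len(genes) > 5:
    w = train_weights(X[:, genes], y)
    keep = np.abs(w) > np.quantile(np.abs(w), 0.5)  # drop weights near 0
    genes = genes[keep]

print(sorted(genes))
```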
Simple Perceptron Model
Input: Gene 1, Gene 2, …, Gene m, with weights $w_1, w_2, \dots, w_m$
Output = 1 if from a sample, 0 if from a reference
The $w_i$ can be interpreted as measures of how important the $i$th gene is in determining the output
Simple Perceptron Model
Features:
The $w_i$ can be used in place of the $M_{j,i}$
Detects genes across n samples & references
Ref: Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data, A. Narayanan, et al., 2004.
Drawbacks:
The perceptron is a linear classifier (i.e., it only classifies linearly separable data)
How to incorporate known genes?
Linearly Separable Data
Separation using Hyperplanes
Data that Cannot be separated Linearly
Functional Viewpoint
An ANN is a mapping f: Rn → R
Can we train a perceptron so that f(x1,…,xn) = 1 if the x vector is from a sample and f(x1,…,xn) = 0 if x is from a reference?
Answer: Yes if the data can be linearly separated, but no otherwise
So then can we design such a mapping for a more general ANN?
Hilbert’s Thirteenth Problem
Original: "Are there continuous functions of 3 variables that are not representable by superpositions and compositions of functions of 2 variables?"
Modern: Can any continuous function of n variables on a bounded domain of n-space be written as sums of compositions of functions of 1 variable?
Kolmogorov’s Theorem
Modified version: Any continuous function f of n variables can be written
$$f(s_1, \dots, s_n) = \sum_{i=1}^{2n+1} h\!\left( \sum_{j=1}^{n} g_{ij}(s_j) \right)$$
where only h depends on f
Cybenko (1989)
Let $\sigma$ be any continuous sigmoidal function, and let $\mathbf{x} = (x_1, \dots, x_n)$. If f is absolutely integrable over the n-dimensional unit cube, then for all $\varepsilon > 0$ there exists a (possibly very large) integer N and vectors $\mathbf{w}_1, \dots, \mathbf{w}_N$ such that
$$\left| f(\mathbf{x}) - \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(\mathbf{w}_j^T \mathbf{x} - \tau_j\right) \right| < \varepsilon$$
where $\alpha_1, \dots, \alpha_N$ and $\tau_1, \dots, \tau_N$ are fixed parameters.
Recall: Multilayer Network
[Diagram: inputs $x_1, x_2, x_3, \dots, x_n$ feed hidden nodes $1, 2, \dots, N$, which compute $\sigma(\mathbf{w}_1^t\mathbf{x} - \tau_1), \dots, \sigma(\mathbf{w}_N^t\mathbf{x} - \tau_N)$]
$$\text{out} = \sum_{j=1}^{N} \alpha_j \, \sigma(\mathbf{w}_j^t \mathbf{x} - \tau_j)$$
ANN as Classifier
Answer (Cybenko): for any $\varepsilon > 0$, the function f(x1,…,xn) = 1 if the x vector is from a sample and f(x1,…,xn) = 0 if x is from a reference can be approximated to within $\varepsilon$ by a multilayer neural network.
But the weights no longer have the one-to-one correspondence to genes.
ANN and Monte Carlo Methods
Monte Carlo methods have been a big success story with ANNs:
Error estimates come with the network predictions
ANNs are very fast in the forward direction
Example: ANN + MC implement and outperform Kalman filters (recursive linear filters used in navigation and elsewhere) (De Freitas, J. F. G., et al., 2000)
Recall: Multilayer Network
[Diagram: N genes feed an N-node hidden layer]
$$\text{out} = \sum_{j=1}^{N} \alpha_j \, \sigma(\mathbf{w}_j^t \mathbf{x} - \tau_j)$$
The $\alpha_j$ correspond to genes, but do not directly depend on a single gene.
Naïve Monte Carlo ANN Method
1. Randomly choose subset S of genes
2. Train using Back Propagation
3. Prune based on values of wj (or j , or both)
4. Repeat 2-3 until a small subset of S remains
5. Increase “count” of genes in small subset
6. Repeat 1-5 until each gene has a 95% probability of having appeared at least some minimum number of times in a subset
7. The most frequent genes are the predicted up- or down-regulated genes
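Steps 1-7 can be sketched on synthetic data. This is a minimal illustration: a single sigmoidal unit plus a keep-the-largest-|w| rule stands in for the full train-and-prune inner loop (steps 2-4), and a fixed number of rounds replaces the 95%-coverage stopping rule of step 6.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: genes 0-4 carry the signal (a stand-in for real arrays)
n_genes, n_arrays = 60, 40
X = rng.normal(size=(n_arrays, n_genes))
y = (X[:, :5].sum(axis=1) > 0).astype(float)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def surviving_genes(subset, keep=5, iters=300, eps=0.1):
    """Steps 2-4: train on the subset, keep the genes with largest |w|."""
    w = np.zeros(len(subset))
    Xs = X[:, subset]
    for _ in range(iters):
        out = sigmoid(Xs @ w)
        w += eps * Xs.T @ ((y - out) * out * (1 - out))
    return subset[np.argsort(-np.abs(w))[:keep]]

counts = np.zeros(n_genes)
for _ in range(200):                        # steps 1 and 6
    subset = rng.choice(n_genes, size=20, replace=False)
    counts[surviving_genes(subset)] += 1    # step 5: increase the counts

top = np.argsort(-counts)[:5]               # step 7: most frequent genes
print(sorted(top))
```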
Additional Considerations
If a gene is up-regulated or down-regulated for a certain condition, then put it into a subset in step 1 with probability 1.
This is a simple-minded Bayesian method. Bayesian analysis can make it much better.
The algorithm distributes naturally across a multi-processor cluster or machine:
Choose the subsets first
Distribute the subsets to different machines
Tabulate the results from all the machines
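The choose-first, distribute, tabulate pattern can be sketched with a thread pool standing in for a cluster. The `survivors` function below is a hypothetical placeholder for the per-machine train-and-prune step.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(4)
n_genes = 60

# Choose all the random subsets up front...
subsets = [rng.choice(n_genes, size=20, replace=False) for _ in range(8)]

def survivors(subset):
    """Stand-in for train-and-prune on one machine: here it simply
    returns the first 5 genes of the subset (hypothetical placeholder)."""
    return subset[:5]

# ...then distribute them to workers and tabulate all the results.
counts = np.zeros(n_genes)
with ThreadPoolExecutor(max_workers=4) as pool:
    for kept in pool.map(survivors, subsets):
        counts[kept] += 1

print(int(counts.sum()))  # 8 subsets * 5 survivors = 40
```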
What Next…
Cybenko is not the "final answer":
Real neurons are much more complicated
ANNs abstract only a few of their features
We are only at the beginning of learning how to separate noise and bias from the classification problem
Many are now looking at the neurons themselves for answers
Components of a Neuron
Dendrites
Soma
nucleus
Axon
Myelin Sheaths
Synaptic Terminals
Signals Propagate to Soma
Signals Decay at Soma if below a Certain threshold
Signals May Arrive Close Together
If threshold exceeded,then neuron “fires,” sending a signal along its axon.
Signal Propagation along Axon
The signal is electrical:
Membrane depolarization from resting −70 mV
Myelin acts as an insulator
Propagation is electro-chemical:
Sodium channels open at breaks in the myelin
Rapid depolarization occurs at these breaks
The signal travels faster than if it were only electrical
Neurons send “spike trains” from one to another.
Hodgkin-Huxley Model
1963 Nobel Prize in Medicine
Cable equation plus ionic currents (Isyn)
Can only be solved numerically
Produces action potentials
Ionic channels:
n = potassium activation variable
m = sodium activation variable
h = sodium inactivation variable
Hodgkin-Huxley Equations
$$\frac{d}{4R}\frac{\partial^2 V}{\partial x^2} = C\frac{\partial V}{\partial t} + g_l(V - V_l) + \bar{g}_K\, n^4 (V - V_K) + \bar{g}_{Na}\, m^3 h\, (V - V_{Na})$$
$$\frac{\partial n}{\partial t} = \alpha_n (1 - n) - \beta_n n, \qquad \frac{\partial m}{\partial t} = \alpha_m (1 - m) - \beta_m m, \qquad \frac{\partial h}{\partial t} = \alpha_h (1 - h) - \beta_h h$$
where any V with a subscript is constant, any g with a bar is constant, and each of the $\alpha$'s and $\beta$'s is of a similar form:
$$\alpha_n = \frac{10 - V}{100\left(e^{(10 - V)/10} - 1\right)}, \qquad \beta_n = \frac{1}{8}\, e^{-V/80}$$
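"Can only be solved numerically" can be illustrated for the space-clamped equations (dropping the cable term) with forward Euler. The rate constants below are the classic Hodgkin-Huxley values in the same 1952 convention as the $\alpha_n$, $\beta_n$ above; the injected current and step size are arbitrary choices.

```python
import numpy as np

# Classic Hodgkin-Huxley constants: V is the depolarization from rest in mV
C, g_Na, g_K, g_l = 1.0, 120.0, 36.0, 0.3
V_Na, V_K, V_l = 115.0, -12.0, 10.6

def a_n(V): return 0.01 * (10 - V) / (np.exp((10 - V) / 10) - 1)
def b_n(V): return 0.125 * np.exp(-V / 80)
def a_m(V): return 0.1 * (25 - V) / (np.exp((25 - V) / 10) - 1)
def b_m(V): return 4.0 * np.exp(-V / 18)
def a_h(V): return 0.07 * np.exp(-V / 20)
def b_h(V): return 1.0 / (np.exp((30 - V) / 10) + 1)

# Forward-Euler integration with a constant injected current I
dt, T, I = 0.01, 50.0, 10.0
V, n, m, h = 0.0, 0.3177, 0.0529, 0.5961   # resting steady state
V_trace = []
for _ in range(int(T / dt)):
    dV = (I - g_K * n**4 * (V - V_K)
            - g_Na * m**3 * h * (V - V_Na)
            - g_l * (V - V_l)) / C
    n += dt * (a_n(V) * (1 - n) - b_n(V) * n)
    m += dt * (a_m(V) * (1 - m) - b_m(V) * m)
    h += dt * (a_h(V) * (1 - h) - b_h(V) * h)
    V += dt * dV
    V_trace.append(V)

print(max(V_trace))  # action potentials overshoot far above rest
```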
Hodgkin-Huxley nearly intractable
So researchers began developing artificial models to better understand what neurons are all about
A New Approach
Poznanski (2001): Synaptic effects are isolated into hot spots
[Diagram: soma with a synapse on a tapered equivalent cylinder]
Rall's theorem (modified for taper) allows us to collapse the dendritic tree to an equivalent cylinder
[Diagram: soma with tapered equivalent cylinder]
Assume “hot spots” at x0, x1, …, xm
[Diagram: soma at 0, hot spots at $x_0, x_1, \dots, x_m$ along a cylinder of length l]
Ion Channel Hot Spots
$I_j$ is the ionic current at the $j$th hot spot
The Green's function $G(x, x_j, t)$ is the solution of the hot-spot equation with $I_j$ as a point source and the other currents set to 0 (plus boundary conditions):
$$\frac{R_m d}{4 R_i}\frac{\partial^2 V}{\partial x^2} = R_m C_m \frac{\partial V}{\partial t} + V - R_m \sum_{j=0}^{m} I_j(t)\,\delta(x - x_j)$$
Convolution Theorem
The solution to the original equation is of the form
$$V(x, t) = V_{initial} + \sum_{j=0}^{m} \int_0^t G(x, x_j, t - \tau)\, I_j(\tau)\, d\tau$$
The voltage at the soma is
$$V(0, t) = V_{initial} + \sum_{j=0}^{m} \int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau$$
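The soma-voltage formula can be sketched with a discrete convolution. The Green's function and hot-spot currents below are illustrative stand-ins, not solutions of the actual cable equation.

```python
import numpy as np

dt = 0.01
t = np.arange(0, 20, dt)
x_spots = [0.2, 0.5, 0.8]                  # hot-spot locations (made up)

def G(x_j, t):
    """Assumed kernel: farther hot spots contribute weaker, slower responses."""
    return np.exp(-t / (1 + x_j)) / (1 + x_j)

def I(x_j, t):
    """Assumed current: a brief pulse at each hot spot."""
    return np.where((t > 1) & (t < 2), 1.0, 0.0)

# V(0, t) = V_initial + sum_j integral_0^t G(0, x_j, t - s) I_j(s) ds,
# evaluated here as a discrete convolution.
V0 = 0.0
V = V0 + sum(np.convolve(G(xj, t), I(xj, t))[:len(t)] * dt
             for xj in x_spots)
print(V.max())
```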
Ion Channel Currents
At a hot spot, the "voltage" V satisfies an ODE of the form
$$C\frac{\partial V}{\partial t} = -g_l(V - V_l) - \bar{g}_K\, n^4 (V - V_K) - \bar{g}_{Na}\, m^3 h\, (V - V_{Na})$$
Assume that the $\alpha$'s and $\beta$'s are large-degree polynomials
Introduce a new family of functions $U_{p,q,r} = n^p m^q h^r$
"Embed" the original equation into a system of ODEs for the $U_{p,q,r}$
Linear Embedding: Simple Example
To embed
$$\frac{dV}{dt} = A_0 + A_1 V + \dots + A_n V^n$$
let $U_j = V^j$. Then
$$\frac{dU_j}{dt} = j V^{j-1}\frac{dV}{dt} = j A_0 V^{j-1} + \dots + j A_n V^{n+j-1}$$
Linear Embedding: Simple Example
The result is
$$\frac{dU_j}{dt} = j A_0 U_{j-1} + \dots + j A_n U_{n+j-1}$$
The result is an infinite dimensional linear system which is often as unmanageable as the original nonlinear equation.
However, linear embeddings do often produce good numerical approximations.
Moreover, linear embedding implies that each $I_j$ is given by a linear transformation of the vector of U's.
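The embedding can be checked numerically on a small example. This is a minimal illustration with a quadratic right-hand side; the truncation order J and the closure assumption $U_{J+1} \approx 0$ are our choices, and they work here because |V| stays below 1.

```python
import numpy as np

# Embed dV/dt = A0 + A1*V + A2*V^2 via U_j = V^j, which satisfies
# dU_j/dt = j*(A0*U_{j-1} + A1*U_j + A2*U_{j+1}).  The infinite linear
# system is truncated at J terms (assuming U_{J+1} ~ 0).
A0, A1, A2 = 0.1, -1.0, 0.2
V0, dt, steps = 0.5, 0.001, 2000

# Direct Euler integration of the nonlinear scalar equation
V = V0
for _ in range(steps):
    V += dt * (A0 + A1 * V + A2 * V * V)

# Truncated linear embedding: state (U_0, ..., U_J) with U_0 = V^0 = 1
J = 12
U = V0 ** np.arange(J + 1)
M = np.zeros((J + 1, J + 1))
for j in range(1, J + 1):
    M[j, j - 1] += j * A0
    M[j, j] += j * A1
    if j + 1 <= J:
        M[j, j + 1] += j * A2
for _ in range(steps):
    U = U + dt * (M @ U)

print(V, U[1])  # U_1 tracks V
```

The component $U_1$ of the linear system closely matches the directly integrated V, as the slides suggest.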
The Hot-Spot Model “Qualitatively”
$$V(0, t) = \sum_{j=0}^{m} \int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau$$
This is a sum of convolutions of weighted sums of functions of one variable, which mirrors Kolmogorov's Theorem (given that convolutions are related to composition).
Any Questions?
References
Cybenko, G. Approximation by Superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, 2(4),1989, p. 303-314.
De Freitas, J. F. G., et al. Sequential Monte Carlo Methods to Train Neural Network Models. Neural Computation, 12(4), April 2000, pp. 955-993.
L. Glenn and J. Knisley, Solutions for Transients in Arbitrarily Branching and Tapering Cables, Modeling in the Neurosciences: From Biological Systems to Neuromimetic Robotics, ed. Lindsay, R., R. Poznanski, G.N.Reeke, J.R. Rosenberg, and O.Sporns, CRC Press, London, 2004.
A. Narayanan, et al. Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data. Neurocomputing, 2004.