
Page 1: Analysis of Microarray Data using Monte Carlo Neural Networks

Analysis of Microarray Data using Monte Carlo Neural Networks

Jeff Knisley, Lloyd Lee Glenn, Karl Joplin, and Patricia Carey

The Institute for Quantitative Biology, East Tennessee State University

Page 2: Analysis of Microarray Data using Monte Carlo Neural Networks

Outline of Talk

Microarray Data
Neural Networks
A Simple Perceptron Example
Neural Networks for Data Mining
A Monte Carlo Approach
Incorporating Known Genes
Models of the Neuron and Neural Networks

Page 3: Analysis of Microarray Data using Monte Carlo Neural Networks

Microarray Data

Goal: Identify genes which are up- or down-regulated when an organism is in a certain state

Examples:
What genes cause certain insects to enter diapause (similar to hibernation)?
In Cystic Fibrosis, what non-CFTR genes are up- or down-regulated?

Page 4: Analysis of Microarray Data using Monte Carlo Neural Networks

cDNA Microarrays

Obtain mRNA from a population/tissue in the given state (sample) and from a population/tissue not in the given state (reference)

Synthesize cDNAs from the mRNAs in the cells
cDNA is long (500 – 2,000 bases), but is not necessarily the entire gene
The reference is labeled green, the sample red

Hybridize onto “spots”
Each spot is often (but not necessarily) a gene
cDNAs bind to each spot in proportion to their concentrations

Page 5: Analysis of Microarray Data using Monte Carlo Neural Networks

cDNA Microarray Data

$R_i$, $G_i$ = intensities of the $i$th spot
Absolute intensities often cannot be compared
The same reference may be used for all samples
There are many sources of bias
Significant spot-to-spot intensity variations may have nothing to do with the biology

Normalization: rescale so that $R_i = G_i$ on average
Most genes are unchanged, all else equal
But rarely is “all else equal”

Page 6: Analysis of Microarray Data using Monte Carlo Neural Networks

Microarray Data

Several samples (and references):
A time series of microarrays
A comparison of several different samples

The data form a table: the $j$th microarray's intensities are $R_{j,i}$, $G_{j,i}$
Background intensity has often been subtracted

Question: How can we use the $R_{j,i}$, $G_{j,i}$ for $n$ samples to predict which genes are up- or down-regulated for a given condition?

Page 7: Analysis of Microarray Data using Monte Carlo Neural Networks

Microarray Data

Why not simply use $M_{j,i} = \log_2\!\left(R_{j,i}/G_{j,i}\right)$?

Large $|M_{j,i}|$ (in comparison to the other $|M_{j,i}|$) indicates obvious up- or down-regulation
But it must be large across all $n$ microarrays; otherwise it is hard to draw conclusions from the $M_{j,i}$
It is often difficult to manage the per-gene averages
$$\frac{1}{n}\sum_{j=1}^{n} M_{j,i}$$
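As a concrete illustration, here is a minimal Python sketch of these quantities; the data, array shapes, and normalization step are our assumptions, not from the talk:

```python
import numpy as np

# Hypothetical background-subtracted intensities: n_arrays x n_genes
rng = np.random.default_rng(0)
R = rng.lognormal(mean=8, sigma=1, size=(10, 5000))  # sample (red) intensities
G = rng.lognormal(mean=8, sigma=1, size=(10, 5000))  # reference (green) intensities

M = np.log2(R / G)                    # M[j, i] = log2(R[j, i] / G[j, i])
M -= M.mean(axis=1, keepdims=True)    # normalize so R = G on average per array
mean_M = M.mean(axis=0)               # (1/n) * sum_j M[j, i] for each gene i
```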

Page 8: Analysis of Microarray Data using Monte Carlo Neural Networks

Microarray Data

[Figure: MA-type scatter plot of $M_i = \log_2(R_i/G_i)$ (vertical axis, $-6$ to $6$) against $\log_2(R_i G_i)$ (horizontal axis, $0$ to $16$).]

Page 9: Analysis of Microarray Data using Monte Carlo Neural Networks

Microarray Analysis

Microarray analysis is a classification problem:

Clustering: classify genes into a few identifiable groups

Principal Component Analysis: choose directions (i.e., axes, the principal components) that reveal the greatest variation in the data, and then find clusters

Neural Nets and Support Vector Machines: trained with positive and negative examples; classify an unknown as positive or negative

Page 10: Analysis of Microarray Data using Monte Carlo Neural Networks

Artificial Neural Network (ANN)

Made of artificial neurons, each of which:
Sums inputs from other neurons
Compares the sum to a threshold
Sends a signal to other neurons if above the threshold

Synapses have weights, which:
Model relative ion collections
Model the efficacy (strength) of the synapse

Page 11: Analysis of Microarray Data using Monte Carlo Neural Networks

Artificial Neuron

$w_{ij}$ = synaptic weight between the $i$th and $j$th neurons

$\tau_j$ = threshold of the $j$th neuron

$\sigma$ = nonlinear “firing” function that maps a neuron's state to its output

Each neuron sums its weighted inputs, $s_i = \sum_j w_{ij}\, x_j$, and fires the output $x_i = \sigma(s_i - \tau_i)$

[Diagram: inputs $x_1, x_2, x_3, \ldots, x_n$ feed through weights $w_{i1}, w_{i2}, w_{i3}, \ldots, w_{in}$ into the $i$th neuron.]

Page 12: Analysis of Microarray Data using Monte Carlo Neural Networks

Firing Functions are Sigmoidal

[Plot: sigmoidal firing functions, each rising through its threshold $\tau_j$.]

Page 13: Analysis of Microarray Data using Monte Carlo Neural Networks

Possible Firing Functions

Discrete:
$$\sigma(s_j) = \begin{cases} 0 & \text{if } s_j < \tau_j \\ 1 & \text{if } s_j \geq \tau_j \end{cases}$$

Continuous:
$$\sigma(s_j) = \frac{1}{1 + e^{-(s_j - \tau_j)}}$$
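A minimal Python sketch of a single artificial neuron with both firing functions (the function names and example values are ours):

```python
import numpy as np

def discrete_firing(s, tau):
    """Step function: fire (1) when the state s reaches the threshold tau."""
    return np.where(s >= tau, 1.0, 0.0)

def continuous_firing(s, tau):
    """Sigmoidal firing: 1 / (1 + e^-(s - tau)), a smooth version of the step."""
    return 1.0 / (1.0 + np.exp(-(s - tau)))

def neuron(x, w, tau, firing=continuous_firing):
    """Sum the weighted inputs s_i = sum_j w_ij x_j, then apply the firing function."""
    return firing(np.dot(w, x), tau)

x = np.array([0.2, 0.7, 0.1])    # inputs from other neurons
w = np.array([0.5, -0.3, 0.8])   # synaptic weights
print(neuron(x, w, tau=0.0))     # this neuron's output
```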

Page 14: Analysis of Microarray Data using Monte Carlo Neural Networks

3 Layer Neural Network

Output

Hidden (usually much larger)

Input

The output layer may consist of a single neuron

Page 15: Analysis of Microarray Data using Monte Carlo Neural Networks

ANN as Classifiers

Each neuron acts as a “linear classifier”
Competition among neurons via the nonlinear firing function = “local linear classifying”

Method for genes:
Train the network until it can classify between references and samples
Eliminating weights sufficiently close to 0 does not change the local classification scheme

Page 16: Analysis of Microarray Data using Monte Carlo Neural Networks

Multilayer Network

[Diagram: inputs $x_1, x_2, x_3, \ldots, x_n$ feed $N$ hidden neurons, the $j$th computing $\sigma(\mathbf{w}_j \cdot \mathbf{x} - t_j)$; a weighted sum of the hidden outputs gives]

$$\text{out} = \sum_{j=1}^{N} \alpha_j\, \sigma(\mathbf{w}_j \cdot \mathbf{x} - t_j)$$
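A minimal sketch of the forward pass for this architecture; the vectorized shapes and names are our assumptions:

```python
import numpy as np

def forward(x, W, t, alpha):
    """out = sum_j alpha_j * sigma(w_j . x - t_j).

    W is an (N, n) matrix whose rows are the hidden weight vectors w_j,
    t holds the N hidden thresholds, and alpha holds the N output weights.
    """
    hidden = 1.0 / (1.0 + np.exp(-(W @ x - t)))  # sigmoidal hidden layer
    return alpha @ hidden                         # weighted sum at the output
```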

Page 17: Analysis of Microarray Data using Monte Carlo Neural Networks

How do we select the w's?

Define an energy function

$$E = \frac{1}{2}\sum_{i=1}^{n}\left(t_i - \text{out}_i\right)^2$$

The $t$ vectors are the information to be “learned”
Neural networks minimize energy
The “information” in the network is equivalent to the minima of the total squared energy function

Page 18: Analysis of Microarray Data using Monte Carlo Neural Networks

Back Propagation

Minimize the energy function: choose the $\mathbf{w}_j$ and $\tau_j$ so that

$$\frac{\partial E}{\partial w_{ij}} = 0, \qquad \frac{\partial E}{\partial \tau_j} = 0$$

In practice, this is hard. Back propagation with a continuous sigmoidal:

Feed forward and calculate $E$
Modify the weights using an update rule of the form
$$\mathbf{w}_j^{\text{new}} = \mathbf{w}_j - \eta\,\frac{\partial E}{\partial \mathbf{w}_j}, \qquad \tau_j^{\text{new}} = \tau_j - \eta\,\frac{\partial E}{\partial \tau_j}$$
Repeat until $E$ is sufficiently close to 0
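A sketch of this training loop as plain gradient descent on $E$ for the single-hidden-layer network above; the learning rate, initialization, and stopping test are our choices:

```python
import numpy as np

def train(X, T, n_hidden=8, eta=0.1, tol=1e-3, max_epochs=10000, seed=0):
    """Gradient descent on E = (1/2) sum_i (t_i - out_i)^2 for a 1-hidden-layer net.

    X: (m, n) inputs, T: (m,) targets. Returns hidden weights W (n_hidden, n),
    hidden thresholds b (n_hidden,), and output weights alpha (n_hidden,).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(n_hidden, X.shape[1]))
    b = np.zeros(n_hidden)
    alpha = rng.normal(scale=0.5, size=n_hidden)
    for _ in range(max_epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ W.T - b)))   # feed forward: hidden layer
        out = H @ alpha                             # feed forward: output
        err = out - T
        if 0.5 * np.sum(err**2) < tol:              # E sufficiently close to 0
            break
        alpha -= eta * H.T @ err                    # back-propagate to output weights
        dS = np.outer(err, alpha) * H * (1.0 - H)   # gradient at hidden pre-activations
        W -= eta * dS.T @ X                         # back-propagate to hidden weights
        b += eta * dS.sum(axis=0)                   # s_j = w_j.x - b_j flips the sign
    return W, b, alpha
```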

Page 19: Analysis of Microarray Data using Monte Carlo Neural Networks

ANN as Classifier

1. Remove a percentage of the genes with synaptic weights close to 0

2. Create an ANN classifier on the reduced arrays

3. Repeat 1 and 2 until only the genes that most influence the classification problem remain

The remaining genes are the most important in classifying references versus samples
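A sketch of this pruning loop, reusing the train() sketch above; the pruning fraction, stopping size, and per-gene score are our assumptions:

```python
import numpy as np

def prune_genes(X, T, genes, keep_target=20, drop_frac=0.2):
    """Train, drop the genes whose input weights are closest to 0, and repeat."""
    genes = list(genes)
    while len(genes) > keep_target:
        W, b, alpha = train(X, T)                   # step 2: train the classifier
        score = np.abs(W).sum(axis=0)               # total weight magnitude per gene
        n_drop = max(1, int(drop_frac * len(genes)))
        keep = np.sort(np.argsort(score)[n_drop:])  # step 1: remove smallest weights
        X = X[:, keep]
        genes = [genes[i] for i in keep]
    return genes                                    # the most influential genes
```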

Page 20: Analysis of Microarray Data using Monte Carlo Neural Networks

Simple Perceptron Model

Input: Gene 1, Gene 2, …, Gene m, with synaptic weights $w_1, w_2, \ldots, w_m$

$$\text{Output} = \begin{cases} 1 & \text{if from a sample} \\ 0 & \text{if from a reference} \end{cases}$$

The $w_i$ can be interpreted as measures of how important the $i$th gene is in determining the output
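A minimal sketch of the classic perceptron learning rule in this setting; the learning rate and epoch count are arbitrary choices:

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, epochs=100):
    """Perceptron rule: nudge w toward misclassified samples, away from references.

    X: (m, n_genes) expression vectors, y: (m,) labels (1 = sample, 0 = reference).
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, ti in zip(X, y):
            out = 1.0 if xi @ w + b > 0 else 0.0
            w += eta * (ti - out) * xi   # w_i measures gene i's importance
            b += eta * (ti - out)
    return w, b
```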

Page 21: Analysis of Microarray Data using Monte Carlo Neural Networks

Simple Perceptron Model

Features:
The $w_i$ can be used in place of the $M_{j,i}$
Detects genes across all $n$ samples and references
Ref: Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data, A. Narayanan et al., 2004

Drawbacks:
The perceptron is a linear classifier (i.e., it only classifies linearly separable data)
How to incorporate known genes?

Page 22: Analysis of Microarray Data using Monte Carlo Neural Networks

Linearly Separable Data

Separation using Hyperplanes

Page 23: Analysis of Microarray Data using Monte Carlo Neural Networks

Data that Cannot be Separated Linearly

Page 24: Analysis of Microarray Data using Monte Carlo Neural Networks

Functional Viewpoint

An ANN is a mapping $f: \mathbb{R}^n \to \mathbb{R}$

Can we train a perceptron so that $f(x_1,\ldots,x_n) = 1$ if the vector $\mathbf{x}$ is from a sample and $f(x_1,\ldots,x_n) = 0$ if $\mathbf{x}$ is from a reference?

Answer: Yes if the data can be linearly separated, but no otherwise

So can we design such a mapping with a more general ANN?

Page 25: Analysis of Microarray Data using Monte Carlo Neural Networks

Hilbert’s Thirteenth Problem

Original: “Are there continuous functions of 3 variables that are not representable by a superposition of compositions of functions of 2 variables?”

Modern: Can any continuous function of n variables on a bounded domain of n-space be written as sums of compositions of functions of 1 variable?

Page 26: Analysis of Microarray Data using Monte Carlo Neural Networks

Kolmogorov’s Theorem

Modified version: any continuous function $f$ of $n$ variables can be written

$$f(s_1, \ldots, s_n) = \sum_{i=1}^{2n+1} h\!\left(\sum_{j=1}^{n} g_{ij}(s_j)\right)$$

where only $h$ depends on $f$

Page 27: Analysis of Microarray Data using Monte Carlo Neural Networks

Cybenko (1989)

Let $\sigma$ be any continuous sigmoidal function, and let $\mathbf{x} = (x_1, \ldots, x_n)$. If $f$ is absolutely integrable over the $n$-dimensional unit cube, then for all $\varepsilon > 0$ there exists a (possibly very large) integer $N$ and vectors $\mathbf{w}_1, \ldots, \mathbf{w}_N$ such that

$$\left| f(\mathbf{x}) - \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(\mathbf{w}_j^T \mathbf{x} - \theta_j\right) \right| < \varepsilon$$

where $\alpha_1, \ldots, \alpha_N$ and $\theta_1, \ldots, \theta_N$ are fixed parameters.

Page 28: Analysis of Microarray Data using Monte Carlo Neural Networks

Recall: Multilayer Network

[Diagram: inputs $x_1, x_2, x_3, \ldots, x_n$ feed $N$ hidden neurons computing $\sigma(\mathbf{w}_j \cdot \mathbf{x} - t_j)$; the output is]

$$\text{out} = \sum_{j=1}^{N} \alpha_j\, \sigma(\mathbf{w}_j \cdot \mathbf{x} - t_j)$$

Page 29: Analysis of Microarray Data using Monte Carlo Neural Networks

ANN as Classifier

Answer (Cybenko): for any $\varepsilon > 0$, the function with $f(x_1,\ldots,x_n) = 1$ if the vector $\mathbf{x}$ is from a sample and $f(x_1,\ldots,x_n) = 0$ if $\mathbf{x}$ is from a reference can be approximated to within $\varepsilon$ by a multilayer neural network.

But the weights no longer have a one-to-one correspondence to the genes.

Page 30: Analysis of Microarray Data using Monte Carlo Neural Networks

ANN and Monte Carlo Methods

Monte Carlo methods have been a big success story with ANNs:
Error estimates accompany network predictions
ANNs are very fast in the forward direction

Example: ANN + MC implement and outperform Kalman filters (recursive linear filters used in navigation and elsewhere) (De Freitas et al., 2000)

Page 31: Analysis of Microarray Data using Monte Carlo Neural Networks

Recall: Multilayer Network

[Diagram: $N$ genes feed an $N$-node hidden layer, and the output is]

$$\text{out} = \sum_{j=1}^{N} \alpha_j\, \sigma(\mathbf{w}_j \cdot \mathbf{x} - t_j)$$

The $\mathbf{w}_j$ correspond to genes, but do not directly depend on a single gene.

Page 32: Analysis of Microarray Data using Monte Carlo Neural Networks

Naïve Monte Carlo ANN Method

1. Randomly choose a subset S of the genes

2. Train using back propagation

3. Prune based on the values of the $\mathbf{w}_j$ (or the $\alpha_j$, or both)

4. Repeat 2-3 until a small subset of S remains

5. Increase the “count” of each gene in that small subset

6. Repeat 1-5 until each gene has a 95% probability of appearing at least some minimum number of times in a subset

7. The most frequent genes are the predicted genes
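A sketch of steps 1-7, reusing prune_genes() from above; the fixed subset size and round count stand in for the 95%-coverage stopping rule of step 6:

```python
import numpy as np
from collections import Counter

def monte_carlo_genes(X, T, genes, subset_size=50, n_rounds=1000, seed=0):
    """Count how often each gene survives pruning on random gene subsets."""
    rng = np.random.default_rng(seed)
    counts = Counter()
    for _ in range(n_rounds):                       # steps 1-5, repeated
        idx = rng.choice(len(genes), size=subset_size, replace=False)
        survivors = prune_genes(X[:, idx], T, [genes[i] for i in idx])
        counts.update(survivors)                    # step 5: increase the counts
    return counts.most_common()                     # step 7: most frequent genes
```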

Page 33: Analysis of Microarray Data using Monte Carlo Neural Networks

Additional Considerations

If a gene is known to be up- or down-regulated for the condition, then put it into a subset in step 1 with probability 1.
This is a simple-minded Bayesian method; full Bayesian analysis can make it much better.

The algorithm distributes naturally across a multi-processor cluster or machine:
Choose the subsets first
Distribute the subsets to different machines
Tabulate the results from all the machines
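One way the tabulation might be distributed: choose the subsets first, farm them out to workers, then tabulate. multiprocessing.Pool is our choice of mechanism here, and X, T, and genes are assumed to be module-level globals on a fork-based platform:

```python
from collections import Counter
from multiprocessing import Pool

def run_subset(idx):
    """Worker: prune one pre-chosen subset and return the surviving genes."""
    return prune_genes(X[:, idx], T, [genes[i] for i in idx])

def parallel_counts(subsets, n_workers=8):
    """Choose subsets first, distribute them, then tabulate all the results."""
    counts = Counter()
    with Pool(n_workers) as pool:
        for survivors in pool.map(run_subset, subsets):
            counts.update(survivors)
    return counts
```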

Page 34: Analysis of Microarray Data using Monte Carlo Neural Networks

What Next…

Cybenko is not the “final answer”:
Real neurons are much more complicated
ANNs abstract only a few of their features
We are only at the beginning of knowing how to separate noise and bias from the classification problem

Many are now looking at the neurons themselves for answers

Page 35: Analysis of Microarray Data using Monte Carlo Neural Networks

Components of a Neuron

Dendrites

Soma

nucleus

Axon

Myelin Sheaths

Synaptic Terminals

Page 36: Analysis of Microarray Data using Monte Carlo Neural Networks

Signals Propagate to Soma

Signals decay at the soma if they are below a certain threshold

Page 37: Analysis of Microarray Data using Monte Carlo Neural Networks

Signals May Arrive Close Together

If the threshold is exceeded, then the neuron “fires,” sending a signal along its axon.

Page 38: Analysis of Microarray Data using Monte Carlo Neural Networks

Signal Propagation along the Axon

The signal is electrical:
Membrane depolarization from a resting potential of -70 mV
Myelin acts as an insulator

Propagation is electro-chemical:
Sodium channels open at breaks in the myelin
Rapid depolarization occurs at these breaks
The signal travels faster than if it were only electrical

Neurons send “spike trains” from one to another.

Page 39: Analysis of Microarray Data using Monte Carlo Neural Networks

Hodgkin-Huxley Model

1963 Nobel Prize in Medicine
Cable equation plus ionic currents ($I_{syn}$)
Can only be solved numerically
Produces action potentials

Ionic channels:
$n$ = potassium activation variable
$m$ = sodium activation variable
$h$ = sodium inactivation variable

Page 40: Analysis of Microarray Data using Monte Carlo Neural Networks

Hodgkin-Huxley Equations

$$\frac{d}{4R_i}\,\frac{\partial^2 V}{\partial x^2} = C_m\,\frac{\partial V}{\partial t} + \bar{g}_l\,(V - V_l) + \bar{g}_K\, n^4\,(V - V_K) + \bar{g}_{Na}\, m^3 h\,(V - V_{Na})$$

$$\frac{\partial n}{\partial t} = \alpha_n (1 - n) - \beta_n n, \qquad \frac{\partial m}{\partial t} = \alpha_m (1 - m) - \beta_m m, \qquad \frac{\partial h}{\partial t} = \alpha_h (1 - h) - \beta_h h$$

where any $V$ with a subscript is constant, any $g$ with a bar is constant, and each of the $\alpha$'s and $\beta$'s has a similar form, e.g.,

$$\alpha_n(V) = \frac{10 - V}{100\left(e^{(10 - V)/10} - 1\right)}, \qquad \beta_n(V) = \frac{1}{8}\, e^{-V/80}$$
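Since the system can only be solved numerically, here is a minimal forward-Euler sketch of the space-clamped (point-neuron) equations; the constants and rate functions are the standard textbook values in the modern membrane-potential convention, not taken from the slide:

```python
import numpy as np

# Standard HH constants (mV, mS/cm^2, uF/cm^2)
C, gK, gNa, gl = 1.0, 36.0, 120.0, 0.3
VK, VNa, Vl = -77.0, 50.0, -54.4

# Rate functions (singularities at V = -55 and V = -40 are ignored in this sketch)
def alpha_n(V): return 0.01 * (V + 55) / (1 - np.exp(-(V + 55) / 10))
def beta_n(V):  return 0.125 * np.exp(-(V + 65) / 80)
def alpha_m(V): return 0.1 * (V + 40) / (1 - np.exp(-(V + 40) / 10))
def beta_m(V):  return 4.0 * np.exp(-(V + 65) / 18)
def alpha_h(V): return 0.07 * np.exp(-(V + 65) / 20)
def beta_h(V):  return 1.0 / (1 + np.exp(-(V + 35) / 10))

def hh(I_ext=10.0, dt=0.01, t_end=50.0):
    """Forward Euler on dV/dt and the gating variables n, m, h."""
    V, n, m, h = -65.0, 0.317, 0.052, 0.596   # resting values
    trace = []
    for _ in range(int(t_end / dt)):
        I_ion = gK * n**4 * (V - VK) + gNa * m**3 * h * (V - VNa) + gl * (V - Vl)
        V += dt * (I_ext - I_ion) / C
        n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
        m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)
        h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)
        trace.append(V)
    return np.array(trace)   # voltage trace showing action potentials
```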

Page 41: Analysis of Microarray Data using Monte Carlo Neural Networks

Hodgkin-Huxley nearly intractable

So researchers began developing artificial models to better understand what neurons are all about

Page 42: Analysis of Microarray Data using Monte Carlo Neural Networks

A New Approach

Poznanski (2001): Synaptic effects are isolated into hot spots

Soma

Synapse

Page 43: Analysis of Microarray Data using Monte Carlo Neural Networks

Tapered Equivalent Cylinder

Rall’s theorem (modified for taper) allows us to collapse to an equivalent cylinder

Soma

Page 44: Analysis of Microarray Data using Monte Carlo Neural Networks

Tapered Equivalent Cylinder

Assume “hot spots” at $x_0, x_1, \ldots, x_m$

[Diagram: the equivalent cylinder from $0$ to $l$, with the soma at the left end and hot spots marked at $x_0, x_1, \ldots, x_m$.]

Page 45: Analysis of Microarray Data using Monte Carlo Neural Networks

Ion Channel Hot Spots

$I_j$ is the ionic current at the $j$th hot spot

The Green's function $G(x, x_j, t)$ is the solution to the hot-spot equation with $I_j$ as a point source and the other currents $= 0$ (plus boundary conditions)

The hot-spot equation is of the form

$$\frac{R_m d}{4 R_i}\,\frac{\partial^2 V}{\partial x^2} - V - R_m C_m\,\frac{\partial V}{\partial t} = \sum_{j=0}^{m} I_j(t)\,\delta(x - x_j)$$

Page 46: Analysis of Microarray Data using Monte Carlo Neural Networks

Convolution Theorem

The solution to the original equation is of the form

$$V(x,t) = V_{\text{initial}}(x,t) + \sum_{j=0}^{m} \int_0^t G(x, x_j, t - \tau)\, I_j(\tau)\, d\tau$$

The voltage at the soma is

$$V(0,t) = V_{\text{initial}}(0,t) + \sum_{j=0}^{m} \int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau$$

Page 47: Analysis of Microarray Data using Monte Carlo Neural Networks

Ion Channel Currents

At a hot spot, the “voltage” $V$ satisfies an ODE of the form

$$C\,\frac{\partial V}{\partial t} = -\bar{g}_l\,(V - V_l) - \bar{g}_K\, n^4\,(V - V_K) - \bar{g}_{Na}\, m^3 h\,(V - V_{Na})$$

Assume that the $\alpha$'s and $\beta$'s are polynomials of large degree

Introduce a new family of functions $U_{p,q,r} = n^p m^q h^r$

“Embed” the original into a system of ODEs for the $U_{p,q,r}$

Page 48: Analysis of Microarray Data using Monte Carlo Neural Networks

Linear Embedding: Simple Example

To embed

$$\frac{dV}{dt} = A_0 + A_1 V + \cdots + A_n V^n,$$

let $U_j = V^j$. Then

$$\frac{dU_j}{dt} = j V^{j-1}\,\frac{dV}{dt} = j A_0 V^{j-1} + j A_1 V^j + \cdots + j A_n V^{j+n-1}$$

Page 49: Analysis of Microarray Data using Monte Carlo Neural Networks

Linear Embedding: Simple Example

The result is

$$\frac{dU_j}{dt} = j A_0 U_{j-1} + j A_1 U_j + \cdots + j A_n U_{j+n-1}$$

an infinite-dimensional linear system, which is often as unmanageable as the original nonlinear equation.

However, linear embeddings do often produce good numerical approximations.

Moreover, linear embedding implies that each $I_j$ is given by a linear transformation of the vector of $U$'s.
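A numerical sanity check of a truncated embedding, assuming a hypothetical cubic right-hand side $dV/dt = -V - 0.5V^3$ and a truncation order of our choosing:

```python
import numpy as np

A = [0.0, -1.0, 0.0, -0.5]      # dV/dt = -V - 0.5 V^3 (hypothetical cubic)
K = 12                          # truncation order for U_j = V^j

def embedded_rhs(U):
    """dU_j/dt = sum_k j * A_k * U_{j+k-1}, truncated at order K."""
    dU = np.zeros(K + 1)
    for j in range(1, K + 1):
        for k, Ak in enumerate(A):
            if j + k - 1 <= K:
                dU[j] += j * Ak * U[j + k - 1]
    return dU

V0 = 0.5
U = np.array([V0**j for j in range(K + 1)])   # U_0 = 1, U_1 = V, U_2 = V^2, ...
dt = 0.001
for _ in range(1000):
    U += dt * embedded_rhs(U)                  # Euler step on the linear system
print(U[1])   # approximates V(1.0) for the original nonlinear ODE
```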

Page 50: Analysis of Microarray Data using Monte Carlo Neural Networks

The Hot-Spot Model “Qualitatively”

$$V(0,t) = \sum_{j=0}^{m} \int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau$$

This is a sum of convolutions of weighted sums of functions of one variable: qualitatively, the form of Kolmogorov's Theorem (given that convolutions are related to composition).

Page 51: Analysis of Microarray Data using Monte Carlo Neural Networks

Any Questions?

Page 52: Analysis of Microarray Data using Monte Carlo Neural Networks

References

Cybenko, G. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4), 1989, pp. 303-314.

De Freitas, J. F. G., et al. Sequential Monte Carlo methods to train neural network models. Neural Computation, 12(4), April 2000, pp. 955-993.

Glenn, L. and Knisley, J. Solutions for transients in arbitrarily branching and tapering cables. In Modeling in the Neurosciences: From Biological Systems to Neuromimetic Robotics, ed. Lindsay, R., R. Poznanski, G. N. Reeke, J. R. Rosenberg, and O. Sporns. CRC Press, London, 2004.

Narayanan, A., et al. Artificial neural networks for reducing the dimensionality of gene expression data. Neurocomputing, 2004.