Algorithmic Analysis of Human DNA Replication Timing from Discrete Microarray Data
Prof. Rushen Chahal

Upload: dr-singh

Post on 06-Apr-2018


Page 1: Algorithmic Analysis of Human DNA Replication Timing From Discrete Micro Array Data

8/3/2019 Algorithmic Analysis of Human DNA Replication Timing From Discrete Micro Array Data

http://slidepdf.com/reader/full/algorithmic-analysis-of-human-dna-replication-timing-from-discrete-micro-array 1/33

Algorithmic Analysis of Human DNA Replication Timing from Discrete Microarray Data

Prof. Rushen Chahal


Thesis Statement

The DNA replication timing profile can be reconstructed efficiently and accurately from discrete time points.

(Glossary)


Presentation Outline

Biology background
Microarray technology
Experimental data
  - Challenges
Algorithms
Research Plans
  - Replication timing
  - Origins
  - Scale up


Why Study DNA Replication?

Natural Science
  - DNA is the blueprint for organisms
    » It must be passed on (organism, cell)
Engineering
  - Gene therapy
    » Insertion, deletion, modification
  - Cancer is unchecked replication


... A G G T C G A C A C ...

... T C C A G C T G T G ...

Human genome: > 3 billion bp
Replication rate: ~1000 bp/min
Serial replication (single fork): 5.7 years
Actual replication: 6 to 10 hours (speedup > 5000)
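The figures on this slide follow from simple arithmetic. The check below assumes a single fork copying exactly 3 billion bp at exactly 1000 bp/min and a 365-day year; the variable names are illustrative only.

```python
GENOME_BP = 3_000_000_000   # "> 3 billion bp", rounded down to 3 billion
RATE_BP_PER_MIN = 1_000     # "~ 1000 bp/min" for a single replication fork

serial_min = GENOME_BP / RATE_BP_PER_MIN        # minutes for one fork
serial_years = serial_min / (60 * 24 * 365)     # minutes -> years
speedup_at_10h = serial_min / (10 * 60)         # S-phase lasting 10 hours
speedup_at_6h = serial_min / (6 * 60)           # S-phase lasting 6 hours

print(f"serial: {serial_years:.1f} years")
print(f"speedup: {speedup_at_10h:.0f}x (10 h) to {speedup_at_6h:.0f}x (6 h)")
```

This prints a serial time of 5.7 years and a speedup of 5000x at 10 hours, rising to 8333x at 6 hours. Because the genome is somewhat larger than 3 billion bp, the true speedup exceeds 5000 even at the 10-hour end.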


Background

Prokaryotes
  - E. coli
    » DnaA binds to oriC
Eukaryotes
  - ORC
  - S. cerevisiae (yeast)
    » ARS 11 bp consensus
  - Mapping of origins
  - Human
    » No known consensus
    » Few origins characterized


Genome Tiling Microarrays

ATGGACTACGGATCAGTAAATCGATTAGGCACCAGATCAAGTACGATCCAGAGTACATAGCATACCATGACTAGA
TACCTGATGCCTAGTCATTTAGCTAATCCGTGGTCTAGTTCATGCTAGGTCTCATGTATCGTATGGTACTGATCT

PM probe: GAGTACATAGCATACCATGACTAGA
Complementary genomic strand: CTCATGTATCGTATGGTACTGATCT
MM probe: GAGTACATAGCATACCATGACTAGA with a single-base mismatch (marked A on the slide)

Interrogation at genomic scale
  - Large increase in data
    » Microarray data analysis
Array of probes tiles genome
Cross-hybridization
  - Repeats not tiled
    » Gaps in genome


Image analysis computes intensity of each array probe


The Cell Cycle

[Figure: cell cycle diagram highlighting S-phase, with the start of S-phase marked as 0 hour]


Profiling DNA Replication Timing

Ideal: f(chr, bp) = rtime
Isolate DNA replicated in discrete parts of S-phase
  - One cell is not enough
  - Synchronize S-phase entry
    » Apply drugs
    » Release together
  - Synchronization error
  - Label in two-hour intervals
Allelic Variation
  - mf(chr, bp) = {rtime1, rtime2, ...}


Allelic Variation
Fluorescent in situ hybridization (FISH)
  - Replication timing at a given site
[Figure: two FISH time courses at 0hr, 2hr, 4hr, 6hr, 8hr and 10hr, illustrating temporally specific replication (TS) and temporally non-specific replication (TNS)]


What is the Problem?

Reconstruct a continuous replication profile
  - Temporally (time points)
  - Spatially (probes)
from noisy data
  - Biological experiments
  - Synchronization error
  - Microarray artifacts
efficiently
  - Genomic data (> 3 billion bp)


Initial Analysis

Tiling Analysis Software (TAS)
  - Wilcoxon rank-sum test in sliding window
    » Assess enrichment of treatment over control
  - Window slides to get p-value for each probe
O(kn) time complexity
  - n = # probes on array
  - k = # probes in a window
    » k scales linearly with window size
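A minimal sketch of a windowed rank-sum statistic shows where the O(kn) cost comes from. Each probe's window is rebuilt from scratch, so the work per probe grows with k. The sketch assumes windows are a fixed number of probes on either side of the centre, whereas the slide implies TAS sizes windows in base pairs. It computes only the rank-sum statistic, not the p-value TAS reports, and it is not TAS's implementation.

```python
def rank_sum(treatment, control):
    """Wilcoxon rank-sum statistic: sum of treatment ranks in the pooled sample.
    Tied values receive the average of the ranks they span."""
    pooled = sorted([(v, 0) for v in treatment] + [(v, 1) for v in control])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for m in range(i, j + 1):
            ranks[m] = avg_rank
        i = j + 1
    return sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)


def sliding_rank_sums(treatment, control, half_width):
    """Statistic for a window centred on each probe (truncated at the ends).
    A window holds up to k = 2*half_width + 1 probes per sample, so each probe
    costs O(k log k) here (the sort); over n probes that is O(kn) up to the log."""
    n = len(treatment)
    stats = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        stats.append(rank_sum(treatment[lo:hi], control[lo:hi]))
    return stats
```

For example, `sliding_rank_sums([5, 6, 7], [1, 2, 3], 1)` gives `[7.0, 15.0, 7.0]`: every treatment value outranks every control value, so each window's statistic is the largest possible for its size.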


New Analysis

Thesis Statement (revisited):

The DNA replication timing profile can be reconstructed efficiently and accurately from discrete time points.

Incorporate information from all time points
  - Continuous view of replication timing (TR50)
Address temporally non-specific replication
Scale up to the whole genome efficiently


Allelic Variation Examples

Temporally specific replication: signal by interval (0-2, 2-4, 4-6, 6-8, 8-10 hr) = 0, 0, 1, 0, 0; TR50 = 5 hr
Temporally non-specific replication: signal by interval = 1/6, 1/6, 1/3, 0, 1/3; TR50 = 5 hr

Challenge: From the distribution of array signal, determine the replication category.
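The slides do not spell out how TR50 is computed. One reading that is consistent with both examples, offered here as an assumption, is the time at which the cumulative signal first reaches half of the total, interpolated linearly within the interval where that happens. Signal is given per two-hour interval (0-2 through 8-10 hr), and any non-negative numbers work because only proportions matter.

```python
def tr50(signal, edges=(0, 2, 4, 6, 8, 10)):
    """Hours at which cumulative signal reaches 50% of the total (assumed definition).

    signal[k] is the signal observed in the interval edges[k]..edges[k + 1].
    Returns None when there is no signal at all.
    """
    total = sum(signal)
    if total == 0:
        return None
    half = total / 2
    cumulative = 0
    for k, s in enumerate(signal):
        if s > 0 and cumulative + s >= half:
            # Linear interpolation inside the interval that crosses the halfway point.
            return edges[k] + (edges[k + 1] - edges[k]) * (half - cumulative) / s
        cumulative += s
    return edges[-1]  # not reached for non-negative signal with total > 0
```

With the second example scaled to integers, `tr50([1, 1, 2, 0, 2])` returns 5.0, and `tr50([0, 0, 1, 0, 0])` also returns 5.0. Identical TR50 values for such different distributions are why TR50 alone cannot separate temporally specific from non-specific replication, which is the job of the Temporal Specificity Algorithm.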


Temporal Specificity Algorithm

// Is there evidence that all alleles are replicating together?
If (max sum of two adjacent time points ≥ 5/6 * total sum)
    then {probe is temporally specific}
// Is at least one allele replicating apart from the majority?
Else If (max sum of two adjacent time points, not including the maximum time point, ≥ 1/3 * total sum)
    then {probe is temporally non-specific}
// Isolated signal is not strong enough to be an allele.
Else
    {probe is temporally specific}
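The rules above can be transcribed into Python as a sketch. Several choices here are assumptions: the comparisons are read as ≥ because the symbols are missing from the source, "the maximum time point" is taken to be the interval with the largest signal (the first one on ties), and adjacent pairs that include that interval are skipped. The fractions are cross-multiplied so that integer intensities compare exactly. Probes with no signal return None, since the profile-generation plan handles them separately.

```python
def temporal_specificity(signal):
    """Return 'TS', 'TNS', or None (no signal) for one probe's per-interval signal."""
    total = sum(signal)
    if total == 0:
        return None
    # (index of first interval, sum of the interval and its right neighbour)
    pairs = [(k, signal[k] + signal[k + 1]) for k in range(len(signal) - 1)]

    # Rule 1: all alleles together -> two adjacent intervals hold >= 5/6 of the signal.
    if 6 * max(p for _, p in pairs) >= 5 * total:
        return "TS"

    # Rule 2: an allele apart from the majority -> some adjacent pair that avoids
    # the peak interval holds >= 1/3 of the signal.
    peak = max(range(len(signal)), key=lambda k: signal[k])  # first maximum on ties
    others = [p for k, p in pairs if peak not in (k, k + 1)]
    if others and 3 * max(others) >= total:
        return "TNS"

    # Rule 3: isolated signal too weak to be a separate allele.
    return "TS"
```

With the two Allelic Variation examples scaled to integers, `temporal_specificity([0, 0, 6, 0, 0])` returns `'TS'` and `temporal_specificity([1, 1, 2, 0, 2])` returns `'TNS'`, which matches the slide's labels. A spread-out but unimodal probe such as `[0, 2, 6, 2, 0]` fails rule 1 and rule 2 and falls through to `'TS'`.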


Plotting TR50

[Figure: TR50 (hours) plotted against chromosomal position (in millions of bp), spanning roughly 33 to 34 million bp]

Smoothed TR50 curve recovers replication pattern
Local minima: possible locations of replication origins
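The smoothing and minimum-finding steps can be sketched as follows. This sketch assumes evenly spaced probes and substitutes a centred moving average for the Lowess smoothing mentioned later in the deck, so `half_width` stands in for the smoothing parameter. Only strict local minima are reported, so flat-bottomed dips would need extra handling.

```python
def moving_average(values, half_width):
    """Centred moving average; windows are truncated at the ends of the list."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def local_minima(values):
    """Indices whose value is strictly lower than both neighbours
    (candidate origin locations when applied to smoothed TR50)."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]
```

Smoothing before searching matters. On raw, noisy TR50 values every small dip would register as a minimum, while a wider window keeps only the broad valleys that the slide associates with replication origins.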


Segregation Algorithm

Sliding window passes over probes to generate intervals
  - Ratio of TS probes (TSP) to TNS probes (TNSP) determines temporal specificity
  - Average TR50 determines timing category
Categories:
  - TNS: ratio < 2-to-1
  - Early: ratio ≥ 2-to-1 and average TR50 < 3.4
  - Mid: ratio ≥ 2-to-1 and 3.4 ≤ average TR50 ≤ 3.9
  - Late: ratio ≥ 2-to-1 and average TR50 > 3.9
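The segregation rules can be sketched with an incrementally updated window. The sketch assumes each probe arrives as an `(is_ts, tr50)` pair in chromosome order and that the window is a fixed number of probes; the slides list window size as a parameter to evaluate and may measure it in base pairs. Counts and the TR50 sum are updated as probes enter and leave, so each step is O(1) and a full pass is linear in the number of probes. The parameter names are hypothetical. The thresholds 2-to-1, 3.4 and 3.9 come from the slide, and the separate low-probe-density category is not modelled.

```python
def segregate(probes, window, min_ratio=2.0, early_below=3.4, late_above=3.9):
    """Label every full window of `window` consecutive probes.

    probes: sequence of (is_ts, tr50) in chromosome order; tr50 is ignored for
    TNS probes. Returns one of 'TNS', 'Early', 'Mid', 'Late' per window position.
    """
    labels = []
    ts = tns = 0
    tr50_sum = 0.0
    for i, (is_ts, tr50) in enumerate(probes):
        # Add the probe entering the window.
        if is_ts:
            ts += 1
            tr50_sum += tr50
        else:
            tns += 1
        # Remove the probe that just fell out of the window.
        if i >= window:
            old_is_ts, old_tr50 = probes[i - window]
            if old_is_ts:
                ts -= 1
                tr50_sum -= old_tr50
            else:
                tns -= 1
        if i < window - 1:
            continue  # window not full yet
        # Temporal specificity: at least min_ratio TS probes per TNS probe.
        if ts == 0 or ts < min_ratio * tns:
            labels.append("TNS")
            continue
        # Timing category from the mean TR50 of the TS probes in the window.
        avg = tr50_sum / ts
        if avg < early_below:
            labels.append("Early")
        elif avg > late_above:
            labels.append("Late")
        else:
            labels.append("Mid")
    return labels
```

A window with exactly two TS probes per TNS probe counts as temporally specific here, following the ≥ reading of the 2-to-1 rule. Adjacent windows with the same label would then be merged into intervals.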


Research Plan: Profile Generation

Input arrays: 0-2hr, 2-4hr, 4-6hr, 6-8hr, 8-10hr
Probe Classification (Temporal Specificity Algorithm) & TR50 Calculation
  - No Signal Probes
  - TNS Probes
  - TS Probes & TR50
Segregation Algorithm (Sliding Window)
  - Low Probe Density
  - TNS Regions
  - TS Regions
Join Intervals
  - Joined TNS Regions
  - Joined TS (JTS) Regions
Mask TS probes with JTS Regions
  - TS Probes that fall into JTS Regions
TR50 Smoothing
  - Smoothed TR50 (STR50)
Segregate JTS Regions into thirds based on STR50
  - Early, Mid, Late
Join Intervals
  - Joined Early, Joined Mid, Joined Late

Parameters to evaluate:
  - Segregation Algorithm: sliding window size, minimum probe density
  - Join Intervals: minimum interval size
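The Join Intervals step is only named in the plan. The following is a plausible minimal version, offered as an assumption rather than the author's procedure. It merges overlapping or nearby intervals that share a label, using a hypothetical `max_gap` in bp, and then drops joined intervals shorter than the minimum interval size listed among the parameters to evaluate.

```python
def join_intervals(intervals, min_size=0, max_gap=0):
    """Merge same-label intervals that overlap or lie within max_gap bp of each
    other, then drop joined intervals shorter than min_size bp.

    intervals: iterable of (start, end, label) tuples with start <= end.
    """
    joined = []
    for start, end, label in sorted(intervals):
        if joined:
            prev_start, prev_end, prev_label = joined[-1]
            if prev_label == label and start - prev_end <= max_gap:
                # Extend the previous interval instead of starting a new one.
                joined[-1] = (prev_start, max(prev_end, end), label)
                continue
        joined.append((start, end, label))
    return [iv for iv in joined if iv[1] - iv[0] >= min_size]
```

For instance, `join_intervals([(0, 100, "TS"), (50, 200, "TS"), (300, 310, "TNS"), (400, 500, "TS")], min_size=50)` merges the first two intervals into `(0, 200, "TS")`, drops the 10 bp TNS interval, and keeps `(400, 500, "TS")`.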


Evaluation

Concordance of biological phenomena
  - Segregation intervals ↔ FISH
  - STR50 local minima ↔ Other origin methods
  - Correlation with other biological data
    » Gene density ↔ Early replication
    » AT content ↔ Late replication
    » Gene expression ↔ Early replication
    » Activating acetylation/methylation ↔ Early replication
Performance on random data
  - Large quantity of TNS replication
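For the correlation checks, a plain Pearson coefficient between per-region mean TR50 and another per-region measure is one simple option. This is an assumption, since the slides do not name the statistic. Because a lower TR50 means earlier replication, the expected "gene density ↔ early replication" relationship would show up as a negative coefficient. The values below are made up purely for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical per-region values: mean TR50 (hours) and genes per Mbp.
mean_tr50 = [2.5, 3.1, 3.6, 4.2, 4.8]
gene_density = [14, 11, 9, 5, 2]
print(round(pearson(mean_tr50, gene_density), 3))
```

A strongly negative value on real data would support the gene-density relationship. The AT-content comparison would be expected to come out positive, since late replication corresponds to higher TR50.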


Research Plan: Replication Origins

Origins drive the DNA replication pattern
Smoothed TR50 local minima
  - Cleaned up with new profiles
Other biological assays
  - Early labeling fragments
  - Nascent strands
  - Bubble trapping
  - ORC binding


Approach and Evaluation

Correlation between methods
  - Consensus sets
Motif analysis
  - Positional attributes
    » Replication timing
    » Proximity to genes
Evaluation is difficult (few validated origins)
  - Agreement between methods
  - Testing proposed correlations
  - Paper in preparation


Scaling Up to Whole Genome

Pilot 1% → 100% of human genome
  - Algorithms developed with scalability in mind
    » Incremental update sliding windows
    » Linear time
Performance-based evaluation
  - If 100% data available
    » Profile multiple runs
  - Else
    » Profile many 1% runs


Implementation Details

Java
  - Class representation of proprietary microarray files
  - Algorithms to process raw microarray data
  - Diagnostic tools
Perl
  - Scripts to process intermediate and final data
  - Correlations, data transformation, quality assurance
R statistical language
  - Smoothing, statistical plots, correlation studies
Shell scripts
  - Automated processing of microarray sets


Current/Expected Contributions

Algorithms, Software Infrastructure, Analysis
Probe-by-probe TR50 analysis
  - Temporal Specificity Algorithm
    » Combinatorial analysis of allele locations
Segregation Algorithm
  - TNS, Early, Mid, Late replicating areas
    » Used to design validation experiments
Smoothed TR50 profile
  - Local minima provide candidate origin set
Linear algorithms enable scale up
Randomness testing


Publications

Completed:
  - ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004 Oct 22;306(5696):636-40.
  - ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. {In Press, to appear in June 14, 2007 issue}
  - Karnani N., Taylor C., Malhotra A., Dutta A. Pan-S replication patterns and chromosomal domains defined by genome tiling arrays of ENCODE genomic areas. Genome Research. {In Press, to appear in June 2007 issue}
  - UCSC Browser Tracks: TR50, Smoothed TR50, Local Minima, Segregation
In Progress:
  - Multi-million dollar NIH grant for scale-up to the full human genome
  - Paper detailing origin methods, correlations, etc.


Why is this work computer science?

Fred Brooks: "The Computer Scientist as Toolsmith II"
  - "Hitching our research to someone else's driving problems, and solving those problems on the owners' terms, leads us to richer computer science research."
Not an incremental improvement
  - Algorithmic techniques and analysis used to solve a problem previously addressed with a statistical approach that performed poorly
Collaboration outside of engineering disciplines enhances visibility, funding opportunities, and demand for CS work
Developed algorithms, time complexity analysis, combinatorial analysis, feedback to experimental design


Will this work lead to any CS publications?

The Nature article focused on analysis of the biological data and includes descriptions of some of my algorithms
The Genome Research paper and the origins paper will also contain write-ups of my algorithms and analysis techniques
The Pacific Symposium on Biocomputing focuses on algorithms and computational techniques


Isn't your approach too simple?

The approach isn't simple:
  - Combinatorial analysis
  - Temporal specificity algorithm (many iterations)
  - Probewise computation to deal with binding affinity
  - Incremental updating sliding windows
    » Cross-hybridization
    » Synchronization error
  - Smoothing parameterization
  - Linear algorithms for scale up


Can't your algorithm be replaced by a well-known statistical method?

HMMs were used for segregation of intervals
  - Performed poorly in comparison to my algorithm
    » Less accurate categorization of replication intervals
    » Prone to rapid oscillation, producing tiny intervals
    » Parameterization was difficult
Lowess smoothing is a statistical method
  - Parameterization was not easy


What are the biggest challenges in this work?

Noise!
  - The data to analyze comes from biological experiments with several sources of noise that compound upon one another
Biology
  - I haven't had a course in biology since 10th grade
Microarrays
  - New, evolving technology we're still learning to deal with
Data size
  - Hundreds of GB of data to process
  - Replicates, failed experiments
  - Algorithms must be efficient


What kind of career are you aiming for after graduation, and why?

Teaching Computer Science (Small College)
  - I enjoyed learning in my undergraduate curriculum, with meaningful interactions with professors
  - I taught Discrete Math at UVa in Fall '02 and Spring '03
    » Enjoyable, but classes of 60-70 students were too large
Post-doctoral (Biological Computing)
  - Many opportunities around the world
  - Further exploration of the field


How will you know when your work/thesis is done?

Research is never really done, but you have to declare victory at some point
The replication profiling algorithms I've developed already perform quite well
  - I have concrete plans to improve and finalize them
