Finding Mathematics in Genes and Diseases Ming-Ying Leung Department of Mathematical Sciences University of Texas at El Paso (UTEP)


Finding Mathematics in Genes and Diseases

Ming-Ying Leung

Department of Mathematical Sciences University of Texas at El Paso (UTEP)


“1, 2, 3, … and Beyond”

• A slideshow for HKU Open Day in 1980

• I did the narration and background music

• The experience has had a great impact on my journey

Mathematics is beyond numbers…

We find it in buildings, banks, and supermarkets…

…in atoms, molecules, and genes …


Outline:

• DNA and RNA

• Genome, genes, and diseases

• Palindromes and replication origins in viral genomes

• Mathematics for prediction of replication origins

[Figure: Cytomegalovirus (CMV) Particle]


DNA and RNA

[Figure: DNA and RNA strands with bases labeled A, C, G, T, and U]

• DNA is deoxyribonucleic acid, made up of 4 nucleotide bases: Adenine, Cytosine, Guanine, and Thymine.

• RNA is ribonucleic acid, made up of 4 nucleotide bases: Adenine, Cytosine, Guanine, and Uracil.

• For uniformity of notation, all DNA and RNA data sequences deposited in GenBank are represented as sequences of A, C, G, and T.

• The bases A and T form a complementary pair, as do C and G.
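
The notation and pairing rules above can be sketched in a few lines of Python. This is only an illustration; the function names `to_dna_alphabet` and `complement` are made up for this example and are not part of the talk.

```python
# Complementary base pairs as stated on the slide: A-T and C-G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_dna_alphabet(seq: str) -> str:
    """Write a DNA or RNA sequence with the letters A, C, G, T (U becomes T)."""
    return seq.upper().replace("U", "T")


def complement(seq: str) -> str:
    """Replace every base by its complementary base."""
    return "".join(COMPLEMENT[base] for base in to_dna_alphabet(seq))


print(to_dna_alphabet("acgu"))  # RNA letters rewritten in the A, C, G, T alphabet
print(complement("ACGT"))       # each base swapped with its partner
```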


Genes and Genome


Genes and Diseases


Virus and Eye Diseases

CMV Retinitis

• inflammation of the retina

• triggered by CMV particles

• may lead to blindness

[Figure: CMV Particle; genome size ~ 230 kbp]


Replication Origins and Palindromes

• High concentrations of palindromes exist around the replication origins of other herpesviruses

• Locating clusters of palindromes (above a minimal length) on the CMV genome sequence might reveal likely locations of its replication origins.
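
A minimal sketch of the palindrome search, under one stated assumption: a DNA palindrome is taken to be a stretch that reads the same as its own reverse complement (so it always has even length). The transcript does not preserve the talk's exact definition, and the function name and default `min_length` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def find_dna_palindromes(seq: str, min_length: int = 10) -> list[tuple[int, int]]:
    """Return (start, length) of maximal DNA palindromes of at least min_length.

    Assumed definition: a stretch equal to its own reverse complement.
    Positions are 0-based.
    """
    seq = seq.upper()
    hits = []
    # A center lies between positions center-1 and center.
    for center in range(1, len(seq)):
        left, right = center - 1, center
        # Grow outward while the two ends are complementary bases.
        while (left >= 0 and right < len(seq)
               and COMPLEMENT.get(seq[left]) == seq[right]):
            left -= 1
            right += 1
        length = right - left - 1
        if length >= min_length:
            hits.append((left + 1, length))
    return hits


# GAATTC is its own reverse complement; it starts at index 2 here.
print(find_dna_palindromes("CCGAATTCTT", min_length=6))
```

The list of start positions returned here is the kind of input a cluster search would work on.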


Palindromes in Letter Sequences

“A nut for a jar of tuna”

“Step on no pets”

Remove spaces and capitalize:

Odd Palindrome: ANUTFORAJAROFTUNA (the middle letter J has no partner)

Even Palindrome: STEPONNOPETS
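
The "remove spaces and capitalize" step and the even/odd distinction can be checked with a short Python sketch. The helper names are illustrative only.

```python
def normalize(phrase: str) -> str:
    """Keep letters only (dropping spaces and punctuation) and capitalize."""
    return "".join(ch for ch in phrase if ch.isalpha()).upper()


def palindrome_type(phrase: str) -> str:
    """Classify a phrase as an even palindrome, an odd palindrome, or neither."""
    letters = normalize(phrase)
    if letters != letters[::-1]:
        return "not a palindrome"
    return "even palindrome" if len(letters) % 2 == 0 else "odd palindrome"


for phrase in ("A nut for a jar of tuna", "Step on no pets"):
    print(normalize(phrase), "->", palindrome_type(phrase))
```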


DNA Palindromes


Association of Palindrome Clusters with Replication Origins


Computational Prediction of Replication Origins

• Palindrome distribution in a random sequence model

• Criterion for identifying statistically significant palindrome clusters

• Evaluate prediction accuracy

• Try to improve…

Random Sequence Model

• A mathematical model can be used to generate a DNA sequence

• A DNA molecule is made up of 4 types of bases: Adenine, Cytosine, Guanine, and Thymine

• It can be represented by a letter sequence with alphabet size = 4

• Each type of base has its own chance (or probability) of being used, depending on the base composition of the DNA molecule.

[Figure: Wheel of Bases (WOB) with sectors labeled A, C, G, and T]
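
Spinning a single wheel over and over amounts to drawing bases independently with fixed probabilities. A minimal Python sketch follows; the function name and the base composition in `probs` are made-up illustrations, not values from the talk.

```python
import random


def spin_wheel_sequence(length: int, base_probs: dict[str, float], seed=None) -> str:
    """Draw `length` bases independently, each with its own probability."""
    rng = random.Random(seed)  # a seed makes the run repeatable
    bases = list(base_probs)
    weights = [base_probs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


# Hypothetical base composition, chosen only for illustration.
probs = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}
print(spin_wheel_sequence(30, probs, seed=1))
```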


Poisson Process Approximation of Palindrome Distribution
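
The slide's own formulas are not preserved in this transcript. As a rough sketch only, assume independently drawn bases and assume a palindrome means a stretch equal to its reverse complement. Then a palindrome of length at least 2L is centered at a given spot exactly when all L pairs of bases around the center are complementary, which has probability (2·pA·pT + 2·pC·pG)^L. Such events are rare, so their count in a stretch of sequence is roughly Poisson. All names below are hypothetical.

```python
import math


def palindrome_probability(base_probs: dict[str, float], half_length: int) -> float:
    """P(a palindrome of length >= 2*half_length sits at a given center)."""
    # Chance that two independent bases are complementary (A-T, T-A, C-G, G-C).
    theta = 2 * (base_probs["A"] * base_probs["T"] + base_probs["C"] * base_probs["G"])
    # All half_length pairs around the center must be complementary.
    return theta ** half_length


def poisson_tail(mean: float, k: int) -> float:
    """P(N >= k) when N follows a Poisson distribution with the given mean."""
    return 1.0 - sum(math.exp(-mean) * mean ** j / math.factorial(j) for j in range(k))


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
p = palindrome_probability(uniform, half_length=5)  # equals 0.25 ** 5
expected_in_window = 1000 * p                       # hypothetical 1000-base window
print(p, expected_in_window, poisson_tail(expected_in_window, 3))
```

Palindromes at neighboring centers overlap and are not independent, so the Poisson count is an approximation rather than an exact law.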


Use of the Scan Statistic to Identify Clusters of Palindromes
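
One common form of the scan statistic is the largest number of events that fall inside any window of a fixed width. The sketch below computes that number from a list of palindrome positions. The window width, the positions, and the function name are illustrative assumptions, and the significance threshold used in the talk is not reproduced here.

```python
def scan_statistic(positions, window):
    """Return (max_count, window_start) over all windows [start, start + window)."""
    pts = sorted(positions)
    best_count, best_start = 0, None
    left = 0
    for right, pos in enumerate(pts):
        # Move the left end forward until the window starting there covers pos.
        while pos - pts[left] >= window:
            left += 1
        count = right - left + 1
        if count > best_count:
            best_count, best_start = count, pts[left]
    return best_count, best_start


# Hypothetical palindrome positions on a sequence.
print(scan_statistic([10, 500, 520, 560, 900, 2000], window=100))
```

A cluster would be called statistically significant when the maximum count is larger than what the random sequence model makes plausible.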


Measures of Prediction Accuracy

Attempts to improve prediction accuracy by:

• Adopting the best possible approximation to the scan statistic distribution

• Taking the lengths of palindromes into consideration when counting palindromes

• Using a better random sequence model


Markov Chain Sequence Models

• More realistic random sequence model for DNA and RNA

• It allows neighbor dependence of bases (i.e., the present base affects the selection of the next base)

• A Markov chain of nucleotide bases can be generated using four WOBs in a “Sequence Generator (SG)”
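
A minimal sketch of such a generator, assuming a first-order Markov chain: there is one wheel per current base, and the wheel that gets spun is chosen by the base just produced. The probabilities in `wheels` are invented for illustration and are not estimates from any genome.

```python
import random


def generate_markov_sequence(length, wheels, first_base, seed=None):
    """Build a sequence where wheels[b] gives the next-base probabilities after b."""
    if length <= 0:
        return ""
    rng = random.Random(seed)
    seq = [first_base]
    while len(seq) < length:
        wheel = wheels[seq[-1]]  # pick the wheel that belongs to the present base
        bases = list(wheel)
        seq.append(rng.choices(bases, weights=[wheel[b] for b in bases], k=1)[0])
    return "".join(seq)


# Four hypothetical wheels, one for each possible present base.
wheels = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
}
print(generate_markov_sequence(20, wheels, first_base="T", seed=7))
```

When all four wheels are identical, this reduces to the single-wheel random sequence model.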

Sequence Generator (SG)

[Figure: animation of the Sequence Generator (SG), in which the Wheels of Bases (WOB), each with sectors A, C, G, and T, are spun in turn and the output sequence grows one base at a time]

Results Obtained for Markov Sequence Models

• Probabilities of occurrences of single palindromes

• Probabilities of occurrences of overlapping palindromes

• Mean and variance of palindrome counts


Related Work in Progress

• Finding the palindrome distribution on Markov random sequences

• Investigating other sequence patterns such as close repeats and inversions in relation to replication origins


Other Mathematical Topics in Genes and Diseases

• Optimization Techniques – prediction of molecular structures

• Differential Equations – molecular dynamics

• Matrix Theory – analyzing gene expression data

• Fourier Analysis – proteomics data


Acknowledgements

Collaborators

• Louis H. Y. Chen (National University of Singapore)

• David Chew (National University of Singapore)

• Kwok Pui Choi (National University of Singapore)

• Aihua Xia (University of Melbourne, Australia)

Funding Support

• NIH Grants S06GM08194-23, S06GM08194-24, and 2G12RR008124

• NSF DUE9981104

• W.M. Keck Center of Computational & Struct. Biol. at Rice University

• National Univ. of Singapore ARF Research Grant (R-146-000-013-112)

• Singapore BMRC Grants 01/21/19/140 and 01/1/21/19/217


St. Stephen’s Girls’ College


University of Hong Kong

Department of Mathematics: A Beach Picnic


Continuing to Find Mathematics in Genes and Diseases

Ming-Ying Leung

Department of Mathematical Sciences University of Texas at El Paso (UTEP)