
MAT 4830 Mathematical Modeling

4.1

Background on DNA

http://myhome.spu.edu/lauw

HW

Pick up your last HW in my office.

PNW MAA Meeting

The 7th annual Northwest Undergraduate Mathematics Symposium (NUMS) will be held in conjunction with the 2015 Spring meeting of the PNW MAA Section at the University of Washington Tacoma on April 10-11.

http://www.tacoma.uw.edu/maa-nums

Remarks

No handouts.

Read the textbook for more information.

All HW for this chapter (4.1-4.6) is individual HW.

The techniques learned can be applied to other applications.

Disclaimer

This is not a biology class! I do not know much biology. We will ignore all possible theological questions and implications.

Our Learning Philosophy

Acquire the minimum background needed to start the analysis/modeling.

Ignore the complexity of the biochemical process.

Our Learning Philosophy

Concentrate on certain mathematical problems.

The problems are very interesting once we get through the terminology.

DNA

Genetic info is encoded by DNA molecules, which are passed from parent to offspring.

Bases

4 types of smaller molecules:

Adenine (A), Guanine (G): purines

Cytosine (C), Thymine (T): pyrimidines

Bases

A always pairs with T

G always pairs with C

Sequence: AGCGCT

Complementary sequence: TCGCGA

Bases

To describe a DNA molecule, it suffices to list the bases in one strand, since the pairing rules determine the other strand.
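The pairing rules can be sketched in a few lines of Python. This is only an illustration: the names `PAIR` and `complement` are hypothetical, and the function writes the complement position by position, as on the slide, without reversing it. Complementing twice gives back the original, which is one way to see that a single strand carries all the information.

```python
# Watson-Crick pairing rules from the slides: A <-> T, G <-> C
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the complementary bases, position by position (not reversed)."""
    return "".join(PAIR[base] for base in strand.upper())


print(complement("AGCGCT"))              # TCGCGA
print(complement(complement("AGCGCT")))  # AGCGCT: the other strand is fully determined
```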

Mutations

DNA mutations occur (randomly) as DNA is passed from parent to offspring.

Base Substitution

A common form of mutation. A base is replaced by another base.

Transition: a purine replaced by a purine, or a pyrimidine by a pyrimidine.

Transversion: a purine replaced by a pyrimidine, or a pyrimidine by a purine.

A A T C G C

G A T G G C

Position 1 (A → G) is a transition; position 4 (C → G) is a transversion.
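As a sketch, assuming two aligned sequences over the bases A, G, C, T with the first one taken as the ancestor, the purine/pyrimidine classes are enough to label every substitution. The helper names `base_class`, `classify`, and `substitutions` are hypothetical.

```python
from typing import List, Optional, Tuple

PURINES = frozenset("AG")      # adenine, guanine
PYRIMIDINES = frozenset("CT")  # cytosine, thymine


def base_class(base: str) -> str:
    """Return 'purine' or 'pyrimidine' for one of the bases A, G, C, T."""
    if base in PURINES:
        return "purine"
    if base in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"unknown base: {base!r}")


def classify(old: str, new: str) -> Optional[str]:
    """Label a substitution old -> new; return None if the base is unchanged."""
    if old == new:
        return None
    if base_class(old) == base_class(new):
        return "transition"    # purine <-> purine or pyrimidine <-> pyrimidine
    return "transversion"      # purine <-> pyrimidine


def substitutions(before: str, after: str) -> List[Tuple[int, str, str, str]]:
    """List (1-based site, old base, new base, kind) for two aligned sequences."""
    if len(before) != len(after):
        raise ValueError("sequences must be aligned (equal length)")
    found = []
    for site, (old, new) in enumerate(zip(before, after), start=1):
        kind = classify(old, new)
        if kind is not None:
            found.append((site, old, new, kind))
    return found


for entry in substitutions("AATCGC", "GATGGC"):
    print(entry)
# (1, 'A', 'G', 'transition')
# (4, 'C', 'G', 'transversion')
```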

Basic Question

How can we deduce the number of mutations that occurred during the descent of DNA sequences?

Example

S0 : Ancestral sequence

S1 : Descendant of S0

S2 : Descendant of S1

S0 : ATGTCGCCTGATAATGCC

S1 : ATGCCGCTTGACAATGCC

S2 : ATGCCGCGTGATAATGCC

Observed mutations (sites where S0 and S2 differ): 2

Actual mutations: 5 (some are hidden: site 8 changed twice, C → T → G, and site 12 changed and changed back, T → C → T)
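The two counts in this example can be checked with a short Python sketch. It assumes each difference between consecutive sequences (S0 to S1, then S1 to S2) is a single mutation, which is how the total of 5 arises; `hamming` is a hypothetical helper name.

```python
def hamming(a: str, b: str) -> int:
    """Number of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    return sum(x != y for x, y in zip(a, b))


S0 = "ATGTCGCCTGATAATGCC"  # ancestral sequence
S1 = "ATGCCGCTTGACAATGCC"  # descendant of S0
S2 = "ATGCCGCGTGATAATGCC"  # descendant of S1

observed = hamming(S0, S2)                  # compare only the two endpoints
actual = hamming(S0, S1) + hamming(S1, S2)  # count every step of the lineage
print(observed, actual)  # 2 5

# Sites where comparing S0 with S2 directly undercounts the mutations
for site, (a, b, c) in enumerate(zip(S0, S1, S2), start=1):
    if (a != b) + (b != c) > (a != c):
        print(site, a, b, c)
# 8 C T G
# 12 T C T
```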

What Do We Want?

Compare the initial and final DNA sequences

Develop mathematical models to reconstruct the number of mutations likely to have occurred.

Reality…

Seldom do we actually have an ancestral DNA sequence, much less several from different times along a line of descent.

Instead, we have sequences from several currently living descendants, but no direct information about any of their ancestors.

Reality…

When we compare two sequences and imagine the mutation process that produced them, the sequence of their most recent common ancestor, from which both evolved, is unknown.

Orthologous Sequences

Given a DNA sequence from some organism, there are good search algorithms to find similar sequences for other organisms in DNA databases.

If a gene has been identified for one organism, we can quickly locate likely candidate sequences for similar genes in related organisms.

Orthologous Sequences

If the genes have similar functions, we can reasonably assume that the sequences descended from a common ancestral sequence; such sequences are called orthologous.

Assumption

All sequences in our discussion are aligned orthologous DNA sequences.

4.2 An Introduction to Probability

Read Section 4.2 to “review”.

4.3 Conditional Probability

Read Section 4.3 to “review”.

Definition

Given two events E and F, the conditional probability of F given E is defined by

$$P(F \mid E) = \frac{P(F \cap E)}{P(E)},$$

provided P(E) > 0.
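When all outcomes in a finite sample space are equally likely, the definition reduces to counting. The sketch below uses hypothetical helper names (`prob`, `cond_prob`) and, purely for illustration, treats each aligned site of the substitution example AATCGC → GATGGC as one equally likely outcome.

```python
from fractions import Fraction
from typing import Callable, Sequence, Tuple

Outcome = Tuple[str, str]          # (ancestral base, descendant base) at one site
Event = Callable[[Outcome], bool]  # an event is a yes/no test on an outcome


def prob(event: Event, space: Sequence[Outcome]) -> Fraction:
    """P(event) when every outcome in `space` is equally likely."""
    return Fraction(sum(1 for w in space if event(w)), len(space))


def cond_prob(f: Event, e: Event, space: Sequence[Outcome]) -> Fraction:
    """P(F | E) = P(F and E) / P(E), defined only when P(E) > 0."""
    p_e = prob(e, space)
    if p_e == 0:
        raise ValueError("P(E) must be positive")
    return prob(lambda w: f(w) and e(w), space) / p_e


def ancestor_is_a(w: Outcome) -> bool:    # event E
    return w[0] == "A"


def descendant_is_g(w: Outcome) -> bool:  # event F
    return w[1] == "G"


space = list(zip("AATCGC", "GATGGC"))  # six equally likely sites
print(cond_prob(descendant_is_g, ancestor_is_a, space))  # 1/2
```

Here P(E) = 2/6 and P(F ∩ E) = 1/6, so the ratio is 1/2.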

Example

Suppose the ancestral and descendant 40-base DNA sequences are

S0 : ACTTGTCGGATGATCAGCGGTCCATGCACCTGACAACGGT

S1 : ACATGTTGCTTGACGACAGGTCCATGCGCCTGAGAACGGC

Example

Count the frequency of base substitutions.

S1 \ S0   A   G   C   T
A         7   0   1   1
G         1   9   2   0
C         0   2   7   2
T         1   0   1   6

The entry in row i, column j is the number of sites where S1 has base i and S0 has base j.
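The frequency table can be reproduced with a short Python sketch, assuming the row and column order A, G, C, T used on the slide; `count_table` is a hypothetical helper name.

```python
from collections import Counter
from typing import List

BASES = "AGCT"  # row and column order used on the slide

S0 = "ACTTGTCGGATGATCAGCGGTCCATGCACCTGACAACGGT"  # ancestral
S1 = "ACATGTTGCTTGACGACAGGTCCATGCGCCTGAGAACGGC"  # descendant


def count_table(ancestor: str, descendant: str, bases: str = BASES) -> List[List[int]]:
    """N[i][j] = number of sites where the descendant has bases[i]
    and the ancestor has bases[j]."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned (equal length)")
    pairs = Counter(zip(descendant, ancestor))  # keys: (descendant base, ancestral base)
    return [[pairs[(i, j)] for j in bases] for i in bases]


N = count_table(S0, S1)
print("S1\\S0", *BASES)
for base, row in zip(BASES, N):
    print(f"{base:5}", *row)
# S1\S0 A G C T
# A     7 0 1 1
# G     1 9 2 0
# C     0 2 7 2
# T     1 0 1 6
```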

Example

We can estimate

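$P(S_1 = i \mid S_0 = j)$ by dividing each count in the table by its column sum. For example, S0 has A at 9 sites (the sum of column A), so

$$P(S_1 = A \mid S_0 = A) = \frac{7}{9}, \qquad P(S_1 = G \mid S_0 = A) = \frac{1}{9},$$

$$P(S_1 = C \mid S_0 = A) = 0, \qquad P(S_1 = T \mid S_0 = A) = \frac{1}{9}.$$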

Example

Q1: What is the sum of the 16 numbers in the table? Why?

Example

Q2: What is the meaning of a row sum in the table?

Example

We can form a table of conditional probabilities $P(S_1 = i \mid S_0 = j)$, with i indexing the rows and j the columns:

S1 \ S0   A      G      C      T
A         7/9    0      1/11   1/9
G         1/9    9/11   2/11   0
C         0      2/11   7/11   2/9
T         1/9    0      1/11   6/9
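Dividing each column of the frequency table by its column sum can be sketched with exact fractions. The counts are copied from the frequency table; `column_conditional` is a hypothetical helper name, and the code assumes every column sum is positive. Note that `Fraction` automatically reduces 6/9 to 2/3.

```python
from fractions import Fraction
from typing import List

BASES = "AGCT"  # row and column order used on the slide

# Counts from the frequency table: rows = base in S1, columns = base in S0
N = [
    [7, 0, 1, 1],
    [1, 9, 2, 0],
    [0, 2, 7, 2],
    [1, 0, 1, 6],
]


def column_conditional(counts: List[List[int]]) -> List[List[Fraction]]:
    """Estimate P(S1 = i | S0 = j) by dividing each entry by its column sum.

    Assumes every column sum is positive (each base occurs in S0)."""
    col_sums = [sum(col) for col in zip(*counts)]
    return [[Fraction(x, s) for x, s in zip(row, col_sums)] for row in counts]


P = column_conditional(N)
for base, row in zip(BASES, P):
    print(base, *row)
# A 7/9 0 1/11 1/9
# G 1/9 9/11 2/11 0
# C 0 2/11 7/11 2/9
# T 1/9 0 1/11 2/3
```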

Example

Q3: What is the sum of the entries in any column of this new table? Why?

Example

Q4: If, instead of dividing the counts in the frequency table by column sums, you divided them by row sums, would you get the same results? What conditional probabilities would you be calculating?
