sudhindra r. gadagkar, ph.d
Sudhindra R. Gadagkar, Ph.D.
Computational Biology
University of Dayton
Some background material…
• BS in Fisheries Science from University of Agricultural Sciences, Bangalore, India
• MS in Fisheries Science (Statistics)
Tilapia (Oreochromis niloticus)
Genetics of fish behavior
Ph.D. research (contd.)
• Complex behaviors are heritable (they are influenced by genes).
• Behavior and growth rate are correlated at the genetic level (either the same gene(s) influence both traits, or the genes involved are closely linked).
Post-doctoral research in Bioinformatics at Arizona State University
What I do now
• DNA contains information, and the body uses this information.
Source for image: www.nigms.nih.gov/.../genetics/science.html
• DNA is an incredibly long strand made up of four different kinds of molecules (called nucleotides), abbreviated as A, C, G and T.
• For example, the DNA from the longest human chromosome is about 8.5 cm long!
• Nearly every cell of the human body contains DNA.
• One complete copy of this DNA is more than 3 billion nucleotides long!
• That’s a large number!
Let’s get some perspective
• A DNA sequence can look like the following:
• ACTGTTTGAAATTGACCCAGCACTTCTCCCTCGCGCAGACAGAGAGCAGTGTAG
ACGGAGCCTTAATCGCTAGAGCGAATCCCGATGCCCCACCTTCCGTCGGTGCATAAGTCGCACGGCGTCTCCCCCCCGTATGTGGTCTTAGGTAACCGCCGCCGGGCGTAGGGTTCACGGTCGAGGATGAAGATGGCGATTCGTCACCTCGCCAACGGGAGGGACCTCATTCGATCGATCCGCAAGTCTTCGCGGGAGCTCGTCATGCGGAACGCAGGAGACAACACTCTGCGTCGGATGCGCGCCGTATCAGTCGGGTGAGGCACGCCTAGCGATTCGACCTTAATTCCCGGACGCGACGCGAGGAGTTGGGAGATTGCTGCCCAAACCGGTCCGCGCTACTTAGGCTGCCGGACCCTTCTCGCCCCACGGGTGGCGGTGGTAATAGAGTTGGCCCGCCCTCTATGTGTCGGAAAGGGGGAGCCGGGGGCCGTGAGGATGCCCACACTGTCGGCGAGACCATGCTATCGAGCCTCCCTGGGACCCTCGGGGACTTTAGTTCCCACTCGGTTGGGGATTCAGTAGCCACGAATCAGACCGCCCCGGGTGGGGGCTTCGTCGTCTTGTCTTTCCAGCCCCCCTCTACTCTTCCTACTACGCCCGTCTGTCGAGGGTGCCGAGCGCGCAGTGTGCTCCCAGCGGCTCGTGCCAGGTTAGGTAGCCATATGTATTTATCGGCTGAGGACCGCCCGCCGTGTACCGACGATTTTGTTATAATTCTAGAGATGGGCTGGCACTTACCTGCTAGGTTTCTTGTCTGCTATGACTCGTGCGAACAGTCTTACTCTTGGCACAGCCGCGATGGCGATGGTTTAGCGGTTCCCATGGGGGGAATCGCGCGACGGCACCCAGTTCTGTTTCGACCGGACCCTGCTTACTCCTGGCCGAGAGGCCTCATTCTCGTTCGAGTCGATCGCTTATGTTATCGCGCCATTGGGAGTGCTCTGACCAATTACCGACCCGGAGTGTG
Let’s get some perspective
• What if we tried to write down the entire sequence (all 3 billion or so nucleotides)?
• After all, we now know what the entire human DNA sequence is.
Let’s get some perspective
• Let’s see… if we can fit 75 letters on each line and there are 50 lines on a page, then a page will contain 75 × 50 = 3,750 nucleotides.
• That does sound like a lot (the earlier slide had 1024).
Let’s get some perspective
• A book that contains 100 pages can hold 100 × 3,750 = 375,000 nucleotides.
• That is a lot!
• How thick do you think a book of 100 pages might be?
• An inch maybe.
• We need to write down at least 3 billion letters.
Let’s get some perspective
• Therefore, we need 3,000,000,000 / 375,000 = 8,000 books.
• At about an inch per book, that is 8,000 inches
• ≈ 667 feet.
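The slide arithmetic can be chained together in a few lines of Python. This is a minimal sketch: the 75 letters per line, 50 lines per page, 100 pages per book and one inch per book are the slides' own rough assumptions, and the function name `stack_height_feet` is hypothetical.

```python
# Chains the slide arithmetic for "writing down" the human genome in books.
# Every constant below is one of the slides' rough assumptions.
LETTERS_PER_LINE = 75
LINES_PER_PAGE = 50
PAGES_PER_BOOK = 100
INCHES_PER_BOOK = 1            # "An inch maybe."
GENOME_LETTERS = 3_000_000_000
MONUMENT_FEET = 555            # Washington Monument height quoted on the slides


def stack_height_feet(genome_letters: int = GENOME_LETTERS) -> float:
    """Height in feet of a stack of books holding genome_letters letters."""
    per_page = LETTERS_PER_LINE * LINES_PER_PAGE   # 3,750 letters per page
    per_book = per_page * PAGES_PER_BOOK            # 375,000 letters per book
    books = genome_letters / per_book               # 8,000 books
    inches = books * INCHES_PER_BOOK                # 8,000 inches
    return inches / 12                              # convert inches to feet


height = stack_height_feet()
print(f"{height:.0f} feet")                                         # 667 feet
print(f"taller than the monument by {height - MONUMENT_FEET:.0f} feet")  # 112 feet
```

Changing any single assumption (say, two inches per book) scales the height linearly, so the comparison with the monument holds up even if the page estimates are rough.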
The Washington Monument
Source of image: epod.usra.edu/archive/epodviewer.php3?oid=158368
Let’s get some perspective
• The Washington Monument is 555 feet tall!
• So imagine a stack of books taller than the Washington Monument, crammed with letters: no spaces, no commas, no paragraphs.
Let’s get some perspective
• And we would have written down the data for one strand in one cell of one human being!
• We need to understand this data.
• Remember, there are no words, no punctuation, no “parts of speech” in this “text”.
• Yet we have to make sense of this information.
Another example
• This is the evolutionary tree of primates.
• There are 10 species here whose evolutionary relationship we are interested in.
Source for image: locus.umdnj.edu/nigms/special/primate.html
How many possible trees?
• Do you know how many possible ways there are for drawing the evolutionary history (“tree”) for 10 species?
Formula: $\dfrac{(2n-3)!}{2^{n-2}\,(n-2)!}$
where $n$ is the number of species
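Reading the formula as the count of rooted, bifurcating trees, $(2n-3)!/(2^{n-2}(n-2)!)$, a short sketch can evaluate it for a few small cases and for the 10 primates. The function name `rooted_trees` is hypothetical, and the sketch assumes the reconstructed formula matches the slide.

```python
from math import factorial


def rooted_trees(n: int) -> int:
    """Number of rooted, bifurcating trees for n >= 2 labelled species:
    (2n - 3)! / (2^(n - 2) * (n - 2)!). The division is always exact."""
    if n < 2:
        raise ValueError("need at least two species")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (2, 3, 4, 10):
    print(n, rooted_trees(n))
# 2 1
# 3 3
# 4 15
# 10 34459425
```

So even 10 species already admit about 34.5 million possible trees.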
How many trees!
[Figure: number of possible trees (logarithmic scale) versus number of sequences (0 to 400), with reference levels for $5 \times 10^{11}$ stars in the Milky Way, $5 \times 10^{30}$ prokaryotes living today, $10^{37}$ atoms in the bodies of all humans by year 2035, and $10^{79}$ atoms in the universe.]
• How many trees represent the true relationship?
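To see roughly where the curve in the figure crosses each reference level, one can search for the smallest number of species whose tree count exceeds it. This sketch assumes the figure plots the same rooted-tree formula as the previous slide (restated in the code), and the helper names `rooted_trees` and `first_n_exceeding` are hypothetical.

```python
from math import factorial


def rooted_trees(n: int) -> int:
    """Rooted bifurcating trees for n >= 2 species: (2n-3)! / (2^(n-2) (n-2)!)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


# Reference quantities as labelled on the figure.
REFERENCES = {
    "stars in the Milky Way": 5 * 10**11,
    "prokaryotes living today": 5 * 10**30,
    "atoms in the bodies of all humans by 2035": 10**37,
    "atoms in the universe": 10**79,
}


def first_n_exceeding(threshold: int) -> int:
    """Smallest number of species whose tree count is greater than threshold."""
    n = 2
    while rooted_trees(n) <= threshold:
        n += 1
    return n


for label, value in REFERENCES.items():
    print(f"{first_n_exceeding(value)} species: more trees than {label}")
```

Python integers are arbitrary precision, so the counts are exact and no floating-point overflow occurs.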
• And only one of them is the correct tree because evolution has happened only once.
• And we need to find it!
One final example
Pairwise Alignment – contd.
• Consider these two DNA sequences:
  – AATCTATA
  – AAGATA
• We want to compare them site by site, so we need to align them by introducing gaps.
• Gaps can be introduced in various places, and in various combinations, as shown next.
Pairwise Alignment – contd.
AATCTATA
AAG--ATA
AATCTATA
AA-G-ATA
AATCTATA
AA--GATA
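A quick way to compare the three alignments is to tally matching columns, mismatching columns and gap columns. This is an illustrative sketch only: the slides do not specify a scoring scheme, and `tally` and `ALIGNMENTS` are hypothetical names.

```python
# The three alignments of AATCTATA and AAGATA listed above.
ALIGNMENTS = [
    ("AATCTATA", "AAG--ATA"),
    ("AATCTATA", "AA-G-ATA"),
    ("AATCTATA", "AA--GATA"),
]


def tally(top: str, bottom: str) -> tuple[int, int, int]:
    """Return (matches, mismatches, gaps) for two equal-length aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


for top, bottom in ALIGNMENTS:
    # Removing the gaps must give back the original, ungapped sequences.
    assert top.replace("-", "") == "AATCTATA"
    assert bottom.replace("-", "") == "AAGATA"
    print(top, bottom, tally(top, bottom))
```

Under this simple tally all three alignments come out the same (5 matches, 1 mismatch, 2 gaps), which illustrates why some scoring scheme is needed before one placement of gaps can be called optimal.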
Pairwise Alignment – contd.
• Clearly, if the sequences are long, introducing gaps by hand becomes impossible; we need a computer to help us find the optimal placement of gaps.
• But let us first see what is involved in asking the computer to do this.
• One way, the looooooooong way, is to:
  – introduce gaps in every possible position.
The Brute Force Method (the Perspiration approach)
• For the long way, to get an idea of what is involved, let us first look at the first position.
• There are three possible choices:
  1. gap in the first sequence,
  2. gap in the second sequence, or
  3. gap in neither sequence.
  That is,
    -  A  A
    A  -  A
The Brute Force Method (the Perspiration approach)
• These options are the same for every position.
• Therefore, the number of possible paths, y, for a pair of sequences 1 base long is 3.
• If the sequences are 2 bases long, it is $3^2 = 9$.
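The counting argument can be checked by brute force: treat each position as an independent choice among the three options and enumerate every combination. This sketch follows the slides' simplified model (each of n positions takes one of three choices); `CHOICES` and `paths` are hypothetical names.

```python
from itertools import product

# The slide's three options for a single alignment column, written as
# (top row, bottom row) when both sequences have an "A" at that position.
CHOICES = {
    "gap in the first sequence": ("-", "A"),
    "gap in the second sequence": ("A", "-"),
    "gap in neither sequence": ("A", "A"),
}


def paths(n: int) -> list[tuple[str, ...]]:
    """Every combination of per-position choices for n positions."""
    return list(product(CHOICES, repeat=n))


# The three single-position options, shown as top/bottom columns.
for (choice,) in paths(1):
    top, bottom = CHOICES[choice]
    print(f"{top}/{bottom}  {choice}")

# Path counts grow as 3**n.
for n in (1, 2, 3):
    print(n, len(paths(n)))   # prints 1 3, then 2 9, then 3 27
```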
The Brute Force Method (the Perspiration approach)
• In general, if they are n bases long, there are $3^n$ paths.
• If n = 20, then $y = 3^{20} \approx 3.5 \times 10^{9}$.
The Brute Force Method (the Perspiration approach)
• If n = 200, then $y = 3^{200} \approx 2.7 \times 10^{95}$.
• If examining one path takes 1 nanosecond ($10^{-9}$ seconds), then for a pair of sequences 200 bases long, the computer will need
  – about $8.4 \times 10^{78}$ years!!
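These estimates can be reproduced directly. This is a sketch under the slides' assumptions ($3^n$ paths, 1 nanosecond per path); the 365.25-day year used for the conversion is an added assumption, and the function names are hypothetical.

```python
# Brute-force cost under the slides' assumptions: 3**n paths, 1 ns per path.
SECONDS_PER_PATH = 1e-9
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed 365.25-day year


def paths(n: int) -> int:
    """Paths for two sequences of length n in the three-choices-per-position model."""
    return 3 ** n


def years_needed(n: int) -> float:
    """Years to examine every path when each path takes SECONDS_PER_PATH."""
    return paths(n) * SECONDS_PER_PATH / SECONDS_PER_YEAR


for n in (20, 200):
    print(f"n = {n}: {paths(n):.2e} paths, {years_needed(n):.1e} years")
# n = 20: 3.49e+09 paths, 1.1e-07 years
# n = 200: 2.66e+95 paths, 8.4e+78 years
```

At n = 20 the brute-force search finishes in a few seconds; at n = 200 it is hopeless, which is the point of the slide.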
Let’s get some perspective
• Needs a super-human effort, eh?
• That’s absolutely right!
• That super-human is the computer.
• But it’s not enough to just use the computer to solve such problems.
• The computer must not just work hard. It needs to work smart!
Need Computer!