welcome to introduction to bioinformatics wednesday, 10 february genome sequencing/assembly genome...
Welcome to Introduction to Bioinformatics
Wednesday, 10 February
Genome Sequencing/Assembly
• Genome sequencing/Assembly
Click anywhere to go on to the next slide
This demonstration is best viewed as a slide show, enabling you to simulate a session and make changes in cursor position more obvious. To do this, click Slide Show on the top tool bar, then View show.
What to do for summer vacation?
Deadline, SUNday Feb 28!
Target, Monday Mar 1!
Deadline, ???
Deadline, FRIday Feb 26!
Global Viral Genome Project
Deadline, whenever!
Learn more about…
HHMI: http://www.vcu.edu/csbc/hhmi/
BBSI: http://www.vcu.edu/csbc/bbsi/
VCU-USF: http://www.research.vcu.edu/vpr/fellowship.htm
GVGP: http://biobike.csbc.vcu.edu (News)
What is the sequence (5' to 3') represented by the gel?
Myers et al SQ2
G A T C
Dideoxy sequencing (= Sanger sequencing)
What is the sequence (5' to 3') represented by the gel?
G A T C
[Gel figure: bands labeled ddC]
TCGTGTACATCGTAACACGGTTAAGT
Myers et al SQ2
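As a rough illustration of how a dideoxy gel is read, the sketch below assumes each lane is represented as a list of fragment lengths. That representation, and the function names `simulate_lanes` and `read_gel`, are hypothetical and not taken from the slides. Shorter fragments run farther, so sorting the bands by length and noting the lane of each band gives the newly synthesized strand 5' to 3'.

```python
def simulate_lanes(synthesized, primer_len=0):
    """Return {base: [fragment lengths]} for an idealized dideoxy reaction.

    Assumption: every position yields one band, in the lane whose dideoxy
    nucleotide matches the base added at that position.
    """
    lanes = {"G": [], "A": [], "T": [], "C": []}
    for i, base in enumerate(synthesized, start=1):
        lanes[base].append(primer_len + i)
    return lanes


def read_gel(lanes):
    """Read bands from shortest to longest fragment (bottom of gel to top)."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


lanes = simulate_lanes("TCGTGTAC")
print(lanes["C"])       # lengths of the fragments that end in ddC
print(read_gel(lanes))  # synthesized strand, 5' to 3'
```

Reading the simulated gel recovers the input string, which is the sense in which the ladder of bands encodes the sequence.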
Sequencing process: Drosophila genome (~100 million nt)
Sequence it
Technical limitation: reads limited to 100s of nt
How many possible 500 nt fragments are there?
SAMPLE
How many 500 nt samples are needed to cover 100 million nt?
100 000 000 / 500 = 200 000
Is this enough?
Oversampling … coverage?
1 000 000 reads for 5x oversampling (5 x 100 000 000 / 500)
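The read-count arithmetic can be sketched as follows. The genome size and read length are the round figures from the slides; the function name `reads_needed` is made up for illustration, and the calculation assumes every read is exactly the same length.

```python
import math


def reads_needed(genome_nt, read_nt, oversampling=1.0):
    """Number of reads whose total length is `oversampling` times the genome."""
    return math.ceil(oversampling * genome_nt / read_nt)


GENOME_NT = 100_000_000  # Drosophila genome size used on the slide
READ_NT = 500            # read length used on the slide

for fold in (1, 5, 10):
    print(fold, reads_needed(GENOME_NT, READ_NT, fold))
```

This only counts nucleotides sequenced. It says nothing yet about whether randomly placed reads leave gaps, which is the question the oversampling slides take up.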
Paint the wall
Study Question 8 & 9: "oversampling"? "coverage"?
Shotgun sequencing?
How long will this take?
Wall: 40" x 25"
Each paint ball covers 1 sq"
1000 paint balls?
[Plot: Completeness (0 to 1) vs. Oversampling (0 to 10)]
How much is painted with 1x oversampling?
What fraction won't be painted?
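One way to explore these questions is a small simulation of the paint-ball wall. This is a sketch under simplifying assumptions that are not stated on the slides: the 40" x 25" wall is divided into 1000 cells of 1 sq", each ball lands on exactly one cell, and every cell is equally likely. The function name `fraction_painted` is hypothetical.

```python
import random


def fraction_painted(n_squares, n_balls, seed=0):
    """Throw n_balls at random squares; return the fraction hit at least once."""
    rng = random.Random(seed)
    painted = set()
    for _ in range(n_balls):
        painted.add(rng.randrange(n_squares))
    return len(painted) / n_squares


WALL_SQUARES = 40 * 25  # 40" x 25" wall, assumed split into 1 sq" cells

for fold in (1, 2, 5, 10):
    n_balls = fold * WALL_SQUARES
    simulated = fraction_painted(WALL_SQUARES, n_balls)
    # Each ball misses a given square with probability 1 - 1/1000.
    expected = 1 - (1 - 1 / WALL_SQUARES) ** n_balls
    print(f"{fold}x oversampling: simulated {simulated:.3f}, expected {expected:.3f}")
```

Under these assumptions, 1x oversampling leaves a little over a third of the wall unpainted, because (1 - 1/1000)^1000 is close to 1/e, or about 0.37.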
P(TT) = 1/2 x 1/2 = 1/4
Probability that two coins both come up tails
Rule of multiplication: intersection, independent
Gets T from first AND gets T from second
Intersection of possibilities (Rule of multiplication)

| First coin toss \ Second coin toss | H | T |
| --- | --- | --- |
| H | HH | HT |
| T | TH | TT |
P(at least 1 T) = 1/4 + 1/4 + 1/4 = 3/4
Probability that at least one of two coins comes up tails
1/2 x 1/2 = 1/4?
Gets HT or TH or TT
Union of possibilities (Rule of addition)
1/2 + 1/2 = 1?
Rule of addition: union, mutually exclusive
P(at least 1 T) = 1 - P(no T) = 1 - 1/4 = 3/4
Probability that at least one of two coins comes up tails = 1 - probability that neither comes up tails
Probability(2 T) = 1 - Probability(NOT 2 T)
Union of possibilities (Rule of complementation)
Rule of complementation: yin-yang, adds to 1
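The three rules can be checked by enumerating the four outcomes of two coin tosses. This is a minimal sketch that assumes fair, independent coins, so each outcome has probability 1/4; the helper `prob` is a name chosen for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT
p_each = Fraction(1, len(outcomes))       # assumed fair, independent coins


def prob(event):
    """Sum the probability of every outcome for which event(outcome) is true."""
    return sum(p_each for o in outcomes if event(o))


p_tt = prob(lambda o: o == ("T", "T"))       # multiplication: 1/2 x 1/2
p_at_least_one_t = prob(lambda o: "T" in o)  # addition: HT + TH + TT
p_no_t = prob(lambda o: "T" not in o)        # only HH

print(p_tt)
print(p_at_least_one_t)
print(1 - p_no_t)                            # complementation: 1 - P(HH)
```

The last two printed values agree, which is the point of the complementation rule: counting the one outcome with no tails is easier than adding up the three with at least one.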
Sequencing process: Drosophila genome (~100 million nt)
Focus on one nucleotide…
What’s the probability that it’s covered by one read?
What’s the probability that it’s covered by two reads?
What’s the probability that it’s covered by 200,000 reads?
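A sketch of these probabilities, combining the multiplication and complementation rules. It rests on assumptions not spelled out on the slide: reads are 500 nt, placed uniformly and independently, edge effects are ignored, and "covered by N reads" is taken to mean covered by at least one of N reads. The function name `p_covered` is hypothetical.

```python
import math


def p_covered(genome_nt, read_nt, n_reads):
    """P(a given nucleotide lies in at least one of n_reads reads)."""
    p_one = read_nt / genome_nt               # a single read covers it
    p_missed_by_all = (1 - p_one) ** n_reads  # multiplication rule
    return 1 - p_missed_by_all                # complementation rule


GENOME_NT, READ_NT = 100_000_000, 500

for n in (1, 2, 200_000, 1_000_000):
    print(n, p_covered(GENOME_NT, READ_NT, n))

print(1 - math.exp(-1))  # limiting value for 1x oversampling
```

With 200,000 reads (1x oversampling) the result is very close to 1 - 1/e, so roughly 37% of nucleotides would still be uncovered under these assumptions.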
Problem Set 3, Problem 2: Statistics of mini-plasmid assembly
Why read pairs? Scaffolds?
DNA
Myers et al SQ6
Contig 1 Contig 2
Myers et al SQ6: Why read pairs? Scaffolds?
[Figure labels: G A T C; primer (two); plasmid, x 1000's; insert; ~2000 nt; mates]
Myers et al SQ6: Why read pairs? Scaffolds?
[Figure labels: ~150,000 nt; Bacterial Artificial Chromosome (BAC); P1-derived Artificial Chromosome (PAC); mates]
SQ14. From figures given in the text and in Table 1, check the accuracy of each of the following statements:
a. "We produced 3.156 million reads that yielded 1.76 Gbp of sequence. . ."
b. ". . .trillions of overlaps between reads are examined."
c. ". . .to produce 654,000 of the 2-kbp mates and 497,000 of the 10-kbp mates."
Myers et al (2000)
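A possible starting point for statements (a) and (b), using only the numbers quoted in the question itself. Two assumptions are made here: that "overlaps examined" can be bounded by the number of distinct read pairs, and that the mean read length is a useful sanity check against the read lengths discussed earlier. Statement (c) needs Table 1, which is not reproduced here, so it is not checked.

```python
reads = 3_156_000  # "3.156 million reads"
total_bp = 1.76e9  # "1.76 Gbp of sequence"

# (a) Implied mean read length.
mean_read_bp = total_bp / reads

# (b) Upper bound on comparisons: every read paired with every other read.
possible_pairs = reads * (reads - 1) // 2

print(f"mean read length: {mean_read_bp:.0f} bp")
print(f"possible read pairs: {possible_pairs:.2e}")
```

The implied mean read length comes out a little under 560 bp, and the number of possible read pairs is about 5 trillion, so "trillions" is at least the right order of magnitude under this reading.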