DNA Bytes summary – Creative Technology Nights - 16 February 2015
Why do some people have blonde hair and others dark hair? Why do some of us have blue
eyes and others brown or green? It is all in our DNA and the genes it contains! In the
"DNA Bytes" session, we are going to talk about what DNA is, where to find it and, most
importantly, DNA's sequence.
One of the problems with getting DNA's sequence is that it only comes in short pieces
instead of one long sequence. Think of it as a book where all the words get moved around
and now you need to reconstruct the book's text to read the story. Similarly, scientists
have to stitch DNA pieces together like a big puzzle before they can study our genes.
We will discuss different strategies for stitching DNA sequences together, a process
called genome assembly, and how computer scientists helped figure out ways to do it
quickly and unambiguously.
DNA Bytes!
By Darya F. & Shalyn G. for TechNights Feb 16, 2015
No actual biting will take place.
Who has blue eyes?
• Why do we have different color eyes?
• Why do we have different hair?
• It’s in the genes in our DNA!
What is DNA?
• A very long molecule that carries our genetic information
• Where is it? In the cell nucleus
• How long is it? ~2 m (6.5 ft), coiled tightly in the nucleus
• How does it look? (zoom in)
DNA structure
• What is this structure called? Double helix
• What is connecting the helices? Bases
• What are the different bases? Adenine (A), Cytosine (C), Guanine (G), Thymine (T)
• Any patterns? C pairs with G, T pairs with A
DNA sequence: GTA CACC CAGT
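The pairing pattern (C with G, T with A) means one strand determines the other. A minimal sketch of that rule follows; the function name `complement` and the toy strand are illustrative assumptions, not part of the talk.

```python
# Base pairing as shown on the slide: C pairs with G, T pairs with A.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that sits opposite each base of `strand`."""
    return "".join(PAIR[base] for base in strand.upper())


# Toy strand made up for this example.
print(complement("GTACACC"))  # CATGTGG
```

Reading the result from right to left would give the opposite strand in its own reading direction; this sketch only pairs the bases position by position.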
How can we get DNA sequence?
Many copies of the same DNA
Shear each DNA strand, randomly breaking it into many small pieces:
original
+lots of cool biology & chemistry & physics…
slide courtesy of C. Kingsford
How can we get DNA bases?
DNA pieces (random order) go into the sequencer, which outputs READS:
ACGTTTC
ACGGATC
TGGTTTC
ACGTTGC
ACGTATC
CGGTACT
more cool biology, chemistry, and physics!!!
This process is called genome sequencing
We can only read ~100 characters at a time from a random place.
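Reading short pieces from random places can be mimicked in a few lines. This is only a toy sketch: the genome string, read length, and the helper name `sample_reads` are made up for illustration, and a real sequencer also introduces errors that are ignored here.

```python
import random


def sample_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Cut `n_reads` pieces of length `read_len` from random places in `genome`."""
    rng = random.Random(seed)  # fixed seed so the toy run is repeatable
    last_start = len(genome) - read_len  # assumes the genome is at least read_len long
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


# Hypothetical 21-base "genome"; real reads are ~100 bases from billions.
reads = sample_reads("ACGTTTCACGGATCTGGTTTC", read_len=7, n_reads=5)
print(reads)
```

The reads come back in no particular order and may overlap, which is exactly what makes the later stitching step possible.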
Activity 1 — Alice in Wonderland
miss me very much to-night
hope they’ll remember her saucer of milk
down. There was nothing
with me! There are no mice
to-night, I should think! I hope they’ll
a bat, and that’s very like
There are no mice in the air, I’m afraid
down here with me!
I’m afraid, but you might catch a bat,
Alice soon began
down, down.
slide courtesy of C. Kingsford
What did you get?
Down, down, down. There was nothing else to do, so Alice soon began talking again. “Dinah'll miss me very much to-night, I should think! I hope they'll remember her saucer of milk at tea-time. Dinah my dear! I wish you were down here with me! There are no mice in the air, I'm afraid, but you might catch a bat, and that's very like a mouse, you know. But do cats eat bats, I wonder?”
Now try it with DNA bases :)
Activity 2
What did you get?
13 5 11 2 10 1 6 9 8 4 3 7 12 14 15
easy/hard? :) :(
This process is called genome assembly
What is the mechanism?
Shortest Common Superstring
Reads: acggta, gtactac, ctacttag
Overlap and merge: acggta + gtactac + ctacttag gives acggtactacttag
But what if…
Reads: acgt, cgta, tacc
acgtacc OR acgtcgtacc
Human genome: 3 billion bases
Reads: 30-100 bases
One run: hundreds of millions of reads
Will take too long
What can we do?
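The overlap-and-merge idea can be sketched as a greedy procedure: keep merging the two reads that overlap the most. This is a common heuristic, not necessarily the exact method shown in the session, and it is not guaranteed to find the truly shortest superstring. The function names are illustrative.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_superstring(reads: list[str]) -> str:
    """Repeatedly merge the two reads with the largest overlap (a heuristic)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        # Compare every ordered pair of reads: this is the slow part.
        n, a, b = max(
            ((overlap(a, b), a, b) for a, b in permutations(reads, 2)),
            key=lambda t: t[0],
        )
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[n:])
    return reads[0]


print(greedy_superstring(["acggta", "gtactac", "ctacttag"]))  # acggtactacttag
print(greedy_superstring(["acgt", "cgta", "tacc"]))           # acgtacc
```

The second call shows the ambiguity from the slide: the sketch returns acgtacc, and nothing in the reads says whether the real sequence was acgtcgtacc instead. Comparing every pair of reads on every round is also why this approach takes too long for hundreds of millions of reads.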
Another way: De Bruijn graph
Read aagaca splits into 3-letter words: aag, aga, gac, aca
Read gacaagc splits into 3-letter words: gac, aca, caa, aag, agc
de Bruijn graph nodes: aag, aga, aca, gac, caa, agc
To recover DNA: make a trail that follows as many arrows as possible
Where should you start? Eulerian path (much faster) (repeats - OK)
Activity 3 and 4
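Building the graph from reads can be sketched as follows, using the two reads from the slide (aagaca and gacaagc). It assumes that nodes are 3-letter words and that an arrow joins two words that sit next to each other in a read; each distinct arrow is kept once, which is a simplification. The function name `de_bruijn` is illustrative.

```python
def de_bruijn(reads: list[str], k: int = 3) -> dict[str, list[str]]:
    """Map each k-letter word to the words that follow it in some read."""
    graph: dict[str, list[str]] = {}
    for read in reads:
        words = [read[i:i + k] for i in range(len(read) - k + 1)]
        for word in words:
            graph.setdefault(word, [])  # every word becomes a node
        for left, right in zip(words, words[1:]):
            if right not in graph[left]:  # overlapping reads repeat arrows
                graph[left].append(right)
    return graph


graph = de_bruijn(["aagaca", "gacaagc"])
for word, nexts in graph.items():
    print(word, "->", ", ".join(nexts))
```

Following the arrows aag, aga, gac, aca, caa, aag, agc spells aagacaagc, which contains both reads.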
Activity 3
What did you get?
tagacgaacgtacggtagg
Nodes: tag, aga, gac, acg, cga, gaa, aac, cgt, gta, tac, cgg, ggt, agg
Activity 4
What did you get?
acagaggcccacaacatcggccacctcg
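Using every arrow exactly once is the Eulerian path problem, and one standard way to solve it is Hierholzer's algorithm. The sketch below is an illustration under stated assumptions: the arrows of the Activity 4 graph are not reproduced in this transcript, so it reuses the Activity 3 string, treats each 4-letter window as an arrow between two 3-letter words, and assumes an Eulerian path exists. The function name `eulerian_string` is illustrative.

```python
from collections import Counter, defaultdict


def eulerian_string(edges: list[str]) -> str:
    """Spell a string that uses every edge (a k-letter word) exactly once.

    Each edge word w runs from node w[:-1] to node w[1:].
    """
    out_edges = defaultdict(list)
    balance = Counter()  # out-degree minus in-degree for every node
    for w in edges:
        out_edges[w[:-1]].append(w[1:])
        balance[w[:-1]] += 1
        balance[w[1:]] -= 1
    # Start at the node with one extra outgoing arrow, if there is one.
    start = next((n for n, b in balance.items() if b == 1), edges[0][:-1])
    stack, path = [start], []
    while stack:  # Hierholzer's algorithm, iterative form
        node = stack[-1]
        if out_edges[node]:
            stack.append(out_edges[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


original = "tagacgaacgtacggtagg"
edges = [original[i:i + 4] for i in range(len(original) - 3)]
rebuilt = eulerian_string(edges)
print(rebuilt)
```

The rebuilt string starts at tag and ends at agg, as the hint about the first and last word suggests. Because acg occurs several times, more than one trail may use every arrow once, so the result is not guaranteed to match the original letter for letter.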
What would make assembly easier?
• More overlapping reads
• Longer reads: PacBio sequencing w/ 1000s of nucleotides
• Having something to compare against: reference genome
agaacgtgagagtgcgctacctc…
gaacgtgaga
Examples pictured: honey bee, baker’s yeast
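With a reference genome, placing a read becomes a search problem. A minimal sketch using the reference fragment and read from the slide follows; it only finds exact matches, whereas real tools also tolerate small differences. The helper name `locate` is illustrative.

```python
def locate(read: str, reference: str) -> int:
    """Return the 0-based position of the first exact match, or -1 if absent."""
    return reference.find(read)


# Only the visible part of the reference from the slide; the rest is elided there.
reference = "agaacgtgagagtgcgctacctc"
print(locate("gaacgtgaga", reference))  # 1
```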
What can DNA tell?
Diseases
Resistance to colds
Short distance runner
Resistance to malaria
Other things
• The longest genome? Amoeba dubia, over 200x larger than the human genome
• The shortest genome? Candidatus Carsonella ruddii
• How much does your DNA overlap w/ your neighbor’s? 99.5% similar!
• And with a mouse? ~92% similar!
• Explore the genes at UCSC Genome Browser
• Check out full assembled genomes at NIH
1 Activity
Recover original text.
Dinah my dear! I wish you were
miss me very much to-night
hope they’ll remember her saucer of milk
down. There was nothing
with me! There are no mice
to-night, I should think! I hope they’ll
a bat, and that’s very like
There are no mice in the air, I’m afraid
down here with me!
I’m afraid, but you might catch a bat,
milk at tea-time. Dinah
you were down
I wonder.”
very like, a mouse, you know. But
began talking again. “Dinah’ll miss me very
There was nothing else to do, so Alice
Alice soon began
But do cats eat bats, I wonder?”
Down, down
down, down.
Figure 1: Activity 1
2 Activity
Recover original DNA string from the DNA reads in your packets.
aagacaagcataacgggaaactatgcaaacta
acgggaaactatgcaaactaagaggggtagccccattacatttggg
tacatttgggtaaatgtaacattgctggctggatcctggg
tgctggctggatcctgggaaatccagagtgtgaatcactctcca
tcactctccacagcaagctcatggtcctacattgtggaaa
attgtggaaacatctagttcagacaatggaacgtgttacc
gttacccaggagatttcatcgattatgaggagctaagagagcaattgagct
gagatttcatcgattatgaggagctaagagagcaattgagctcagtgt
agctcagtgtcatcatttgaaaggtttgagatattcccca
atattccccaagacaagttcatggcccaatcatgactcga
tcatgactcgaacaaaggtgtaacggcagcatgtcctcat
atgtcctcatgctggagcaaaaggcttctacaaaaatttaatatggctagt
atggctagttaaaaaaggaaattcatacccaaagctcagca
aaaggaaattcatacccaaagctcagcaaatcctacattaatgataaa
atcctacattaatgataaagggaaagaagtcctcgtgctatggggcattc
Figure 2: Activity 2: Recover DNA string from DNA reads
3 Activity
Draw a de Bruijn graph for string TAGACGAACGTACGGTAGG below.
4 Activity
Find a string that generated this graph: for this, use every edge exactly once.
Hint: try to figure out which word is the start word and which word must be the last word.
Graph nodes (arrows shown in the figure): aca, caa, aac, cag, aga, gag, agg, ggc, gcc, ccc, cca, cac, acc, cct, ctc, cat, atc, tcg, cgg
Figure 3: Activity 4