DNA Bytes summary – Creative Technology Nights - 16 February 2015

Why do some people have blonde hair and some dark hair? Why do some of us have blue eyes and some brown or green? It is all in our DNA and the genes that it contains! In the "DNA Bytes" session, we are going to talk about what DNA is, where to find it and, most importantly, DNA's sequence.

One of the problems with getting DNA's sequence is that it only comes in short pieces instead of one long sequence. Think of it as a book where all the words get moved around, and now you need to reconstruct the book's text to read the story. Similarly, scientists have to stitch DNA pieces together like a big puzzle before they can study our genes.

We will discuss different strategies for stitching DNA sequence, a process called genome assembly, and how computer scientists helped figure out ways to do it quickly and unambiguously.

DNA Bytes!

By Darya F. & Shalyn G. for TechNights Feb 16, 2015

No actual biting will take place.

Who has blue eyes?
• Why do we have different color eyes?
• Why do we have different hair?
• It’s in the genes in our DNA!

What is DNA?
• A very long molecule that carries our genetic information
• Where is it? In the cell nucleus
• How long is it? ~2 m (6.5 ft)
• How does it look? (zoom in) Coiled tightly in the nucleus

DNA structure
• What is this structure called? Double helix
• What is connecting the helices? Bases
• What are the different bases? Adenine, Cytosine, Guanine, Thymine (A, C, G, T)
• Any patterns? C pairs with G, and T pairs with A
• DNA sequence: GTA CACC CAGT
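The pairing pattern on this slide (C with G, T with A) means one strand determines the other. Here is a minimal sketch of that idea; the names `PAIR`, `complement` and `reverse_complement` are made up for illustration and are not from the talk.

```python
# Base-pairing rule from the slide: C pairs with G, T pairs with A.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the bases that sit opposite each base of `strand`."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand):
    """Opposite strand, read in its own direction (i.e. reversed)."""
    return complement(strand)[::-1]


print(complement("ACGT"))          # bases opposite A, C, G, T
print(reverse_complement("GTA"))   # opposite strand of GTA, reversed
```

The second function is there because the two strands of the double helix run in opposite directions, so the partner strand is usually written reversed.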

How can we get DNA sequence?
Many copies of the same DNA (original)
Shear each DNA strand, randomly breaking it into many small pieces
+ lots of cool biology & chemistry & physics…
slide courtesy of C. Kingsford
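The shearing step can be imitated on a computer. The sketch below is only an illustration under assumed settings: the function name `shear`, the number of copies, the piece lengths and the random seed are all invented here, and real sequencing is far messier.

```python
import random


def shear(dna, copies=5, min_len=4, max_len=8, seed=0):
    """Cut several copies of `dna` at random places and shuffle the pieces.

    All parameters are illustrative assumptions, not values from the talk.
    """
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    pieces = []
    for _ in range(copies):            # many copies of the same DNA
        pos = 0
        while pos < len(dna):          # break this copy into small pieces
            step = rng.randint(min_len, max_len)
            pieces.append(dna[pos:pos + step])
            pos += step
    rng.shuffle(pieces)                # pieces come out in random order
    return pieces


for piece in shear("acggtactacttag", copies=3):
    print(piece)
```

Because every copy is cut at different places, pieces from different copies overlap, and those overlaps are what make it possible to put the sequence back together.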

How can we get DNA bases?

ACGTTTC

ACGGATC

TGGTTTC

ACGTCGCACGTATC

CGGTACT

ACGTTTC

ACGGATC

TGGTTTC

ACGTTGC

ACGTATC

CGGTACT

This process is called genome sequencing

DNA pieces (random order) → sequencer → READS

+ more cool biology, chemistry, and physics!!!

We can only read ~100 characters at a time from a random place:

Activity 1 — Alice in Wonderland

miss me very much to-night

hope they’ll remember her saucer of milk

down. There was nothing with me! There are no mice to-night, I should think! I hope they’ll

a bat, and that’s very like

There are no mice in the air, I’m afraiddown here with me! I’m afraid, but you might catch a bat,

Alice soon began

down, down.

slide courtesy of C. Kingsford

What did you get?

Down, down, down. There was nothing else to do, so Alice soon began talking again. “Dinah’ll miss me very much to-night, I should think! I hope they’ll remember her saucer of milk at tea-time. Dinah my dear! I wish you were down here with me! There are no mice in the air, I’m afraid, but you might catch a bat, and that’s very like a mouse, you know. But do cats eat bats, I wonder?”

Now try it with DNA bases :)

Activity 2

What did you get?

13 5 11 2 10 1 6 9 8 4 3 7 12 14 15

easy/hard? :) :(

This process is called genome assembly

What is the mechanism?
Shortest Common Superstring

acggta
   gtactac
      ctacttag
acggtactacttag

But what if…
acgt cgta tacc
acgtacc OR acgtcgtacc

Human genome: 3 billion bases
Reads: 30-100 bases
One run: hundreds of millions of reads
Will take too long
What can we do?
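The overlap-and-merge idea on this slide can be sketched as a greedy procedure: repeatedly glue together the two reads with the longest overlap. This is a common approximation, not necessarily the exact method the presenters had in mind, and it is not guaranteed to find the truly shortest superstring. The function names are invented for illustration, and the input is assumed to be a non-empty list of reads.

```python
def overlap(a, b):
    """Length of the longest end of `a` that is also a beginning of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads):
    """Merge the best-overlapping pair of reads until one string is left."""
    reads = list(dict.fromkeys(reads))        # drop exact duplicate reads
    while len(reads) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(reads):         # compare every ordered pair
            for j, b in enumerate(reads):
                if i != j and overlap(a, b) > best_k:
                    best_k, best_i, best_j = overlap(a, b), i, j
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads[0]


print(greedy_superstring(["acggta", "gtactac", "ctacttag"]))
print(greedy_superstring(["acgt", "cgta", "tacc"]))
```

Every round compares all ordered pairs of reads, so the work grows with the square of the number of reads: 100 million reads would mean on the order of 10^16 overlap checks per round, which is why this approach "will take too long" on a human genome. The second example also shows the ambiguity from the slide: the sketch returns the shorter answer even if the real DNA was the longer one.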

Another way: De Bruijn graph

aagacaagc

Words: aag, aga, gac, aca, caa, agc

de Bruijn graph: aag → aga → gac → aca → caa → aag → agc

To recover DNA: make a trail that follows as many arrows as possible

Where should you start?

Eulerian path (much faster) (repeats - OK)

Activity 3 and 4

Activity 3

What did you get?

tagacgaacgtacggtagg

tag aga gac acg cga gaa aac cgt gta tac cgg ggt agg

Activity 4

What did you get?

aca gag gcc cac aac atc ggc cac ctc g
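The two steps from the de Bruijn slides, drawing an arrow between consecutive three-letter words and then following every arrow exactly once, can be sketched as below. The trail-finding part uses Hierholzer's algorithm, which is one standard way to find an Eulerian path; the talk does not say which method it used. The function names are invented for illustration, and the sketch assumes the arrows really do come from a single string, so that such a trail exists.

```python
from collections import defaultdict


def de_bruijn_edges(text, k=3):
    """One arrow per pair of consecutive k-letter words in `text`."""
    words = [text[i:i + k] for i in range(len(text) - k + 1)]
    return list(zip(words, words[1:]))


def eulerian_path(edges):
    """Visit every arrow exactly once (Hierholzer's algorithm)."""
    out_edges = defaultdict(list)
    balance = defaultdict(int)               # arrows out minus arrows in
    for u, v in edges:
        out_edges[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # Start at the word with one more arrow going out than coming in.
    start = next((n for n, b in balance.items() if b == 1), edges[0][0])
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out_edges[node]:                  # an unused arrow: follow it
            stack.append(out_edges[node].pop())
        else:                                # dead end: record and back up
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Glue the words of a trail back into one string."""
    return path[0] + "".join(word[-1] for word in path[1:])


print(spell(eulerian_path(de_bruijn_edges("aagacaagc"))))
```

For a string with many repeats there can be more than one valid trail, so the recovered string may differ from the original while still using exactly the same arrows.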

What would make assembly easier?
• More overlapping reads
• Longer reads: PacBio sequencing w/ 1000s of nucleotides
• Having something to compare against: a reference genome

agaacgtgagagtgcgctacctc…
 gaacgtgaga

honey bee, baker’s yeast
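Comparing a read against a reference genome can be sketched as sliding the read along the reference and counting mismatches at each position. This is a simple illustration, not how real alignment tools work; the function name `best_match` is invented here, and the reference used is only the visible part of the string on the slide.

```python
def best_match(reference, read):
    """Return (position, mismatches) of the best place for `read`."""
    best_pos, best_mismatches = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for x, y in zip(window, read) if x != y)
        if mismatches < best_mismatches:     # keep the earliest best spot
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


# Only the visible start of the reference from the slide is used here.
reference = "agaacgtgagagtgcgctacctc"
print(best_match(reference, "gaacgtgaga"))
```

Allowing a few mismatches matters because a person's DNA differs slightly from the reference, and reads can contain errors.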

What can DNA tell?

Diseases

Resistance to colds

Short distance runner

Resistance to malaria

Other things
• The longest genome? Amoeba dubia, over 200x larger than the human genome
• The shortest genome? Candidatus Carsonella ruddii
• How much does your DNA overlap w/ your neighbor’s? 99.5% similar!
• And with a mouse? ~92% similar!
• Explore the genes at UCSC Genome Browser
• Check out full assembled genomes at NIH

1 Activity

Recover original text.

Dinah my dear! I wish you were

miss me very much to-night

hope they’ll remember her saucer of milk

down. There was nothing with me! There are no mice to-night, I should think! I hope they’ll

a bat, and that’s very like

There are no mice in the air, I’m afraiddown here with me! I’m afraid, but you might catch a bat,

milk at tea-time. Dinah you were down

I wonder.” very like, a mouse, you know. But

began talking again. “Dinah’ll miss me very

There was nothing else to do, so Alice

Alice soon began

But do cats eat bats, I wonder?”

Down, down

down, down.

Figure 1: Activity 1

2 Activity

Recover original DNA string from the DNA reads in your packets.

aagacaagcataacgggaaactatgcaaacta

acgggaaactatgcaaactaagaggggtagccccattacatttggg

tacatttgggtaaatgtaacattgctggctggatcctggg

tgctggctggatcctgggaaatccagagtgtgaatcactctcca

tcactctccacagcaagctcatggtcctacattgtggaaa

attgtggaaacatctagttcagacaatggaacgtgttacc

gttacccaggagatttcatcgattatgaggagctaagagagcaattgagct

gagatttcatcgattatgaggagctaagagagcaattgagctcagtgt

agctcagtgtcatcatttgaaaggtttgagatattcccca

atattccccaagacaagttcatggcccaatcatgactcga

tcatgactcgaacaaaggtgtaacggcagcatgtcctcat

atgtcctcatgctggagcaaaaggcttctacaaaaatttaatatggctagt

atggctagttaaaaaaggaaattcatacccaaagctcagca

aaaggaaattcatacccaaagctcagcaaatcctacattaatgataaa

atcctacattaatgataaagggaaagaagtcctcgtgctatggggcattc

Figure 2: Activity 2: Recover DNA string from DNA reads


3 Activity

Draw a de Bruijn graph for string TAGACGAACGTACGGTAGG below.


4 Activity

Find a string that generated this graph: for this, use every edge exactly once. Hint: try to figure out which word is the start word and which word must be the last word.

[Graph with nodes: aca, caa, aac, cag, aga, gag, agg, ggc, gcc, ccc, cca, cac, acc, cct, ctc, cat, atc, tcg, cgg]

Figure 3: Activity 4
