Sequencing a genome and Basic Sequence Alignment

Lecture 8

Upload: maude-dixon

Post on 22-Dec-2015



Introduction

• Determining DNA sequences

• Discovering genomes: the shotgun approach

• Sequence alignment (sequence matching)


Annotation of sequences

• As discussed before, once the sequence of a gene (DNA and/or mRNA) has been determined (obtained), the data must be annotated (Klug 2010):
  – which sequences correspond to UTRs, exons/introns, coding sequences (CDS) and the polyA signal
  – other sequences of interest, including promoter sites and other regulatory regions (enhancers…)

• Annotation also contains important supplementary material: other organisms that have the same gene, the corresponding protein sequence, and journal articles related to the sequence.
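The sketch below shows what annotation makes possible. Once the exons of a gene are marked on the genomic sequence, the exon sequences can be pulled out and joined. It makes several assumptions. Features are a plain list with 1-based, inclusive coordinates on the forward strand, similar to GenBank feature tables. The `Feature` class, its labels and the toy gene are hypothetical. Real annotation formats (GenBank, GFF) carry far more information.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    kind: str   # hypothetical labels, e.g. "exon", "intron", "CDS"
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive


def extract(seq: str, feature: Feature) -> str:
    """Return the bases covered by one annotated feature (forward strand only)."""
    return seq[feature.start - 1 : feature.end]


def splice(seq: str, features: list[Feature], kind: str = "exon") -> str:
    """Join all features of one kind in genomic order, e.g. exons -> mRNA-like sequence."""
    parts = sorted((f for f in features if f.kind == kind), key=lambda f: f.start)
    return "".join(extract(seq, f) for f in parts)


# Toy gene: exon 1, an intron (starting GT, ending AG), exon 2.
genomic = "ATGAAAGTAAGTTTTAGCCCTAA"
features = [
    Feature("exon", 1, 6),
    Feature("intron", 7, 17),
    Feature("exon", 18, 23),
]
print(splice(genomic, features))  # ATGAAACCCTAA
```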


How are sequences of genes and genomes obtained?

• Recombinant DNA technology is essential to produce DNA that can be used to determine [chapter 17 (Klug 2010)]:
  – sequences in genes
  – regulatory sequences
  – large DNA strands.

• Some of the important terms in this field include:
  – Cloning DNA: making copies of DNA.
  – Restriction enzymes: cut DNA at specific recognition sites, which vary in size from 4 bp to 8 bp or longer. In a random sequence, a 4 bp site occurs on average once every 4^4 = 256 bp, so a 4 bp cutter gives fragments of about 256 bp. An 8 bp site occurs about once every 4^8 = 65,536 bp. E.g. EcoRI site: GAATTC.
  – Restriction maps: maps of restriction enzyme sites (refer to figure 17.5, Klug).
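The average fragment size and a restriction digest can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions. It assumes a random sequence with equal base frequencies. It looks only at the top strand and ignores sticky ends. The function names are hypothetical. The cut position for EcoRI (G^AATTC, i.e. between G and A) is passed in as an offset.

```python
import re


def expected_fragment_length(site_length: int) -> int:
    """Average distance between sites, assuming a random sequence in which
    each of the 4 bases occurs with probability 1/4."""
    return 4 ** site_length


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of the recognition site."""
    # The zero-width lookahead also reports overlapping occurrences.
    return [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]


def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut the top strand `cut_offset` bases after the start of each site."""
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


print(expected_fragment_length(4))  # 256
print(expected_fragment_length(8))  # 65536
# EcoRI recognises GAATTC and cuts between G and A (G^AATTC), so the offset is 1.
print(digest("TTGAATTCAAAGAATTCGG", "GAATTC", 1))  # ['TTG', 'AATTCAAAG', 'AATTCGG']
```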


Recombinant DNA technology

– Plasmid vectors: help insert the DNA fragment that needs to be cloned into a host cell. Inside the host cell, both the vector and the DNA fragment are cloned (copied). In the example, a DNA fragment is inserted into the plasmid. The plasmid is then introduced into host cells, where it replicates to produce many copies of itself.

– The lacZ gene is used as a marker. If the marker is disrupted, the host cell contains a plasmid vector carrying an inserted fragment (a recombinant plasmid).


Sequencing DNA strands

• dATP (deoxyadenosine triphosphate) is a nucleotide carrying the base adenine.
• ddATP is a modified (dideoxy) adenine nucleotide with a coloured fluorescent marker attached. It also terminates elongation of the chain if it is incorporated instead of dATP.
• During the process, chains of all possible lengths are produced.
• The chains are separated by size (length) and analysed to give the complementary sequence of the template strand [note the sequences in part 1 and part 4].
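The chain-termination idea can be simulated in Python as a rough sketch. It makes simplifying assumptions. The template is written 3'->5', so the new strand grows left to right. The primer is ignored, and every possible stopping point is assumed to occur. The function names are hypothetical. Sorting the chains by length and reading the last, dye-labelled base of each one recovers the complementary sequence, as described above.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_chains(template_3_to_5: str) -> list[str]:
    """All chains a dideoxy reaction can produce on this template.

    Assumes the template is written 3'->5', so the new strand grows left to
    right; the primer is ignored. A chain of length k ends in the
    dye-labelled ddNTP that was incorporated at position k.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    return [new_strand[:k] for k in range(1, len(new_strand) + 1)]


def read_by_length(chains: list[str]) -> str:
    """Order the chains by length (as electrophoresis does) and read the
    terminal, colour-coded base of each one."""
    return "".join(chain[-1] for chain in sorted(chains, key=len))


chains = terminated_chains("TACGGA")
random.shuffle(chains)         # the reaction produces the chains in no particular order
print(read_by_length(chains))  # ATGCCT: the complement of the template
```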


Expressed sequence tags

• Refer to Box 9.1, Understanding Bioinformatics.


GENOMES: Sequencing and assembling

• Plasmids and other recombinant DNA technology only produce relatively small DNA segments.
• To sequence an organism's entire genome, the “shotgun” approach must be used.
• The shotgun approach requires two genetic technologies and one computational technique:
  – Restriction enzymes: cut the DNA into fragments
  – Fast DNA sequencing of the fragments (sequences)
  – Combining overlapping contiguous DNA sequences


Overlapping Contiguous Fragments

Adapted from [1] p. 377


Overlapping Fragments: example

• Original sentence:
  – This is DT228 bioinformatics course.

• Cut 2 copies of the sentence into fragments:
  – Copy 1: “This is”, “course”, “DT228 bioinformatics”
  – Copy 2: “This is DT228”, “bioinformatics course”


Overlapping Fragments: example

• Check for overlaps (prefix and suffix):
  – This is
  – This is DT228
  – DT228 bioinformatics
  – bioinformatics course
  – course

• Result of alignment of fragments:
  – This is DT228 bioinformatics course
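The overlap check can be sketched as a toy greedy assembler that works on words instead of bases. It first drops fragments contained inside another fragment. It then repeatedly merges the pair with the longest suffix/prefix overlap. This is an illustration only and the function names are hypothetical. Real genome assemblers work on nucleotide overlaps, require a minimum overlap length and must tolerate sequencing errors and repeats.

```python
def overlap(a: list[str], b: list[str]) -> int:
    """Number of words in the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def contains(big: list[str], small: list[str]) -> bool:
    """True if `small` occurs as a contiguous run of words inside `big`."""
    return any(big[i : i + len(small)] == small for i in range(len(big) - len(small) + 1))


def greedy_assemble(fragments: list[str]) -> list[str]:
    """Return the contigs left after greedily merging overlapping fragments."""
    reads: list[list[str]] = []
    for fragment in fragments:  # drop exact duplicates
        words = fragment.split()
        if words not in reads:
            reads.append(words)
    # Drop fragments that lie entirely inside another fragment.
    reads = [r for i, r in enumerate(reads)
             if not any(i != j and contains(o, r) for j, o in enumerate(reads))]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j and (k := overlap(a, b)) > best_k:
                    best_k, best_i, best_j = k, i, j
        if best_k == 0:  # no overlaps left: several contigs remain
            break
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for n, r in enumerate(reads) if n not in (best_i, best_j)] + [merged]
    return [" ".join(r) for r in reads]


pieces = ["This is", "course", "DT228 bioinformatics",  # copy 1
          "This is DT228", "bioinformatics course"]     # copy 2
print(greedy_assemble(pieces))  # ['This is DT228 bioinformatics course']
```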


Overlapping Fragments: example

• Reconstruct the sentence from the following 2 copies of the same (Italian) sentence, which have been randomly fragmented:
  – molto questa lingua.
  – mondo. ho dodici anni e
  – sono nato nel posto
  – inglese e mi piace
  – migliore nel
  – parlo io un pochino
  – nel mondo. Ho dodici anni e parlo
  – piace molto questa lingua.
  – nel posto migliore
  – un pochino inglese e mi
  – sono nato

• Solution will be discussed in class


Example of contig alignment:

The diagram shows a DNA example of how overlapping contiguous sequences (contigs) are aligned. However, it is an oversimplification: actual segments are many times larger than shown, and overlaps do not always occur at the ends of segments. Adapted from: Klug 7th ed., p. 378


Sequence Alignment (pairwise): a simple global match

Alignment is the assignment of residue–residue correspondences. A global match aligns all of one sequence with all of another.

The figure shows two nucleic acid sequences.

Where both sequences have the same base (nucleotide), there is a match at that position between the strands. This is represented by a vertical line and a blue highlight.

Other positions do not match and have no vertical line or blue highlight. These are unmatched pairs and correspond to substitutions. In DNA, transitions (A ↔ G and C ↔ T) are more common than transversions (a purine replaced by a pyrimidine or vice versa).

This figure, adapted from Klug, compares the leptin gene from a dog (top) and a human, Homo sapiens (bottom).

Global alignment matching is important in comparative genomics, homologous gene analysis and the construction of evolutionary trees.
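A match line like the one in the figure can be produced with the short sketch below. It also classifies each mismatch as a transition or a transversion. It assumes the two sequences are already aligned, have equal length and contain no gaps. The sequences and function names are hypothetical examples, not the leptin data from the figure.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a: str, b: str) -> str:
    """Label one aligned pair of bases."""
    if a == b:
        return "match"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"    # A<->G or C<->T
    return "transversion"      # purine <-> pyrimidine


def compare(top: str, bottom: str) -> tuple[str, Counter]:
    """Match line ('|' = same base) plus a tally of matches and substitution
    types, for two already-aligned, equal-length sequences without gaps."""
    if len(top) != len(bottom):
        raise ValueError("sequences must have the same length")
    marks = "".join("|" if a == b else " " for a, b in zip(top, bottom))
    return marks, Counter(classify(a, b) for a, b in zip(top, bottom))


top, bottom = "ATGCGT", "ATACTT"
marks, tally = compare(top, bottom)
print(top, marks, bottom, sep="\n")
print(dict(tally))  # {'match': 4, 'transition': 1, 'transversion': 1}
```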


Global alignment: different size sequences

• Example 1
  I am from Cork
  I am not from Cork
  ****
  (4 matches out of 18, based on the length of the bottom string)

• Example 2
  I am---- from Cork
  I am not from Cork
  ****    **********
  (14 matches out of 18, based on the length of the bottom string)

A global alignment between sequences of different sizes requires the inclusion of gaps (dashes) in order to optimise the matching process.

Example 1, without gaps, produces a much lower number of matches than Example 2, which includes them.

The assumption is that both strands are homologous: they have a common ancestor, i.e. were once the same sequence. They have since diverged through a series of substitutions (mismatches) and deletions/insertions (gaps).
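The slides do not name an algorithm for finding the best gapped global alignment. The standard dynamic-programming method is Needleman-Wunsch, sketched below as an illustration. The scoring scheme (match +1, mismatch -1, gap -1) is an assumption chosen for simplicity. Applied to the example above, it places the 4 gaps in the shorter sentence and recovers the 14 matches out of 18. Where several placements of the gap score equally, the traceback simply returns one of them.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment by dynamic programming; returns (score, top, bottom)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def s(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + s(i, j),  # match or mismatch
                              score[i - 1][j] + gap,          # gap in b
                              score[i][j - 1] + gap)          # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


total, top, bottom = needleman_wunsch("I am from Cork", "I am not from Cork")
stars = "".join("*" if x == y else " " for x, y in zip(top, bottom))
print(top, bottom, stars, sep="\n")
print(total, stars.count("*"), len(bottom))  # 10 14 18
```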


Example of alignment: nucleic acids

Adapted from Klug p. 384


Sequence alignment: amino acids

• “*” match; “-” gap; “:” conserved substitution; “.” semi-conserved substitution.

In DNA, the sequence itself matters most for functionality. In proteins, however, the final structure is most significant. That structure depends on the sequence, but the properties of the amino acids also play a significant part in the final configuration (refer to lecture 3, slide 5).

Amino acids with similar properties/structure will have overlapping “effects” on the final 3-D structure of the protein.

Therefore the types of substitution must be extended to include this, giving conserved and semi-conserved substitutions.
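For two aligned protein sequences, the symbols above can be assigned column by column, as in the sketch below. The residue groups that count as "conserved" (strong) and "semi-conserved" (weak) are assumed here to follow the lists given in the Clustal documentation. Clustal applies them across every sequence in a multiple alignment, and this pairwise version is a simplification. The example sequences are hypothetical.

```python
# Residue groups for ':' (strong) and '.' (weak); assumed to follow the
# lists in the Clustal documentation.
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
        "NEQHRK", "FVLIM", "HFY"]


def same_group(a: str, b: str, groups: list[str]) -> bool:
    """True if both residues belong to at least one common group."""
    return any(a in g and b in g for g in groups)


def column_symbol(a: str, b: str) -> str:
    if "-" in (a, b):
        return " "  # gap column
    if a == b:
        return "*"  # identical residues
    if same_group(a, b, STRONG):
        return ":"  # conserved substitution
    if same_group(a, b, WEAK):
        return "."  # semi-conserved substitution
    return " "      # dissimilar residues


def conservation_line(top: str, bottom: str) -> str:
    return "".join(column_symbol(a, b) for a, b in zip(top, bottom))


top, bottom = "MKVAL-S", "MRIVLGT"
print(top, bottom, conservation_line(top, bottom), sep="\n")  # last line: "*::.* :"
```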


Sequence Alignment (pairwise): a local match

A local match:

• finds a region in one sequence that matches a region of another; overhangs at the ends are not treated as gaps.

• A local match is generally used when there is a larger difference in size between the sequences.

• In the example, the global score is 9 out of 13, while the local score is 8 out of 10 (no overhangs).

Example
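A local match is usually computed with the Smith-Waterman algorithm. It is a variant of global dynamic programming in which no cell may fall below 0. The traceback starts at the best-scoring cell and stops at 0, so overhangs cost nothing. The slides do not name an algorithm, so this is a sketch of the standard method. The scores (match +2, mismatch -1, gap -2) and the example sequences are illustrative assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best-scoring local alignment; returns (score, top, bottom)."""
    n, m = len(a), len(b)
    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1]; never below 0
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0

    def s(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(0,  # start a new local alignment here
                          h[i - 1][j - 1] + s(i, j),
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_i, best_j = h[i][j], i, j
    # Trace back from the best cell until the score drops to 0 (overhangs are ignored).
    top, bottom, i, j = [], [], best_i, best_j
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + s(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("GGGGACGTACTTTT", "CCACGTACCC"))  # (12, 'ACGTAC', 'ACGTAC')
```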


Sequence Alignment (pairwise): a motif match

Motif (small region) match:

• A motif match can find a “perfect” match between a small sequence and one or more regions in a larger sequence.

• This plays an important part in looking for repeated sequences (tandem repeats) and other relatively small regions that may be conserved between organisms.

• Like the other types of match, a motif match does not have to be “perfect”: it can include deletions/insertions.

Example

• You are not from Cork
• You are not normal
• *** ***
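A motif search can be sketched by sliding the motif along the larger sequence and reporting every position where it matches. The optional `max_mismatches` parameter is a hypothetical name. It allows an imperfect match, but only through substitutions: unlike the insertions/deletions mentioned above, gaps are not handled in this sketch. In the example sequence, the first three hits are back to back, which is a tandem repeat of CAG.

```python
def find_motif(seq: str, motif: str, max_mismatches: int = 0) -> list[int]:
    """0-based positions where `motif` matches `seq` with at most
    `max_mismatches` substitutions (no insertions or deletions)."""
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i : i + len(motif)]
        mismatches = sum(x != y for x, y in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


seq = "CAGCAGCAGTTCAGCTG"
print(find_motif(seq, "CAG"))                    # [0, 3, 6, 11]
print(find_motif(seq, "CAG", max_mismatches=1))  # [0, 3, 6, 11, 14]
```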


Multiple alignment: many sequences

• Similar to the previous types, except that you look for areas conserved between all the sequences in the alignment:
  – My name is denis and I am from cork
  – My name is kieran and I am not from cork
  – We name the dog “canis familiaris”

• Conserved in all three: name

• Programs like ClustalW are used to align multiple sequences. These alignments can be used to check for conserved motifs/sequences across many species, which helps determine phylogenetic relationships and protein function.
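Once a program such as ClustalW has produced a multiple alignment, the conserved regions can be marked column by column. This works like the word "name" in the sentences above. The sketch below does only that marking step and does not build the alignment itself. It assumes equal-length, already-aligned sequences. A column counts as conserved only if every sequence has the same residue and none has a gap. The sequences are hypothetical.

```python
def conserved_columns(aligned: list[str]) -> str:
    """'*' under each column where every sequence has the same residue (no gaps)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    line = []
    for column in zip(*aligned):
        residues = set(column)
        line.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(line)


msa = ["ATG-CGTA",
       "ATGACGTT",
       "ATGACCTA"]
print(*msa, conserved_columns(msa), sep="\n")  # last line: "*** * * "
```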


Exam Questions

• Explain, using suitable examples, the “shotgun” genomic alignment approach and why it has become the dominant method for analysing genomes.

• DNA sequence alignment matching can take a number of forms; describe the different types of matching.

• Explain how the different types of point mutations are incorporated into the sequence alignment matching process.

• Discuss why the inclusion of “gaps” in a matching alignment increases the degree of matching, and explain what these gaps mean and what they imply about the aligned sequences.