sequencing a genome and basic sequence alignment


Upload: konala

Post on 22-Feb-2016



Page 1: Sequencing a genome and Basic Sequence Alignment

Global Sequence 1

Sequencing a genome and Basic Sequence Alignment

Lecture 11


Introduction

• Annotation of DNA sequences

• Sequence alignment and sequence matching

• Determining genome sequences:
– Supplementary lecture
– The shotgun approach


Annotation of sequences
• As discussed before, once a gene's sequence (DNA and/or mRNA) has been determined, the data must be annotated (Klug 2010):
– Which sequences correspond to UTRs, exons/introns, coding sequences (CDS) and the polyA signal
– Other sequences of interest include promoter sites and other regulatory regions (enhancers…)
• Annotation also contains important supplementary material: other organisms that have the same gene, the corresponding protein sequence, and journal articles related to the sequence.


Sequence similarity
• In many cases, annotating a gene sequence involves a sequence homology "test" against existing sequences whose function is known.
• The assumption is that both sequences are homologous (they have a common ancestor and were once the same sequence) but are now different because of a series of mutations: substitutions, deletions, insertions.
• The basic concepts behind this process are sequence alignment and determining the strength of the match for the aligned sequences.


Sequence Alignment (Pairwise): A simple global match
• The assignment of residue-residue correspondences:
– A global match: align all of one sequence with another.
– The figure shows two nucleic acid sequences.
– Some positions have the same base (nucleotide), so there is a match at this position between the strands. This is represented by a vertical line and a blue highlight.
– Others do not match and have no vertical line and no blue highlight.
This figure, adapted from Klug, is a comparison of a "leptin gene" from a dog (top) and Homo sapiens (bottom).
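The vertical-line notation described for this figure can be reproduced with a few lines of Python. This is only a sketch: the two short sequences in the usage note are made up for illustration (they are not the leptin sequences from Klug), and the function names are hypothetical.

```python
def match_line(top, bottom):
    """Return a string with '|' under matching positions and ' ' elsewhere."""
    if len(top) != len(bottom):
        raise ValueError("a simple global match needs sequences of equal length")
    return "".join("|" if a == b else " " for a, b in zip(top, bottom))


def count_matches(top, bottom):
    """Number of positions at which the two sequences carry the same base."""
    return match_line(top, bottom).count("|")
```

For example, `match_line("ATGCATTG", "ATGTATCG")` marks six of the eight positions as matches, leaving blanks at the two substituted bases.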


A simple global match
• The non-matches are presumed to correspond to mutations; in this case substitution mutations.
• In DNA (nucleic acid) mutations:
– A transition (purine <-> purine, A <-> G, or pyrimidine <-> pyrimidine, C <-> T) is more probable than a transversion (purine <-> pyrimidine, e.g. A <-> T).
– A substitution mutation is more probable than an insertion/deletion.
• The relative probability of such mutations has to be taken into account when determining the strength of the match (we will discuss this in greater detail later).
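One way to take these relative probabilities into account is to give each kind of substitution its own weight. The sketch below classifies a mismatch as a transition or a transversion and sums per-position weights. The weight values are purely illustrative assumptions, not figures from the lecture or from any published scoring scheme.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical weights: the more probable mutation is penalised less.
WEIGHTS = {"match": 1.0, "transition": 0.5, "transversion": 0.0}


def substitution_type(x, y):
    """Classify a pair of aligned bases."""
    if x == y:
        return "match"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"      # purine<->purine or pyrimidine<->pyrimidine
    return "transversion"        # purine<->pyrimidine


def weighted_score(seq1, seq2):
    """Sum the weights over all aligned positions (equal-length sequences)."""
    return sum(WEIGHTS[substitution_type(a, b)] for a, b in zip(seq1, seq2))
```

With these weights, `weighted_score("ACGT", "GCTT")` counts two matches, one transition (A/G) and one transversion (G/T).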


Global sequence alignment: different size sequences
• Example 1
• I am from Cork
• I am not from Cork
• ****
• (4 matches out of 18; based on length of bottom string)

• Example 2
• I am ---- from Cork
• I am not from Cork
• **** **********
• (14 matches out of 18; based on length of bottom string)

A global alignment between sequences of different sizes requires the inclusion of gaps (dashes) in order to optimise the matching process: the gaps correspond to insertion/deletion mutations.

Example 1, which considers only substitution mutations, produces a much lower number of matches (4 out of 18) than Example 2, which considers all 3 types of point mutation: it aligns the two sequences to give the maximum matching score. This example calculates a simple sequence matching score; in DNA you would need to factor in the relative probability of mutations.

In amino acids the calculation is more complicated due to the different types of mutations.
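The "maximum matching score" idea can be sketched with a small dynamic-programming table, in the style of the Needleman-Wunsch global alignment algorithm (a technique the slides do not name). The scoring here is an assumption chosen to mirror the slide's simple count: a match scores 1, while mismatches and gaps cost nothing.

```python
def ungapped_matches(a, b):
    """Matches when the strings are compared position by position, no gaps."""
    return sum(x == y for x, y in zip(a, b))


def max_matches(a, b):
    """Largest number of matches obtainable when gaps may be inserted."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (a[i - 1] == b[j - 1])
            # Either align a[i-1] with b[j-1], or put a gap in one sequence.
            score[i][j] = max(diagonal, score[i - 1][j], score[i][j - 1])
    return score[-1][-1]
```

For "I am from Cork" against "I am not from Cork", `max_matches` returns 14, as in Example 2. A strict character-by-character count without gaps gives 6 rather than the 4 quoted in Example 1, because it also counts the space after "am" and one coincidental "o".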


An Example of DNA sequence alignment

Adapted from Klug p. 384. Determine the matching score.


Sequence alignment: Amino Acids

• "*" match; "-" gap; ":" conserved substitution; "." semi-conserved substitution.

In DNA the sequence "itself" is most important; all nucleotides have the "same" basic properties.

However, amino acid sequences produce a 3-D structure, which depends on the properties of the amino acids in the sequence. So the properties of the amino acids are important.

Amino acids with similar side-chain properties will have overlapping "effects" on the 3-D structure of the protein.

The above figure takes this into account by referring to two types of substitution: conserved and semi-conserved.
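The symbol line under an amino acid alignment can be sketched as follows. The side-chain groupings below are simplified, hypothetical sets chosen for illustration; they are not the groups used by any particular alignment program. Gap columns are marked with "-" to follow the legend above.

```python
# Hypothetical, simplified groupings of amino acids by side-chain property.
CONSERVED = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("NQ"), set("ST")]
SEMI_CONSERVED = [set("DENQ"), set("KRHNQ"), set("STAG"), set("ILVMFYW")]


def column_symbol(a, b):
    """Symbol for one aligned column of two protein sequences."""
    if a == "-" or b == "-":
        return "-"                       # gap in either sequence
    if a == b:
        return "*"                       # identical residues
    if any(a in g and b in g for g in CONSERVED):
        return ":"                       # conserved substitution
    if any(a in g and b in g for g in SEMI_CONSERVED):
        return "."                       # semi-conserved substitution
    return " "                           # unrelated residues


def annotate(seq1, seq2):
    """Build the symbol line for two already-aligned sequences."""
    return "".join(column_symbol(a, b) for a, b in zip(seq1, seq2))
```

Under these assumed groups, K/R and V/I count as conserved substitutions, D/N as semi-conserved, and W/D as unrelated.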


Sequence Alignment: a local match
A simple local match example
• Find a region in one sequence that matches a region in the other.
• A local match is generally used if there is a large difference in size between the sequences.
• The overhangs at the beginning and end of the query string are not treated as gaps, since the local alignment only tests how well a sub-sequence aligns within the longer sequence.
• In the example:
– A global alignment gives a score of 9 out of 13;
– A local alignment gives a score of 8 out of 10 (do not count overhangs…)
– In general, the alignment with the highest score is the one that is taken.
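A very simple form of local matching slides the shorter query along the longer sequence and scores only the overlapping region, so the overhangs are never counted. The sketch below does this without gaps; it is an illustration of the idea, not a full local alignment algorithm, and the function name is hypothetical.

```python
def best_local_match(query, target):
    """Return (offset, matches) for the best ungapped placement of query in target."""
    if len(query) > len(target):
        raise ValueError("query must not be longer than target")
    best_offset, best_score = 0, -1
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        score = sum(q == t for q, t in zip(query, window))
        if score > best_score:           # keep the highest-scoring placement
            best_offset, best_score = offset, score
    return best_offset, best_score
```

Placing "from Cork" in "I am not from Cork" gives a perfect local match of 9 out of 9 at offset 9, whereas a global comparison of the same two strings would be dominated by the overhang.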


Sequence Alignment: pairwise: a motif match
• A motif match can find:
• a "perfect" match between a small sequence and one or more regions in a larger sequence.
• This plays an important part in looking for repeating sequences (tandem repeats) and other important "small" sequences.
• The motif match, like the others, does not have to be contiguous; it can also include a conserved distributed pattern (Understanding Bioinformatics, chapter 4, pp. 87-93).
• You are not from Cork
• You are not normal
• They are not happy about…
• *** ***
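Motif matching of this kind maps naturally onto regular expressions. The sketch below uses Python's standard `re` module with a lookahead so that overlapping occurrences are also reported; the function names and the DNA strings in the usage note are illustrative assumptions.

```python
import re


def find_motif(motif, sequence):
    """Start positions of every exact (possibly overlapping) occurrence of motif."""
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence)]


def find_pattern(regex, sequence):
    """Start positions for a distributed pattern given as a regular expression,
    e.g. 'A.G' means A, then any one character, then G."""
    pattern = re.compile("(?=" + regex + ")")
    return [m.start() for m in pattern.finditer(sequence)]
```

For example, `find_motif("are not", "You are not from Cork")` reports position 4, and `find_motif("CAG", "TTCAGCAGCAGAA")` reports the three copies of a tandem repeat at positions 2, 5 and 8.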


Multiple sequence alignment
• Similar to the previous, except you look for areas conserved between all the sequences in the alignment:
• My name is denis and I am from cork
• My name is kieran and I am not from cork
• We name the dog "canis familiaris"
• name
• Used to align multiple sequences, which can then be checked for conserved motifs/sequences in many species: used to determine protein functionality, promoter signals, enhancer and silencer regions…
• Multiple alignment can also determine phylogenetic relationships (evolution: refer to Understanding Bioinformatics, chapter 7).
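The conserved region in the three sentences can be found by brute force: take the shortest sequence and test each of its substrings, longest first, against all the others. This is a sketch for short strings only; it is not how real multiple alignment programs work, and it finds only one contiguous conserved block.

```python
def longest_common_substring(sequences):
    """Longest contiguous substring present in every sequence ('' if none)."""
    reference = min(sequences, key=len)   # a common substring cannot be longer
    for length in range(len(reference), 0, -1):
        for start in range(len(reference) - length + 1):
            candidate = reference[start:start + length]
            if all(candidate in s for s in sequences):
                return candidate
    return ""


sentences = [
    "My name is denis and I am from cork",
    "My name is kieran and I am not from cork",
    'We name the dog "canis familiaris"',
]
conserved = longest_common_substring(sentences).strip()
```

Here `conserved` is the word "name", the only region shared by all three sentences (the raw result also includes the surrounding spaces).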


GENOMES: Sequencing and assembling
• The supplementary lecture covers how to produce and determine the sequence of DNA strands. However, the size of the strands is limited to a few thousand base pairs.
• To sequence an organism's entire genome:
– Must use the "shotgun" approach
– Cut the genome into small fragments whose sequence can be determined.
– Use computational techniques (sequence alignment) to join them back together in the correct order.


Shot-gun
• Shot gun approach requires:
– Two genetic technologies (refer to supplementary material for more detail):
• Restriction enzymes: cut up denatured (ss)DNA
• Fast DNA sequencing of fragments (sequences)
– One computational technique (overlapping contigs):
• Combining overlapping contiguous DNA sequences


Example of Contigs alignment:

The above diagram shows a DNA example of how overlapping contiguous sequences are aligned. However, it is an oversimplification, as actual segments are many times larger than shown and overlapping does not always happen at the ends of segments. Adapted from: Klug 7th p 378


Overlapping Contiguous Fragments

Adapted from [1] p. 377


Overlapping Fragments: example

• Original sentence:
• This is the school of computing bioinformatics course.

• Cut "2" copies of the sentence into "random sized" fragments

• This is
• The school of
• Computing bioinformatics course

• This is the
• School of computing
• Bioinformatics course
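The cutting step can be simulated in a few lines. This sketch cuts each copy at randomly chosen character positions, whereas the slide cuts between words so that the overlaps are easy to see. The function name, the number of cuts and the seed parameter are all illustrative assumptions.

```python
import random


def shotgun(sequence, n_copies=2, n_cuts=2, seed=0):
    """Cut each of n_copies of sequence at n_cuts random positions.

    Returns one list of fragments per copy. Because every copy is cut at
    different places, fragments from different copies tend to overlap.
    """
    rng = random.Random(seed)             # seeded so the run is repeatable
    copies = []
    for _ in range(n_copies):
        cuts = sorted(rng.sample(range(1, len(sequence)), n_cuts))
        bounds = [0] + cuts + [len(sequence)]
        copies.append([sequence[a:b] for a, b in zip(bounds, bounds[1:])])
    return copies


sentence = "This is the school of computing bioinformatics course."
copies = shotgun(sentence)
```

Joining the fragments of any single copy in order gives back the original sentence; the assembly problem arises because, in practice, that order is not known.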


Overlapping Fragments: example
• Check for overlaps (prefix and suffix)
• This is
• This is the
• The school of
• School of computing
• computing bioinformatics course
• Bioinformatics course

• Result of alignment of fragments is:
– This is the school of computing bioinformatics course
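The overlap check between two fragments can be sketched as a search for the longest suffix of one fragment that is also a prefix of the other. The fragments are lower-cased in the usage note because the slide mixes "The"/"the" and "School"/"school"; the function name is hypothetical.

```python
def suffix_prefix_overlap(left, right):
    """Longest text that ends `left` and also begins `right` ('' if none)."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return right[:k]
    return ""
```

For example, "the school of" and "school of computing" overlap on "school of", while "this is" and "bioinformatics course" do not overlap at all. Very short overlaps can arise by chance ("this is" and "school of computing" share the single letter "s"), which is why real assemblers require a minimum overlap length.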


Algorithm to join contigs

• We need two relationships between fragments:

• (1) Which fragment has a prefix that matches the suffix of no other fragment (this tells us which fragment comes first)

• (2) For each fragment, which other fragment has a prefix sharing the longest overlap with its suffix (this tells us which fragment follows it)
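These two relationships are enough for a small greedy assembler. The sketch below is one possible reading of the slide, not a production algorithm: it assumes error-free fragments, a single starting fragment, and a minimum overlap of 3 characters (an arbitrary choice to ignore chance one-letter overlaps).

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of left that is a prefix of right."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble(fragments, min_len=3):
    frags = list(dict.fromkeys(fragments))          # drop exact duplicates
    # (1) The first fragment's prefix matches no other fragment's suffix.
    first = next(f for f in frags
                 if all(overlap(g, f, min_len) == 0 for g in frags if g != f))
    contig = first
    remaining = [f for f in frags if f != first]
    # (2) Repeatedly append the fragment with the longest overlap.
    while remaining:
        best = max(remaining, key=lambda f: overlap(contig, f, min_len))
        k = overlap(contig, best, min_len)
        if k == 0:
            break                                   # nothing left that joins on
        contig += best[k:]
        remaining.remove(best)
    return contig


fragments = ["this is", "the school of", "computing bioinformatics course",
             "this is the", "school of computing", "bioinformatics course"]
result = assemble(fragments)
```

Run on the lower-cased fragments of the sentence example, `result` is the original sentence, "this is the school of computing bioinformatics course".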


Example 2:
• Reconstruct the following fragments

1. the men and women merely players;\n
2. one man in his time
3. All the world's
4. their entrances,\nand one man
5. a stage,\nAnd all the men and women
6. They have their exits and their entrances,\n
7. world's a stage,\nAnd all
8. their entrances,\nand one man
9. in his time plays many parts.
10. merely players;\nThey have


Example 2 Solution

• All the world’s a stage,
• And all the men and women merely players;
• They have their exits and their entrances,
• And one man in his time plays many parts.

• The order in which the statements join together is:
• 3, 7, 5, 1, 10, 6, 4, 8, 2, 9
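The stated order can be checked mechanically by merging the fragments in that order, removing the overlapping text at each join. This is a sketch: the fragment texts are transcribed from the slide with "\n" as a real line break, and the helper names are hypothetical.

```python
def overlap(left, right):
    """Length of the longest suffix of left that is a prefix of right."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_in_order(fragments, order):
    """Join numbered fragments in the given order, collapsing each overlap."""
    contig = fragments[order[0]]
    for number in order[1:]:
        nxt = fragments[number]
        contig += nxt[overlap(contig, nxt):]
    return contig


FRAGMENTS = {
    1: "the men and women merely players;\n",
    2: "one man in his time",
    3: "All the world's",
    4: "their entrances,\nand one man",
    5: "a stage,\nAnd all the men and women",
    6: "They have their exits and their entrances,\n",
    7: "world's a stage,\nAnd all",
    8: "their entrances,\nand one man",
    9: "in his time plays many parts.",
    10: "merely players;\nThey have",
}
ORDER = [3, 7, 5, 1, 10, 6, 4, 8, 2, 9]
text = merge_in_order(FRAGMENTS, ORDER)
```

Fragment 8 is an exact duplicate of fragment 4, so it overlaps completely and adds nothing to the contig.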


Example 2 Solution in detail.
1. the men and women merely players;(\n)
2. one man in his time
3. All the world's
4. their entrances,(\n) and one man
5. a stage, (\n) And all the men and women
6. They have their exits and their entrances,(\n)
7. world's a stage, (\n) And all
8. their entrances, (\n) and one man
9. in his time plays many parts.
10. merely players; (\n) They have

Solution Part 1: Order of the statements
• 3: all the world’s
• 7: all the world’s a stage, (\n) And all
• 5: all the world’s a stage, (\n) And all the men and women
• 1: all the world’s a stage, (\n) And all the men and women merely players;
• 10: all the world’s a stage, (\n) And all the men and women merely players; (\n) They have

Solution Part 2:
• 6: all the world’s a stage, (\n) And all the men and women merely players; (\n) They have their exits and their entrances
• 4: all the world’s a stage, (\n) And all the men and women merely players; (\n) They have their exits and their entrances (\n) And one man
• 8: all the world’s a stage, (\n) And all the men and women merely players; (\n) They have their exits and their entrances (\n) And one man
• 2: all the world’s a stage, (\n) And all the men and women merely players; (\n) They have their exits and their entrances (\n) And one man in his time
• 9: all the world’s a stage, (\n) And all the men and women merely players; (\n) They have their exits and their entrances (\n) And one man in his time plays many parts


Potential Exam question

• Briefly describe the three main types of sequence alignment (6 marks)

• Explain how you would determine the DNA sequence of a genome, given that technology can only determine the DNA sequences of relatively small DNA strands (14 marks).

• Explain two important elements of an algorithm that can solve the problem (10 marks).