bio info 5

65
Bioinformatics Lecture# 5 Dr. Naeem Ud Din Khattak Professor Department of Zoology Islamia College Peshawar (Chartered University)

Upload: abidicup

Post on 11-May-2015


Page 1: Bio info 5

Bioinformatics Lecture# 5

Dr. Naeem Ud Din Khattak

Professor, Department of Zoology

Islamia College Peshawar (Chartered University)

Page 2: Bio info 5

Phylogenetic Tree Construction

Page 3: Bio info 5

• The mutation distance: the minimal number of nucleotides that would need to be altered in order for the gene for one protein to code for the other.

• Example: ACTGAT and TCTATC, aligned with gaps:

A C T G A T -
T C T - A T C

Page 4: Bio info 5

The construction of the tree

• Assume three proteins, A, B, and C, and their mutation distances.

• There are two questions:

1. Which pair does one join together first?

2. What are the lengths of edges a, b, and c?

     B    C
A   24   28
B        32

Page 5: Bio info 5

Which pair does one join together first?

• Simply choose the pair with the smallest mutation distance. Here that is A and B (distance 24).

     B    C
A   24   28
B        32

[Tree with leaves A, B, C; A and B are joined first]
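The selection rule can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name `closest_pair` are assumptions made for the example, while the distances are the ones from the table.

```python
# Pairwise mutation distances taken from the slide's table.
distances = {("A", "B"): 24, ("A", "C"): 28, ("B", "C"): 32}


def closest_pair(dist):
    """Return the pair of taxa with the smallest mutation distance."""
    return min(dist, key=dist.get)


first_pair = closest_pair(distances)
print(first_pair, distances[first_pair])
```

With more than three taxa the same lookup would be repeated after each join, but how the distances are updated between joins is not covered on this slide.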

Page 6: Bio info 5

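What are the lengths of edges a, b, and c?

     B    C
A   24   28
B        32

[Tree with leaves A, B, C; edges a and b lead to A and B, edge c leads to C]

a + b = 24
a + c = 28
b + c = 32

a = ?  b = ?  c = ?

Solution: a = 10, b = 14, c = 18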

Page 7: Bio info 5

• i. a + b = 24   ii. a + c = 28   iii. b + c = 32

• From i: a = 24 - b. Put this value of a in ii: 24 - b + c = 28; c - b = 28 - 24; c - b = 4; so c = 4 + b.

• Put this value of c in iii: b + 4 + b = 32; 2b + 4 = 32; 2b = 32 - 4 = 28.

• b = 28/2 = 14

• Now put the value of b in i: a = 24 - 14 = 10. Then c = 4 + b = 18.
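The same three equations can be solved in closed form, which is convenient for checking the hand calculation. The sketch below assumes only the three pairwise distances from the slide; the function name `three_point_edges` is made up for illustration.

```python
def three_point_edges(d_ab, d_ac, d_bc):
    """Solve a+b=d_ab, a+c=d_ac, b+c=d_bc for the edge lengths a, b, c."""
    # Adding two equations and subtracting the third isolates one edge,
    # e.g. (a+b) + (a+c) - (b+c) = 2a.
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


a, b, c = three_point_edges(24, 28, 32)
print(a, b, c)  # 10.0 14.0 18.0
```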

Page 8: Bio info 5

• Note that this analysis assumes that there are no multiple substitutions, i.e. cases in which a single site undergoes two or more changes (e.g. the ancestral sequence … ATGT … gives … AGGT … and … ACGT …).
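A short Python sketch shows why multiple substitutions make the observed distance an underestimate. It assumes one reading of the example, namely that AGGT and ACGT are two descendants of the ancestor ATGT; the function name is hypothetical.

```python
def observed_differences(seq1, seq2):
    """Count positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(seq1, seq2))


ancestor = "ATGT"
modern_1 = "AGGT"  # T -> G at the second site
modern_2 = "ACGT"  # T -> C at the same site

# Two substitutions actually happened, one on each lineage ...
actual_changes = (observed_differences(ancestor, modern_1)
                  + observed_differences(ancestor, modern_2))
# ... but comparing the two modern sequences reveals only one difference.
seen = observed_differences(modern_1, modern_2)
print(actual_changes, seen)  # 2 1
```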

Page 9: Bio info 5

Phylogenetic Tree Terminology

Based on lectures by C-B Stewart, and by Tal Pupko

• Ancestral Node or ROOT of the Tree

• Internal Nodes or Divergence Points (represent hypothetical ancestors of the taxa)

• Branches or Lineages

• Terminal Nodes (A, B, C, D, E): represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny

Page 10: Bio info 5

Phylogenetic trees diagram the evolutionary relationships between the taxa

[Tree diagram with tips Taxon A, Taxon B, Taxon C, Taxon E, Taxon D]

((A,(B,C)),(D,E)) = The above phylogeny as nested parentheses
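To make the nested-parentheses notation concrete, here is a minimal Python sketch that parses such a string and lists the clade under every internal node. It assumes well-formed input without branch lengths or spaces; the function names are illustrative and not taken from any phylogenetics package.

```python
def parse_nested(text):
    """Parse a tree such as '((A,(B,C)),(D,E))'.

    Leaves are returned as strings, internal nodes as tuples of children.
    """
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing characters")
    return tree


def leaves(node):
    """Return the set of taxon names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def clades(node):
    """Yield the leaf set of every internal node, root first."""
    if isinstance(node, tuple):
        yield leaves(node)
        for child in node:
            yield from clades(child)


tree = parse_nested("((A,(B,C)),(D,E))")
for clade in clades(tree):
    print(sorted(clade))
```

Each printed list is one clade: the whole tree, then A+B+C, then B+C, then D+E.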

Page 11: Bio info 5

((A,(B,C)),(D,E))

[Tree diagram with tips Taxon A, Taxon B, Taxon C, Taxon E, Taxon D; two clades are marked]

• B and C are more closely related to each other than either is to A.

• A, B, and C form a clade that is a sister group to the clade composed of D and E.

• If the tree has a time scale, then D and E are the most closely related.

Page 12: Bio info 5

Sequence Comparisons

Page 13: Bio info 5

• Nature acts conservatively, i.e., it does not develop a new kind of biology for every life form but continuously changes and adapts a proven general concept.

• Novel functionalities do not appear because a new gene has suddenly arisen but are developed and modified during evolution.

• Thus, alleles of a gene found in a population arise from a common ancestral gene: they are HOMOLOGOUS.

Page 14: Bio info 5

Homology is not a measure of similarity; rather, it means that sequences have a shared evolutionary history and, therefore, possess a common ancestral sequence (Tatusov et al. 1997).

• An all-or-none phenomenon

Page 15: Bio info 5

Orthologs

• Homologous proteins from different species that possess the same function (e.g., corresponding kinases in a signal transduction pathway in humans and mice) are called orthologs.

Paralogs

• Homologous proteins that have different functions in the same species (e.g., two kinases in different signal transduction pathways of humans) are termed paralogs.

SEE ANIMATION FOR KINASES
Page 16: Bio info 5

• A visual representation of orthologs (and some other commonly confused terms, paralogs and homologs)

Page 17: Bio info 5

Orthologs: "genes that have diverged after a speciation event... [that] tend to have similar function" (Fulton et al. 2006). Thus, orthologs are genes whose encoded proteins fulfill similar roles in different species.

Page 18: Bio info 5

• Homology is not quantifiable.

• The similarity and identity of two sequences, however, ARE quantifiable.

Page 19: Bio info 5

Identity

• The ratio of the number of identical amino acids or nucleotides to the total number of amino acids or nucleotides.

• Example: 4/20 = 0.2.
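The identity ratio is simple enough to compute directly. In the sketch below the two 20-residue sequences are hypothetical, chosen only so that exactly 4 positions match, as in the slide's 4/20 example; the function assumes the sequences are already aligned and of equal length.

```python
def identity(seq1, seq2):
    """Fraction of aligned positions that carry the same residue."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(seq1, seq2))
    return matches / len(seq1)


# Hypothetical sequences: only the first four residues are identical.
seq1 = "ACDEFGHIKLMNPQRSTVWY"
seq2 = "ACDE" + "A" * 16

print(identity(seq1, seq2))  # 0.2
```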

Page 20: Bio info 5

Similarity

• Unlike identity, similarity is not as simple to calculate. Before similarity can be determined, it must first be defined how similar the building blocks of sequences are to each other.

• This is done with the help of similarity matrices, which specify the probability with which one sequence transforms into another over time.

• This probability depends on the elapsed time and the mutation rate of the nucleotides.

Page 21: Bio info 5

• For nucleotide sequences the simplest solution is an identity matrix (Fig. 4.2a).
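As a rough illustration of an identity matrix, the Python sketch below scores 1 for identical nucleotides and 0 for any mismatch. The figure itself is not reproduced here, so the exact values 1 and 0 are an assumption; the two sequences are the ones from the mutation distance slide, compared position by position without gaps.

```python
NUCLEOTIDES = "ACGT"

# Identity matrix: 1 on the diagonal (same nucleotide), 0 elsewhere.
IDENTITY_MATRIX = {
    (x, y): 1 if x == y else 0
    for x in NUCLEOTIDES
    for y in NUCLEOTIDES
}


def score_alignment(seq1, seq2, matrix=IDENTITY_MATRIX):
    """Sum the matrix scores over the columns of a gap-free alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(matrix[(x, y)] for x, y in zip(seq1, seq2))


print(score_alignment("ACTGAT", "TCTATC"))  # 2 (only C and T at positions 2 and 3 match)
```

Because every mismatch scores the same, this matrix cannot express that some exchanges are more likely than others.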

Page 22: Bio info 5

• For protein sequences, an identity matrix is not sufficient to describe biological and evolutionary processes.

• Amino acids are not exchanged with the same probability as might be conceived theoretically.

• Recall synonymous and non-synonymous mutations.

Page 23: Bio info 5

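• For example, an exchange of aspartic acid for glutamic acid is frequently observed, whereas an exchange of aspartic acid for tryptophan is seen rarely.

• One reason lies in the genetic code. Written with T as in DNA, the codons for aspartic acid (GAT, GAC) and glutamic acid (GAA, GAG) differ only at the third position, whereas the tryptophan codon (TGG) differs from both aspartic acid codons at all three positions.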
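One way to see why some exchanges are easier than others is to count how many nucleotides separate the codons of two amino acids. The sketch below uses the standard genetic code for the three amino acids discussed, written as DNA codons; linking this slide to codon distance is an interpretation, and the function name is hypothetical.

```python
# Standard genetic code (DNA codons, T instead of U) for three amino acids.
CODONS = {
    "Asp": ["GAT", "GAC"],
    "Glu": ["GAA", "GAG"],
    "Trp": ["TGG"],
}


def min_codon_distance(aa1, aa2):
    """Smallest number of nucleotide changes turning a codon of aa1 into one of aa2."""
    return min(
        sum(x != y for x, y in zip(codon1, codon2))
        for codon1 in CODONS[aa1]
        for codon2 in CODONS[aa2]
    )


print(min_codon_distance("Asp", "Glu"))  # 1
print(min_codon_distance("Asp", "Trp"))  # 3
```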

Page 24: Bio info 5

• A second reason why the mutation of aspartic acid to glutamic acid occurs more often is that both have similar properties.

• In contrast, aspartic acid and tryptophan are chemically different: the hydrophobic tryptophan is frequently found in the center of proteins, whereas the hydrophilic aspartic acid occurs more often at the surface.

Page 25: Bio info 5

• Amino acid substitution matrices, therefore, describe the probability with which amino acids are exchanged in the course of evolution.

• The most commonly used amino acid scoring matrices are the PAM (Point Accepted Mutation; Dayhoff et al. 1978) and BLOSUM (Blocks Substitution Matrix; Henikoff and Henikoff 1992) groups.
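To illustrate how such a matrix is used, the sketch below hard-codes a small excerpt of BLOSUM62 for the three amino acids discussed above (D = aspartic acid, E = glutamic acid, W = tryptophan). Choosing BLOSUM62 rather than another member of the family is an assumption made for the example, and the lookup function is hypothetical.

```python
# Excerpt of BLOSUM62 scores; the matrix is symmetric, so each pair is stored once.
BLOSUM62_EXCERPT = {
    ("D", "D"): 6, ("E", "E"): 5, ("W", "W"): 11,
    ("D", "E"): 2, ("D", "W"): -4, ("E", "W"): -3,
}


def substitution_score(x, y, matrix=BLOSUM62_EXCERPT):
    """Look up the score for aligning residue x with residue y."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]


print(substitution_score("D", "E"))  # 2: a frequently observed exchange scores positive
print(substitution_score("D", "W"))  # -4: a rarely observed exchange scores negative
```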

Page 26: Bio info 5

Amino acid      3-letter  1-letter  Property
Tryptophan      Trp       W         Hydrophobic
Aspartic acid   Asp       D         Hydrophilic, electrically charged (negative)
Glutamic acid   Glu       E         Hydrophilic, electrically charged (negative)

Page 27: Bio info 5

NUCLEOTIDE AND AMINO ACID SEQUENCES ARE EVOLUTIONARILY DIFFERENT, SO WE NEED DIFFERENT CRITERIA AND MATRICES TO ANALYZE THEM

Page 28: Bio info 5

• For nucleotide sequences the simplest solution is an identity matrix (Fig. 4.2a).

Page 29: Bio info 5

• For amino acid sequences we need similarity matrices (Fig. 4.2b).

[Figure 4.2b; scores shown: 65 and 19]

Page 30: Bio info 5

Calculation of a global alignment of two similar protein sequences.

Page 31: Bio info 5

Calculation of a global alignment of two similar protein sequences.
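The slides do not say which algorithm or scoring scheme is used for the global alignment. A common choice is Needleman-Wunsch dynamic programming, sketched below with simple match, mismatch, and gap scores. Those three values are assumptions; for protein sequences the match and mismatch scores would normally come from a substitution matrix such as PAM or BLOSUM.

```python
def global_alignment_score(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score of the best global alignment of two sequences."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq2
                score[i][j - 1] + gap,       # gap in seq1
            )
    return score[-1][-1]


print(global_alignment_score("ACGT", "ACGT"))  # 4
print(global_alignment_score("ACGT", "AGT"))   # 1 (three matches, one gap)
```

This version returns only the score; recovering the alignment itself would require a traceback through the table, which is omitted here.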

Page 32: Bio info 5


Page 33: Bio info 5


Page 34: Bio info 5

• Using MEGA to Calculate Mutation Distance

Page 35: Bio info 5

Outgroup to root a phylogenetic tree

• The tree of human, chimpanzee, gorilla and orangutan genes is rooted with a baboon gene because we know from the fossil record that the baboon lineage split away earlier in geological time, before the common ancestor of the four species.

• Let’s see the members of this group.

Page 36: Bio info 5

[Figure: the tree of Chimp, Human, Gorilla, and Orangutan genes drawn twice, once rooted with Baboon as the outgroup and once without it; branch lengths are marked on each branch, with scale bars of 0.02 and 0.01]

Page 37: Bio info 5

[Figure: phylogenetic tree of Kiwi, Ostrich, Swan, Ring-necked Pheasant, Silver Pheasant, Song Sparrow, Parrot, and Lizard, with branch lengths marked and the outgroup labeled]

Page 38: Bio info 5

[Figure: phylogenetic tree of Kiwi, Struthio camelus (ostrich), Swan, Song Sparrow, Ring-necked Pheasant, Silver Pheasant, and Parrot, with branch lengths marked]

The design of the phylogenetic tree does not change the evolutionary distance among the various taxa represented.

Page 39: Bio info 5

[Figure: the same taxa and branch lengths as on the previous slide, drawn in a different tree design]

Page 40: Bio info 5

Types of Trees

• Rooted trees (the root is the Common Ancestor)

Page 41: Bio info 5

Types of Trees

• An unrooted tree represents the same phylogeny without the root node.

Page 42: Bio info 5
Page 43: Bio info 5

Is this tree rooted?

Page 44: Bio info 5

Fig. 4.6. Phylogenetic tree of dopamine receptor sequences.

Page 45: Bio info 5

Gene trees are not the same as species trees

Page 46: Bio info 5
Page 47: Bio info 5

Examples of what can be inferred from phylogenetic trees (DNA, protein)

1. Which species are the closest living relatives of modern humans?

2. Did the infamous Florida Dentist infect his patients with HIV?

3. What is the relation between HIV and SIV?

Page 48: Bio info 5

Relatives of modern humans?

[Figure: two trees of Humans, Bonobos, Chimpanzees, Gorillas, and Orangutans. Panel titles: "The pre-molecular view" and "Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization". Time axes in MYA: 0 to 15-30 and 0 to 14.]

Page 49: Bio info 5
Page 50: Bio info 5

2. Did the Florida Dentist infect his patients with HIV?

[Figure: phylogenetic tree of HIV sequences from the DENTIST, his patients (A to G), and local HIV-infected people (local controls 2, 3, 9, and 35). From Ou et al. (1992) and Page & Holmes (1998).]

• Yes: the HIV sequences from these patients fall within the clade of HIV sequences found in the dentist.

• No: for the patients whose sequences fall outside this clade (marked "No" in the figure).

Page 51: Bio info 5

3. Relating Human HIV to Simian SIV retroviruses

• Human immunodeficiency virus 1 (HIV-1) is pathogenic.

• SIVs (simian immunodeficiency viruses) are not pathogenic in their normal hosts.

Page 52: Bio info 5
Page 53: Bio info 5

The structure of HIV

IMAGE FROM: Medical Art Service, Munich / Wellcome Images.

• Proteins on surface (these bind CD4 on the host cell)

• Phospholipid membrane

• Matrix

• Capsid

• Viral RNA

• Viral enzymes: reverse transcriptase, integrase, protease

Page 54: Bio info 5

HIV’s replication cycle

1. HIV attaches to CD4 receptors on T-cell

2. Viral core of enzymes and RNA injected into cell

3. DNA transcribed from viral RNA

4. Double-stranded DNA produced

5. DNA integrates with host chromosome (viral integrase)

6. Transcription: viral RNA

7. Viral proteins

8. New virus assembled

9. Viral protease cuts up proteins

10. New virus leaves cell

Page 55: Bio info 5

Retrovirus genomes accumulate mutations relatively quickly

• Reverse transcriptase lacks an efficient proofreading activity, so it makes errors when it carries out RNA-dependent DNA synthesis.

• As a result, the molecular clock runs rapidly in retroviruses.

Page 56: Bio info 5

• Genomes that diverged quite recently therefore display sufficient nucleotide dissimilarity for a phylogenetic analysis to be carried out.

• Even over a period of less than 100 years, HIV and SIV genomes have accumulated sufficient data for such an analysis.

Page 57: Bio info 5

• The starting point for this phylogenetic analysis is RNA extracted from virus particles, which is converted into DNA and amplified by RT-PCR.

Page 58: Bio info 5

RT-PCR

Reverse transcription polymerase chain reaction (RT-PCR) is a variant of polymerase chain reaction (PCR). It is a laboratory technique commonly used in molecular biology in which an RNA strand is reverse transcribed into its DNA complement (complementary DNA, or cDNA) using the enzyme reverse transcriptase, and the resulting cDNA is amplified using PCR.
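At the sequence level, the reverse transcription step amounts to building the complementary DNA strand of the RNA template. The Python sketch below shows only that base-pairing logic, not the laboratory procedure; the example RNA sequence and the function name are made up for illustration.

```python
# Base-pairing rules for copying an RNA template into complementary DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the cDNA strand complementary to an RNA, both written 5' to 3'."""
    # The cDNA is antiparallel to its template, so the RNA is read backwards.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


print(reverse_transcribe("AUGGCU"))  # AGCCAT
```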

Page 59: Bio info 5
Page 60: Bio info 5

• This tree has a number of interesting features. First, it shows that different samples of HIV-1 have slightly different sequences, the samples as a whole forming a tight cluster, almost a star-like pattern, that radiates from one end of the unrooted tree.

Page 61: Bio info 5

• This star-like topology implies that the global AIDS epidemic began with a very small number of viruses, perhaps just one, which have spread and diversified since entering the human population.

• The closest relative to HIV-1 among primates is the SIV of chimpanzees, the implication being that this virus jumped across the species barrier between chimps and humans and initiated the AIDS epidemic.

Page 62: Bio info 5

• However, this epidemic did not begin immediately: a relatively long uninterrupted branch links the center of the HIV-1 radiation with the internal node leading to the relevant SIV sequence, suggesting that after transmission to humans, HIV-1 underwent a latent period when it remained restricted to a small part of the global human population, presumably in Africa, before beginning its rapid spread to other parts of the world.

Page 63: Bio info 5

• Other primate SIVs are less closely related to HIV-1, but one, the SIV from sooty mangabey, clusters in the tree with the second human immunodeficiency virus, HIV-2.

• It appears that HIV-2 was transferred to the human population independently of HIV-1, and from a different simian host. HIV-2 is also able to cause AIDS, but has not, as yet, become globally epidemic.

Page 64: Bio info 5