Post on 04-Jun-2020

Instructions for Exercises

Exercise 1: Build phylogenetic trees

You will build phylogenetic trees based on the amino acid sequences of the protein alpha-globin from several species and one taxon. First, you will download the sequences, then you will use the software “MEGA” to generate the phylogenetic trees. The species/taxon to use are:

taxon: Petromyzontiformes (lampreys/Neunaugen)
Papio anubis (baboon/Pavian)
Bos taurus (cow/Kuh)
Gallus gallus (chicken/Huhn)
Dasyurus viverrinus (quoll/Beutelmarder)
Rhincodon typus (whale shark/Walhai)
Alligator sinensis (alligator/Krokodil)
Xenopus laevis (frog/Krallenfrosch)
Cyprinus carpio (carp/Karpfen)

1.1 Download sequences in “fasta” format

Version 1 (takes less time): Copy/paste sequences

Copy the sequences provided here:

>NP_001162287.1 hemoglobin subunit alpha [Papio anubis]
MVLSPDDKKHVKAAWGKVGEHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSDQVNKHGKKVADALTLAVGHVDDMPQALSKLSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>NP_001070890.2 hemoglobin subunit alpha [Bos taurus]
MVLSAADKGNVKAAWGKVGGHAAEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGAKVAAALTKAVEHLDDLPGALSELSDLHAHKLRVDPVNFKLLSHSLLVTLASHLPSDFTPAVHASLDKFLANVSTVLTSKYR
>NP_001004376.1 hemoglobin subunit alpha-A [Gallus gallus]
MVLSAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHFDLSHGSAQIKGHGKKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRVDPVNFKLLGQCFLVVVAIHHPAALTPEVHASLDKFLCAVGTVLTAKYR
>XP_006020935.1 hemoglobin subunit alpha [Alligator sinensis]
MVLSQEDKSNVKAIWGKASGHLEDYGAEVLERMFCAYPQTKIYFPHFDMSHGSPQIRAHGKKVFSALHEAVNHIDDLPGALCRLSELHAHSLRVDPVNFKFLSHCVLVVFAIHHPCSLSPEVHASLDKFLCAVSAVLTSKYR
>NP_001079746.1 hemoglobin subunit alpha-5 [Xenopus laevis]
MTFSSAEKAAIASLWGKVSGHTDEIGAEALERLFLSYPQTKTYFSHFDLSHGSKDLRSHGGKVVKAIGNAATHIDDIPHALSALSDLHAFKLKVDPGNFKLLSHAIQVTLAIHFPAEFNADAQAAWDKFLAVVSAVLVSKYR

>XP_018973483.1 PREDICTED: hemoglobin subunit alpha [Cyprinus carpio]
MSLSDKDKAAVKALWAKISPKADDIGAEALGRMLTVYPQTKTYFAHWADLSPGSGPVKKHGKVIMGAVGDAVSKIDDLVGGLAALSELHAFKLRVDPANFKILAHNVIVVIGMLYPGDFPPEVHMSVDKFFQNLALALSEKYR
>XP_020389343.1 hemoglobin subunit alpha-like [Rhincodon typus]
MACELFSHEEKKIIHDIGKLLHENAKDYGAESLSRLFMIYPGSKTYFTYKDFSISNLKTHGEKVMNALAKAAKDVDNLKTSLCELAEFHGKKLLVDPQNFLVFSKCIQMTLANHLASFSSAHQLAVDKFLKAVCQYLSSRYR
>prf||690951A hemoglobin [Lampetra fluviatilis (European river lamprey)]
PIVDSGSPAVLSAAEKTKIRSAWAPVYSNYETSGVDILVKFFTSTPAAQEFFPKFKGMTSADELKKSADVRWHAERIINAVNDAVASMDDTEKMSMKDLSGKHAKSFQVDPQYFKVLAVIADTVAAGDAGFEKLSMCIILMLRSAY
>P07419.2 RecName: Full=Hemoglobin subunit alpha; [Dasyurus viverrinus]
MVLSDADKTHVKAIWGKVGGHAGAYAAEALARTFLSFPTTKTYFPHFDLSPGSAQIQGHGKKVADALSQAVAHLDDLPGTLSKLSDLHAHKLRVDPVNFKLLSHCLIVTLAAHLSKDLTPEVHASMDKFFASVATVLTSKYR

Version 2 (takes more time): Retrieve the sequences with BLASTp using human alpha-globin

1) Open https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome and enter the accession number “ABD95911.1” in the query field (this is the accession number for alpha-globin in human).
2) Under “Choose Search Set” → “Organism”, enter the name of the species you want to run BLAST against. Attention: you need to click on the correct taxon in the dropdown menu that opens, so that the “taxid” is specified in the field; e.g. for baboon it should read “Papio anubis (taxid:9555)”.
3) You can add more taxa by clicking on the “+” field next to the Organism field.
4) Then, click on “BLAST”. Now, the human alpha-globin amino acid sequence will be compared to all other amino acid sequences in the taxa you specified.
5) In the results, choose the hits which you think are most likely the alpha-globin proteins you were looking for, and download the “fasta” file for each of them.

[Figure: How to download “fasta” sequences from the BLAST output.]
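For those who prefer to see the search as a request rather than as clicks, here is a minimal sketch of how the same blastp query could be expressed as a URL. It assumes the parameter names of the NCBI BLAST URL interface (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY) and an organism filter of the form “txid9555[ORGN]”; the function name is hypothetical, and the code only builds the URL without sending anything to NCBI.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastp_request(accession, taxids):
    """Build (but do not send) a blastp submission URL restricted to some taxa.

    Assumption: parameter names follow the NCBI BLAST URL interface.
    """
    # Organism filter in Entrez syntax; several taxa are joined with OR.
    organism_filter = " OR ".join(f"txid{taxid}[ORGN]" for taxid in taxids)
    params = {
        "CMD": "Put",             # submit a new search
        "PROGRAM": "blastp",      # protein query against protein database
        "DATABASE": "nr",
        "QUERY": accession,       # accession number used as the query
        "ENTREZ_QUERY": organism_filter,
    }
    return f"{BLAST_URL}?{urlencode(params)}"


# Human alpha-globin accession and the baboon taxid, both taken from the steps above.
url = build_blastp_request("ABD95911.1", [9555])
print(url)
```

For the exercise itself, the web form is the intended route; the sketch only illustrates which pieces of information the form collects.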

1.2 Generate a multi-fasta file

Generate a “multi-fasta” file in “MEGA X”. Start the software “MEGA X” (in Windows, type “MEGA” in the search window in the bottom left, and select “MEGA X”); the “main window” of MEGA opens. Now follow the steps below, depending on whether you used version 1 or 2 to retrieve the sequence files in step 1.1:

If you chose version 1 in 1.1:
1. Click on “File” → “Edit a Text File”, then on “File” → “New”.
2. In the new window that opens, paste the sequences you have copied.
3. For each sequence, there should be one “header line” starting with a “>” sign, followed by one or more lines of the amino acid sequence.
4. Edit the “header line” of each sequence into a short, meaningful name. For example, change “>NP_001162287.1 hemoglobin subunit alpha [Papio anubis]” to “>Baboon” or “>Pavian”. Just remember that there still needs to be a “>” sign at the beginning. Short names are useful because the phylogenetic tree you will generate has little space for long sequence names.
5. Save your multi-fasta file under a meaningful name, using “.fasta” as the file type, e.g. “hemoglobin.fasta”.

If you chose version 2 in 1.1:
1. Click on File → Open a File/Session, then choose a *.fasta file you have downloaded from the BLAST output.
2. Repeat step 1 for all *.fasta files you have downloaded.
3. Copy/paste all sequences into a single multi-fasta file.
4. Follow steps 3-5 from above.

[Figures: a valid “multi-fasta” file; an invalid “multi-fasta” file (no “>” in header line!).]
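The rules for a valid multi-fasta file (a “>” header line, then one or more sequence lines) and the header shortening from step 4 can also be checked in a few lines of code. The following is only a sketch: the function names are hypothetical, and the toy input uses two real header lines from above with the sequences cut to their first 12 residues.

```python
def parse_multi_fasta(text):
    """Parse multi-FASTA text into a list of (header, sequence) tuples.

    Raises ValueError if sequence data appears before any ">" header line
    or if a header is not followed by any sequence.
    """
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError(f"sequence line before any '>' header: {line[:20]!r}")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    for name, seq in records:
        if not seq:
            raise ValueError(f"header {name!r} has no sequence")
    return records


def shorten_headers(records, short_names):
    """Replace a header by a short name if one of the given keys occurs in it."""
    renamed = []
    for header, seq in records:
        new = next((short for key, short in short_names.items() if key in header), header)
        renamed.append((new, seq))
    return renamed


def format_fasta(records):
    """Write records back as FASTA text, one sequence line per record."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Toy input: sequences truncated to 12 residues, the first one split over two lines.
pasted = """>NP_001162287.1 hemoglobin subunit alpha [Papio anubis]
MVLSPD
DKKHVK
>NP_001070890.2 hemoglobin subunit alpha [Bos taurus]
MVLSAADKGNVK
"""
records = shorten_headers(
    parse_multi_fasta(pasted),
    {"Papio anubis": "Baboon", "Bos taurus": "Cow"},
)
print(format_fasta(records))
```

A file whose first line lacks the “>” sign makes `parse_multi_fasta` raise a `ValueError`, which corresponds to the invalid case described above.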

1.3 Build phylogenetic trees in MEGA

Open the software “MEGA”. The “main window” of MEGA opens. Now follow these steps to first generate a sequence alignment and then build phylogenetic trees based on the alignment:

1. Generate a sequence alignment of all nine sequences:
   a. Click on Align → Edit/Build alignment → Retrieve Sequences from a file → OK → choose your “multi-fasta” file from step 1.2 → Open.
   b. The “Alignment Explorer” window opens (the “main window” is now in the background).
   c. In the “Alignment Explorer” window, on the top menu bar, click on Alignment → Align By Muscle → Compute (we use default parameters).
   d. Save the output file in MEGA format: Data → Export alignment → MEGA format.
2. Build phylogenetic trees (with changed parameters!):
   a. Please remember in which MEGA window you generate which tree, as the final tree does not show the name of the phylogenetic method!
   b. In the “main window”, click on Phylogeny. Here you see five different methods. Use each of them, so that at the end you will have five trees, and set the parameters as follows:
      ■ “Test of phylogeny” = “Bootstrap method”
      ■ “number of bootstraps” = 500 (researchers usually use 1000 bootstraps, but 500 is faster)
      ■ “Substitutional Model” → Model/Method = “JTT model” (not possible for MP).

   c. You can run all the calculations in parallel. Maximum likelihood is the most time-consuming method; it may run for 40 min.
   d. Save the resulting trees as PDF files by clicking on “Image” → “Save as…” → PDF in the top menu (UPGMA.pdf, NJ.pdf, ML.pdf, …).
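To get a feeling for what the “Bootstrap method” setting does, here is a simplified sketch of the resampling idea. It is not how MEGA works internally: MEGA builds a full tree for every replicate, whereas this sketch only checks which two sequences are closest in each replicate. The four-sequence alignment, the sequence names and all function names are made up for illustration, and the distance used is the plain fraction of differing positions, not the JTT model.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample the alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def closest_pair(alignment):
    """Return the pair of sequence names with the smallest p-distance."""
    names = sorted(alignment)
    candidates = [
        (p_distance(alignment[a], alignment[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    _, a, b = min(candidates)
    return (a, b)


def bootstrap_support(alignment, pair, replicates=500, seed=0):
    """Fraction of replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(bootstrap_replicate(alignment, rng)) == pair
        for _ in range(replicates)
    )
    return hits / replicates


# Hypothetical toy alignment (all sequences have the same length).
toy = {
    "A": "MVLSPADKTN",
    "B": "MVLSAADKTN",
    "C": "MVLSAEDKSN",
    "D": "MSLSDKDKAA",
}
support = bootstrap_support(toy, ("A", "B"))
print(f"(A, B) is the closest pair in {support:.0%} of 500 replicates")
```

The numbers printed at the branches of a bootstrapped tree have the same meaning: the share of resampled data sets that still support that grouping.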

Exercise 2: BLAST

Michael Crichton's science fiction book about cloning dinosaurs, “The Lost World” (sequel to Jurassic Park), contains a putative dinosaur DNA sequence. You can download this sequence from the Teaching website, or copy it from this document. Use nucleotide-nucleotide BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) against the default nucleotide database (nr) to identify the real source of the DNA sequence.

>LostWorld DinoDNA from the book The Lost World
gaattccggaagcgagcaagagataagtcctggcatcagatacagttggagataaggacggacgtgtggcagctcccgcagaggattcactggaagtgcattacctatcccatgggagccatggagttcgtggcgctgggggggccggatgcgggctcccccactccgttccctgatgaagccggagccttcctggggctgggggggggcgagaggacggaggcgggggggctgctggcctcctaccccccctcaggccgcgtgtccctggtgccgtgggcagacacgggtactttggggaccccccagtgggtgccgcccgccacccaaatggagcccccccactacctggagctgctgcaacccccccggggcagccccccccatccctcctccgggcccctactgccactcagcagcgggcccccaccctgcgaggcccgtgagtgcgtcatggccaggaagaactgcggagcgacggcaacgccgctgtggcgccgggacggcaccgggcattacctgtgcaactgggcctcagcctgcgggctctaccaccgcctcaacggccagaaccgcccgctcatccgccccaaaaagcgcctgcgggtgagtaagcgcgcaggcacagtgtgcagccacgagcgtgaaaactgccagacatccaccaccactctgtggcgtcgcagccccatgggggaccccgtctgcaacaacattcacgcctgcggcctctactacaaactgcaccaagtgaaccgccccctcacgatgcgcaaagacggaatccaaacccgaaaccgcaaagtttcctccaagggtaaaaagcggcgccccccgggggggggaaacccctccgccaccgcgggagggggcgctcctatggggggagggggggacccctctatgccccccccgccgccccccccggccgccgccccccctcaaagcgacgctctgtacgctctcggccccgtggtcctttcgggccattttctgccctttggaaactccggagggttttttggggggggggcggggggttacacggcccccccggggctgagcccgcagatttaaataataactctgacgtgggcaagtgggccttgctgagaagacagtgtaacataataatttgcacctcggcaattgcagagggtcgatctccactttggacacaacagggctactcggtaggaccagataagcactttgctccctggactgaaaaagaaaggatttatctgtttgcttcttgctgacaaatccctgtgaaaggtaaaagtcggacacagcaatcgattatttctcgcctgtgtgaaattactgtgaatattgtaaatatatatatatatatatatatatctgtatagaacagcctcggaggcggcatggacccagcgtagatcatgctggatttgtactgccggaattc

The calculation might take some time, so leave the browser window open, move on to the next exercise(s), and then come back to this one.

Exercise 3: Substitution Matrices

Use the BLOSUM62 and PAM250 matrices lying in front of you to answer the questions on the question sheet. Please leave the printouts on the table when you leave, as they will be used by the following groups!

Exercise 4: Dotplot

On your question sheet, plot the dotplot for the word ABRACADABRACADABRA against itself, using these parameters: window size = 3, step = 1, threshold = 2. Put a dot at the middle position of each triplet that meets the threshold criteria.
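The windowed dotplot procedure can be sketched in code as follows. To avoid giving away the solution, the example uses the word BANANA rather than the exercise word. The sketch assumes that “threshold = 2” means “at least 2 of the 3 letters in the window are identical”; the function names are hypothetical.

```python
def dotplot(seq_a, seq_b, window=3, step=1, threshold=2):
    """Return the set of 0-based (row, column) positions that receive a dot.

    A window of `window` letters is slid along both sequences. If at least
    `threshold` letters match at the same offsets, a dot is placed at the
    middle position of the two windows.
    """
    dots = set()
    middle = window // 2
    for i in range(0, len(seq_a) - window + 1, step):
        for j in range(0, len(seq_b) - window + 1, step):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= threshold:
                dots.add((i + middle, j + middle))
    return dots


def render(dots, seq_a, seq_b):
    """Draw the dotplot as text: rows are seq_a, columns are seq_b."""
    lines = ["  " + seq_b]
    for i, letter in enumerate(seq_a):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        lines.append(letter + " " + row)
    return "\n".join(lines)


word = "BANANA"
dots = dotplot(word, word)
print(render(dots, word, word))
```

Note that the first and last letters can never carry a dot with a window size of 3, because they are never the middle of a triplet.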

Exercise 5: Dynamic Programming

The following figure depicts a path for aligning two sequences. This is the highest scoring global alignment of the two sequences, according to the Needleman-Wunsch algorithm. Answer the questions about it on the question sheet.
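As a reminder of how such a path comes about, here is a compact sketch of Needleman-Wunsch global alignment with traceback. The scoring values (match = +1, mismatch = -1, gap = -1) and the two short example sequences are assumptions chosen for illustration; they are not the ones used in the figure.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and first row: leading gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the matrix: each cell is the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    # Traceback from the bottom-right corner to the top-left corner.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

Each diagonal step in the path corresponds to a match or mismatch, and each horizontal or vertical step corresponds to a gap in one of the two sequences.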

Have fun :)

If you finish early, you can do the Bonus Exercises!

Bonus Exercise 6: Use BLASTx to find a hidden message

Mark Boguski of the NCBI supplied the author Michael Crichton with the “dinosaur” sequence (from the BLAST exercise) and embedded a hidden message in it. To see Mark's message, use the translating BLAST (blastx) page with the provided sequence to translate the nucleotide sequence into an amino acid sequence. See his original publication here: http://markboguski.net/docs/publications/BioTechniques-1992.pdf

Hint: to see the message click on the top hit and focus on the gapped part of the alignment.
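The translation step that blastx performs before searching can be sketched as follows. This is only an illustration of six-frame translation with the standard genetic code, not of the BLAST search itself; the function names are hypothetical, and the example translates just the first 30 nucleotides of the “dinosaur” sequence in the first reading frame.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate DNA in the given forward frame (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    protein = []
    for pos in range(frame, len(dna) - 2, 3):
        # Codons with unexpected letters (e.g. N) are shown as X.
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translation(dna):
    """Translate all three forward and all three reverse frames."""
    reverse = reverse_complement(dna)
    return [translate(dna, f) for f in range(3)] + [translate(reverse, f) for f in range(3)]


start = "gaattccggaagcgagcaagagataagtcc"  # first 30 nt of the sequence above
print(translate(start))
```

blastx compares all six translations with a protein database, which is why the hidden message only becomes visible in the aligned protein sequence.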

Bonus Exercise 7: Add your favourite animal to a tree

Choose your favourite animal(s) and find alpha-globin sequences for them (use blastp for sequence retrieval, or another way, if you know one). Add these sequences to your first “multi-fasta” file and build one tree (use any method) for the new, extended data. Interpret the results. The NCBI/Taxonomy database can help you to find the species you are interested in: https://www.ncbi.nlm.nih.gov/taxonomy/?term=.

Image sources:
https://commons.wikimedia.org/wiki/File:Flussneunauge.jpg
https://upload.wikimedia.org/wikipedia/commons/8/81/Papio_phylogeny_%28ita%29.png
https://www.flickr.com/photos/usdagov/16802162424
https://commons.wikimedia.org/wiki/File:Dasyurus_viverrinus_Gould.jpg
https://commons.wikimedia.org/wiki/File:Gallus_gallus_bankiva_1876.jpg
http://pngimg.com/download/13168
https://commons.wikimedia.org/wiki/Category:Xenopus_laevis#/media/File:Xenopus_laevis_02.jpg
https://de.wikipedia.org/wiki/Datei:Cyprinus_carpio_GLERL_1.jpg
https://commons.wikimedia.org/wiki/File:Rhincodon_typus.png