Bioinformatics tutorial


Upload: wei-feng-zhen

Post on 31-Jan-2016


DESCRIPTION

Guide on how to use BLAST to find homologs among different proteins

TRANSCRIPT

Page 1: Bioinformatics tutorial

Wei Zhen
Enzymology

Part A:
1) See attached
2) Shown below

Part B

3. To what superfamily does this protein belong?
Ribokinase_pfkb_like superfamily

4. How many significant (e-value < 0.001) matches are there for our protein?
10
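The cutoff in question 4 can be applied programmatically once the hits are in a table. The following is a minimal sketch, assuming the hits are available as (name, e-value) pairs; the hit names and e-values are made up for illustration and are not the results of this assignment.

```python
# Hypothetical BLAST hits as (subject_name, e_value) pairs.
# These values are invented for illustration only.
hits = [
    ("hitA", 3e-120),
    ("hitB", 2e-45),
    ("hitC", 5e-4),
    ("hitD", 0.02),
    ("hitE", 1.3),
]

E_VALUE_CUTOFF = 0.001  # significance threshold stated in question 4


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Return hits with an e-value strictly below the cutoff, best first."""
    kept = [hit for hit in hits if hit[1] < cutoff]
    return sorted(kept, key=lambda hit: hit[1])


top = significant_hits(hits)
print(len(top), "significant hits")
print("Top three:", [name for name, _ in top[:3]])
```

A lower e-value means the match is less likely to have arisen by chance, so sorting in ascending order puts the best matches first.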

5. List the names of the organisms of the top three significant matches.
Clostridium difficile (seq1)
Salmonella Typhimurium (seq2)
Bacillus subtilis (seq3)

Part C – Alignment

Input:

>CB_ThiD
MKKVLTIAGSDSSGGAGIQADIKTMSSLGVYAMSIITAVTAQNSIGVQDVHEVPKNMVEAQIKSVFDDIDVDAVKIGMLSNSETIKSIKEYLEKYKAKNIVLDPVMVSKSGYFLLKPEAIEELKKLISITNIVTPNIPEAEVLSKIEINSEDDMKKAAIIIQAIGVKNVLVKGGHRCSDANDILLYEDKFITIPGNRIETKNTHGTGCTLSSAIASYLTKGFNIEKSVSLSKEYITKAIENSFPIGEGVGPVGHFIELYKKAHLDF

>seq1
MQRINALTIAGTDPSGGAGIQADLKTFSALGAYGCSVITALVAENTCGVQSVYRIEPDFVAAQLDSVFSDVRIDTTKIGMLAETDIVEAVAERLQRHHVRNVVLDTVMLAKSGDPLLSPSAIETLRVRLLPQVSLITPNLPEAAALLDAPHARTEQEMLAQGRALLAMGCEAVLMKGGHLEDAQSPDWLFTREGEQRFSAPRVNTKNTHGTGCTLSAALAALRPRHRSWGETVNEAKAWLSAALAQADTLEVGKGIGPVHHFHAWW

>seq2
MQRINALTIAGTDPSGGAGIQADLKTFSALGAYGCSVITALVAENTCGVQSVYRIEPDFVAAQLDSVFSDVRIDTTKIGMLAETDIVEAVAERLQRHHVRNVVLDTVMLAKSGDPLLSPSAIETLRVRLLPQVSLITPNLPEAAALLDAPHARTEQEMLAQGRALLAMGCEAVLMKGGHLEDAQSPDWLFTREGEQRFSAPRVNTKNTHGTGCTLSAALAALRPRHRSWGETVNEAKAWLSAALAQADTLEVGKGIGPVHHFHAWW

>Seq3
MSMHKALTIAGSDSSGGAGIQADLKTFQEKNVYGMTALTVIVAMDPNNSWNHQVFPIDTDTIRAQLATITDGIGVDAMKTGMLPTVDIIELAAKTIKEKQLKNVVIDPVMVCKGANEVLYPEHAQALREQLAPLATVITPNLFEASQLSGMDELKTVDDMIEAAKKIHALGAQYVVITGGGKLKHEKAVDVLYDGETAEVLESEMIDTPYTHGAGCTFSAAVTAELAKGAEVKEAIYAAKEFITAAIKESFPLNQYVGPTKHSALRLNQQS
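The input above is in FASTA format: a header line starting with ">" followed by one or more lines of sequence. As a minimal sketch of how such input can be read, the hypothetical helper `parse_fasta` below collects each record into a dictionary. It uses only the Python standard library, and the example text is a shortened excerpt of the CB_ThiD and seq1 records.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            name = line[1:]  # header without the leading ">"
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {header: "".join(parts) for header, parts in records.items()}


# Shortened excerpt of the input sequences, wrapped at 60 residues per line.
example = """>CB_ThiD
MKKVLTIAGSDSSGGAGIQADIKTMSSLGVYAMSIITAVTAQNSIGVQDVHEVPKNMVEA
QIKSVFDDIDVDAVKIGMLSNSETIKSIKEYLEKYKAKNIVLDPVMVSKSGYFLLKPEAI
>seq1
MQRINALTIAGTDPSGGAGIQADLKTFSALGAYGCSVITALVAENTCGVQSVYRIEPDFV
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

Joining wrapped lines matters because alignment tools treat each record as one continuous sequence.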

Alignment:

CLUSTAL O(1.2.1) multiple sequence alignment

CB_ThiD     --MKKVLTIAGSDSSGGAGIQADIKTMSSLGVYAMSIITAVTAQNS--IGVQDVHEVPKN
seq1        MQRINALTIAGTDPSGGAGIQADLKTFSALGAYGCSVITALVAENT--CGVQSVYRIEPD
seq2        MQRINALTIAGTDPSGGAGIQADLKTFSALGAYGCSVITALVAENT--CGVQSVYRIEPD
Seq3        MSMHKALTIAGSDSSGGAGIQADLKTFQEKNVYGMTALTVIVAMDPNNSWNHQVFPIDTD
            :.*****:* *********:**:. .*. : :*.:.* : :.*. : :

CB_ThiD     MVEAQIKSVFDDIDVDAVKIGMLSNSETIKSIKEYLEKYKAKNIVLDPVMVSKSGYFLLK
seq1        FVAAQLDSVFSDVRIDTTKIGMLAETDIVEAVAERLQRHHVRNVVLDTVMLAKSGDPLLS
seq2        FVAAQLDSVFSDVRIDTTKIGMLAETDIVEAVAERLQRHHVRNVVLDTVMLAKSGDPLLS
Seq3        TIRAQLATITDGIGVDAMKTGMLPTVDIIELAAKTIKEKQLKNVVIDPVMVCKGANEVLY
            : **: :: . : :*: * *** : :: : ::. : :*:*:* **:.*.. :*

CB_ThiD     PEAIEELKK-LISITNIVTPNIPEAEVLSKI-EINSEDDMKKAAIIIQAIGVKNVLVKGG
seq1        PSAIETLRVRLLPQVSLITPNLPEAAALLDAPHARTEQEMLAQGRALLAMGCEAVLMKGG
seq2        PSAIETLRVRLLPQVSLITPNLPEAAALLDAPHARTEQEMLAQGRALLAMGCEAVLMKGG
Seq3        PEHAQALREQLAPLATVITPNLFEASQLSGMDELKTVDDMIEAAKKIHALGAQYVVITGG
            *. : *: * ..::***: ** * . .: ::* . : *:* : *::.**

CB_ThiD     H-RCSDAN-DILLYEDKFITIPGNRIETKNTHGTGCTLSSAIASYLTKGFNIEKSVSLSK
seq1        H-LEDAQSPDWLFTREGEQRFSAPRVNTKNTHGTGCTLSAALAALRPRHRSWGETVNEAK
seq2        H-LEDAQSPDWLFTREGEQRFSAPRVNTKNTHGTGCTLSAALAALRPRHRSWGETVNEAK
Seq3        GKLKHEKAVDVLYDGETAEVLESEMIDTPYTHGAGCTFSAAVTAELAKGAEVKEAIYAAK
            * * : : . ::* ***:***:*:*::: : . ::: :*

CB_ThiD     EYITKAI--ENSFPIGEGVGPVGHFIELYKKAHLDF
seq1        AWLSAALAQADTLEVGKGIGPVHHFHAWW-------
seq2        AWLSAALAQADTLEVGKGIGPVHHFHAWW-------
Seq3        EFITAAI--KESFPLNQYVGPTKHSALRLNQQS---
            ::: *: ::: : : :**. *

6. Based on the alignment scores, which sequence is the most similar to CbThiD (give the PDB code)?

Seq1 is the most similar to CbThiD (PDB code 4JJP_A).
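The alignment scores reported by the server are not reproduced here. As a rough stand-in, percent identity can be computed directly from two aligned rows. The sketch below is an illustration, not the scoring method used by Clustal Omega: the function `percent_identity` is a hypothetical helper that counts identical residues over columns where neither sequence has a gap, and it is applied only to the first 60-column block of the alignment above.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# First block of the alignment (60 columns); seq2 is omitted because its
# row is identical to seq1 in this transcript.
block = {
    "CB_ThiD": "--MKKVLTIAGSDSSGGAGIQADIKTMSSLGVYAMSIITAVTAQNS--IGVQDVHEVPKN",
    "seq1":    "MQRINALTIAGTDPSGGAGIQADLKTFSALGAYGCSVITALVAENT--CGVQSVYRIEPD",
    "Seq3":    "MSMHKALTIAGSDSSGGAGIQADLKTFQEKNVYGMTALTVIVAMDPNNSWNHQVFPIDTD",
}

query = block["CB_ThiD"]
for name in ("seq1", "Seq3"):
    print(f"{name}: {percent_identity(query, block[name]):.1f}% identity")
```

Running the same function on the full-length aligned rows would give a whole-sequence comparison; the single block is used here only to keep the example short.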

7. Print out the alignment (use “view alignment file” and print this page) and on that page highlight the longest continuous stretch of completely conserved (identical) amino acids. There is a 19 residue stretch that possesses approximately 70% identity. Put a box around this region.
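Finding the longest run of completely conserved columns can also be done by scanning the alignment column by column. The following is a minimal sketch, assuming the aligned rows are available as equal-length strings; `longest_conserved_run` is a hypothetical helper, and it is applied only to the first block of the alignment, so it does not by itself answer the question for the full alignment.

```python
def longest_conserved_run(rows):
    """Return (start, length) of the longest run of columns in which every
    row carries the same residue. Start is 0-based; gap columns never count."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, column in enumerate(zip(*rows)):
        if column[0] != "-" and len(set(column)) == 1:
            if run_len == 0:
                run_start = i  # a new conserved run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # run broken by a mismatch or a gap
    return best_start, best_len


# First block of the alignment, rows in the order CB_ThiD, seq1, seq2, Seq3.
rows = [
    "--MKKVLTIAGSDSSGGAGIQADIKTMSSLGVYAMSIITAVTAQNS--IGVQDVHEVPKN",
    "MQRINALTIAGTDPSGGAGIQADLKTFSALGAYGCSVITALVAENT--CGVQSVYRIEPD",
    "MQRINALTIAGTDPSGGAGIQADLKTFSALGAYGCSVITALVAENT--CGVQSVYRIEPD",
    "MSMHKALTIAGSDSSGGAGIQADLKTFQEKNVYGMTALTVIVAMDPNNSWNHQVFPIDTD",
]

start, length = longest_conserved_run(rows)
print(f"columns {start + 1}-{start + length}: {rows[0][start:start + length]}")
```

The fully conserved columns found this way correspond to the positions marked with "*" in the Clustal consensus line.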

Part D. Secondary Structure Prediction

8. Highlight all helical regions of the amino acid sequence (as predicted by Jpred) on the amino acid sequence that you printed out for question two (hint: use the “simple HTML” viewing option).

Red = Helices
Green = Sheets

MKKVLTIAGSDSSGGAGIQADIKTMSSLGVYAMSIITAVTAQNSIGVQDVHEVPKNMVEAQIKSVFDDIDVDAVKIGMLSNSETIKSIKEYLEKYKAKNIVLDPVMVSKSGYFLLKPEAIEELKKLISITNIVTPNIPEAEVLSKIEINSEDDMKKAAIIIQAIGVKNVLVKGGHRCSDANDILLYEDKFITIPGNRIETKNTHGTGCTLSSAIASYLTKGFNIEKSVSLSKEYITKAIENSFPIGEGVGPVGHFIELYKKAHLDF
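Secondary structure predictors typically report one state per residue, so the helical regions can be read off as residue ranges. The sketch below assumes a prediction string that uses "H" for helix, "E" for strand, and "-" for everything else. The string `toy_prediction` is invented for illustration and is not the Jpred output for CbThiD, and `ss_segments` is a hypothetical helper.

```python
from itertools import groupby


def ss_segments(prediction, state):
    """Return 1-based inclusive (start, end) ranges where the per-residue
    prediction string equals the given state, e.g. "H" or "E"."""
    segments = []
    position = 0
    for symbol, group in groupby(prediction):
        length = len(list(group))
        if symbol == state:
            segments.append((position + 1, position + length))
        position += length
    return segments


# Invented 30-residue prediction, for illustration only.
toy_prediction = "--EEEE---HHHHHHHH---EEE--HHHH-"

print("Helices:", ss_segments(toy_prediction, "H"))
print("Sheets: ", ss_segments(toy_prediction, "E"))
```

The resulting ranges can then be used to mark the corresponding residues on the printed sequence.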

Part E: Homology modeling

9. Open the structure with Chimera (see directions on our Blackboard site for a refresher on how to use Chimera) and examine the overall fold. Show your structure as cartoon, color by secondary structure, then print out an image of the 3D structure (use File, save image as…, select TIFF, and then insert the .TIFF picture into Word). You should probably set the background color to white to make printing easier.

10. What structure (provide PDB code) was used as a template for the homology model?
2i5b

11. Download the PDB file for the structure you identified in question 10 from the Protein Data Bank (on the PDB website, type in the PDB code and, when the entry comes up, go to the right side where “download files” is displayed and click there to select “PDB file (text)”), or fetch it in the main Chimera window. Using Chimera, print out a figure similar to the one in question 9 for this protein.

(pentamer)
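The download step in question 11 can also be scripted. The following is a minimal sketch, assuming the RCSB file server serves entries at `https://files.rcsb.org/download/<ID>.pdb`; that URL pattern is an assumption to verify against the current PDB website, and `pdb_download_url` and `fetch_pdb` are hypothetical helper names. Only the Python standard library is used.

```python
from urllib.request import urlretrieve


def pdb_download_url(pdb_id):
    """Build the download URL for a four-character PDB code.

    The URL pattern is an assumption about the RCSB file server."""
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("a PDB code is four alphanumeric characters")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def fetch_pdb(pdb_id, dest=None):
    """Download the PDB file and return the local path (needs network access)."""
    dest = dest or f"{pdb_id.lower()}.pdb"
    urlretrieve(pdb_download_url(pdb_id), dest)
    return dest


print(pdb_download_url("2i5b"))
```

Calling `fetch_pdb("2i5b")` would save the template structure as `2i5b.pdb` in the working directory, ready to open in Chimera.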

12. You can align two structures in Chimera by going to “Tools” > “Structure Comparison” > “MatchMaker.” Are there any major structural differences between the query (CbThiD) and the template?