BIOL2119 Computational Biology: Iterative search with Michael Cameron
Post on 22-Dec-2015
![Page 1: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/1.jpg)
BIOL2119 Computational Biology
Iterative search with Michael Cameron
![Page 2: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/2.jpg)
Overview
• Multiple alignment
• Profiles
  – Position-Specific Score Matrices (PSSMs)
  – Hidden Markov Models (HMMs)
• Iterative search
  – PSI-BLAST
  – SAM
• Practical: building a simple bioinformatics search tool
![Page 3: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/3.jpg)
Single sequence
• Given a pair of sequences:

HS47_CHICK/23-396   DKNMENILLSPVVVASSLGLVSLGGKATTASQAK
UNKNOWN             ANPGQNVVLSAFSVLPPLGQLALASVGESHDELL

• We perform a pairwise alignment:

Query: 5  ENILLSPVVVASSLG---LVSLG 24
          +N++LS   V   LG   L S+G
Sbjct: 5  QNVVLSAFSVLPPLGQLALASVG 27

• Related sequences or chance similarity?
![Page 4: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/4.jpg)
Many sequences
• Given a collection of related sequences, we can construct a multiple alignment:

HS47_CHICK/23-396   DKNMENILLSPVVVASSLGLVSLGGKATTASQAK
SPI1_MYXVL/5-352    YNESDNVVFSPYGLTSALSVLRIAAGGNTKREID
PAI1_MOUSE/24-402   ASKDRNVVFSPYGVSSVLAMLQMTT--KTRRQIQ
PAI1_BOVIN/27-402   ASKDRNVVFSPYGVASVLAMLQLTTGGETRQQIQ
GDN_HUMAN/20-398    SRPHDNIVISPHGIASVLGMLQLGADGRTKKQLA
PRTZ_HORVU/6-395    ERAAGNVAFSPLSLHVALSLITAGA-AATRDQLV
![Page 5: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/5.jpg)
Multiple alignments
• Multiple alignments can be generated automatically and then refined by hand
• Line up the sequences by shifting start/end locations and inserting gaps
• A complex but well-studied problem
• The most popular tool for multiple alignment is CLUSTALW (Higgins et al. 1994)
![Page 6: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/6.jpg)
A larger multiple alignment
![Page 7: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/7.jpg)
Increased sensitivity
• We can use a multiple alignment to detect homologies that could not previously be detected with a single sequence

HS47_CHICK/23-396   DKNMENILLSPVVVASSLGLVSLGGKATTASQAK
SPI1_MYXVL/5-352    YNESDNVVFSPYGLTSALSVLRIAAGGNTKREID
PAI1_MOUSE/24-402   ASKDRNVVFSPYGVSSVLAMLQMTT--KTRRQIQ
PAI1_BOVIN/27-402   ASKDRNVVFSPYGVASVLAMLQLTTGGETRQQIQ
GDN_HUMAN/20-398    SRPHDNIVISPHGIASVLGMLQLGADGRTKKQLA
PRTZ_HORVU/6-395    ERAAGNVAFSPLSLHVALSLITAGA-AATRDQLV

UNKNOWN             ANPGQNVVLSAFSVLPPLGQLALASVGESHDELL
![Page 8: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/8.jpg)
Multiple alignment search
• We can use a multiple alignment to search a database
• More sensitive than a single sequence
• Computationally difficult
• Common practice is to use profiles that describe the multiple alignment instead
![Page 9: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/9.jpg)
Profiles
• A profile is constructed from a multiple alignment
• It describes which residues are preferred for each position/column in the alignment
• Two most common types of profiles:
  – Position-Specific Score Matrices (PSSMs)
  – Hidden Markov Models (HMMs)
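A minimal sketch of how a PSSM might be derived from the six-sequence multiple alignment shown earlier. It assumes log-odds scores in half-bit units, a simple add-one pseudocount and a uniform background frequency of 1/20. These are simplifications chosen for illustration (real tools weight sequences and use data-dependent pseudocounts), so the values will not reproduce the PSSM on the next slide. The names `build_pssm` and `ALIGNMENT` are hypothetical.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# The multiple alignment from the "Many sequences" slide.
ALIGNMENT = [
    "DKNMENILLSPVVVASSLGLVSLGGKATTASQAK",  # HS47_CHICK
    "YNESDNVVFSPYGLTSALSVLRIAAGGNTKREID",  # SPI1_MYXVL
    "ASKDRNVVFSPYGVSSVLAMLQMTT--KTRRQIQ",  # PAI1_MOUSE
    "ASKDRNVVFSPYGVASVLAMLQLTTGGETRQQIQ",  # PAI1_BOVIN
    "SRPHDNIVISPHGIASVLGMLQLGADGRTKKQLA",  # GDN_HUMAN
    "ERAAGNVAFSPLSLHVALSLITAGA-AATRDQLV",  # PRTZ_HORVU
]


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # gap characters are ignored
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (residues.count(aa) + pseudocount) / total
            # Log-odds of the observed frequency against the background, in half-bits.
            scores[aa] = round(2 * math.log2(freq / background))
        pssm.append(scores)
    return pssm


pssm = build_pssm(ALIGNMENT)
# Column 6 is N in every sequence, so N should be the preferred residue there.
print(pssm[5]["N"], pssm[5]["W"])
```

Each column of the alignment becomes one row of scores, which is why a conserved column (such as the N at position 6) rewards the conserved residue and penalises everything else.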
![Page 10: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/10.jpg)
PSSM:

   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
D  3 -2 -1  2 -2 -1  2 -2 -1 -3 -3 -1 -2 -2 -2  1 -1 -2  3 -2
K -1  4  4 -1 -3  0 -1 -2 -1 -4 -3  2 -2 -4 -2  2 -1 -4 -3 -3
N  1 -1  2 -1 -3  0  2 -2 -2 -3 -3  2 -3 -4  5 -1 -1 -4 -3 -3
M  1 -2 -1  3 -3 -1 -1  2  4 -2 -2 -2  3 -3 -2  1 -1 -3 -2 -2
E -2  3  0  4 -4  3  2  2 -1 -4 -4  0 -3 -4 -2 -1 -2 -4 -3 -4
N -2 -1  7  1 -4 -1 -1 -1  0 -4 -4 -1 -3 -4 -3  0 -1 -5 -3 -4
I -1 -4 -4 -4 -2 -3 -4 -4 -4  4  0 -3  0 -1 -3 -3 -1 -4 -2  5
L  1 -3 -4 -4 -2 -3 -3 -3 -4  1  2 -3  0 -2 -3 -2 -1 -3 -2  5
L -3 -3 -4 -4 -3 -4 -4 -4 -3  2  3 -4  0  6 -4 -3 -2 -1  1  0
S  0 -2  0 -1 -2 -1 -1 -1 -2 -3 -3 -1 -2 -3 -2  5  1 -4 -3 -2
P  1 -3 -3 -2 -3 -2 -2 -2 -3 -3 -3 -2 -3 -4  8 -1 -2 -4 -4 -3
V -2 -3 -3 -4 -3 -2 -3 -4  4  0  1 -3 -1  4 -4 -3 -2  0  5  1
V  0 -3 -1 -2 -2 -2 -2  4 -3 -2 -3 -2 -2 -3 -2  3 -1 -4 -3  1
V -1 -3 -4 -4 -2 -3 -4 -4 -4  3  3 -3  1 -1 -3 -3 -1 -3 -2  4
A  3 -2 -2 -2 -2 -1 -2 -2  4 -2  1 -2 -1 -2 -2  1  2 -3 -2 -1
S  0 -2 -1 -2 -2 -1 -1 -2 -2 -2 -2 -1 -2 -3  3  4  0 -4 -3  1
S  3 -3 -2 -3 -2 -2 -2 -2 -3  0 -1 -2 -1 -3  3  2 -1 -4 -3  3
L -2 -3 -4 -5 -2 -3 -4 -5 -4  1  5 -3  1  0 -4 -3 -2 -2 -2  0
G  2 -2 -1 -2 -2 -2 -2  5 -2 -4 -4 -2 -3 -4 -2  3 -1 -3 -3 -3
L -2 -2 -3 -3 -2  2 -2 -4 -2  0  3 -2  5 -1 -3 -2 -2 -3 -2  1
V -2 -3 -4 -4 -2 -3 -4 -5 -4  3  4 -3  1 -1 -4 -3 -2 -3 -2  2
S  1  2 -1 -2 -2  4  0 -2 -1 -3 -3  0 -2 -3 -2  2  2 -3 -2 -2
L -2 -2 -3 -3 -2 -2 -3 -3 -3  2  4 -2  3  0 -3 -2 -1 -2 -2  0
G  4 -2 -2 -2 -2 -2 -2  3 -3 -3 -3 -2 -2 -3 -2  0  2 -3 -3 -2
G  2 -2 -1 -2 -2 -2 -2  4 -2 -3 -3 -2 -2 -3 -2  2  2 -3 -3 -2
K  1 -1 -1  2 -3 -1 -1  3 -2 -2 -3  3 -2 -3 -2 -1 -1 -4 -3  1
A  3 -2 -1 -2 -2 -2 -2  5 -2 -3 -3 -2 -2 -3 -2  0 -1 -3 -3 -2
T  1  2  2  0 -2  0  3 -2 -1 -3 -2  0 -2 -3 -2  0  2 -3 -2 -2
T -1 -2 -1 -2 -2 -1 -2 -2 -2 -2 -2 -1 -2 -3 -2  2  6 -3 -2 -1
A  1  4 -1 -2 -3  0 -1 -2  4 -3 -3  3 -2 -3 -2 -1 -2 -4 -2 -3
S -2  3  0  4 -4  2  0 -2 -1 -4 -4  2 -3 -4 -2  1 -1 -4 -3 -3
Q -2  0 -1  0 -4  6  4 -3  0 -4 -3  0 -2 -4 -2 -1 -2 -3 -2 -3
A  1 -3 -4 -4 -2 -3 -3 -3 -4  4  3 -3  1 -1 -3 -2 -2 -3 -2  1
K  1 -1 -1  2 -3  3  0 -3 -2 -1  1  2 -1 -3 -2 -1 -1 -3 -2  1
![Page 11: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/11.jpg)
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
D  3 -2 -1  2 -2 -1  2 -2 -1 -3 -3 -1 -2 -2 -2  1 -1 -2  3 -2
K -1  4  4 -1 -3  0 -1 -2 -1 -4 -3  2 -2 -4 -2  2 -1 -4 -3 -3
N  1 -1  2 -1 -3  0  2 -2 -2 -3 -3  2 -3 -4  5 -1 -1 -4 -3 -3
M  1 -2 -1  3 -3 -1 -1  2  4 -2 -2 -2  3 -3 -2  1 -1 -3 -2 -2
E -2  3  0  4 -4  3  2  2 -1 -4 -4  0 -3 -4 -2 -1 -2 -4 -3 -4
N -2 -1  7  1 -4 -1 -1 -1  0 -4 -4 -1 -3 -4 -3  0 -1 -5 -3 -4
I -1 -4 -4 -4 -2 -3 -4 -4 -4  4  0 -3  0 -1 -3 -3 -1 -4 -2  5
L  1 -3 -4 -4 -2 -3 -3 -3 -4  1  2 -3  0 -2 -3 -2 -1 -3 -2  5
L -3 -3 -4 -4 -3 -4 -4 -4 -3  2  3 -4  0  6 -4 -3 -2 -1  1  0
S  0 -2  0 -1 -2 -1 -1 -1 -2 -3 -3 -1 -2 -3 -2  5  1 -4 -3 -2
P  1 -3 -3 -2 -3 -2 -2 -2 -3 -3 -3 -2 -3 -4  8 -1 -2 -4 -4 -3
V -2 -3 -3 -4 -3 -2 -3 -4  4  0  1 -3 -1  4 -4 -3 -2  0  5  1
V  0 -3 -1 -2 -2 -2 -2  4 -3 -2 -3 -2 -2 -3 -2  3 -1 -4 -3  1
V -1 -3 -4 -4 -2 -3 -4 -4 -4  3  3 -3  1 -1 -3 -3 -1 -3 -2  4
A  3 -2 -2 -2 -2 -1 -2 -2  4 -2  1 -2 -1 -2 -2  1  2 -3 -2 -1
S  0 -2 -1 -2 -2 -1 -1 -2 -2 -2 -2 -1 -2 -3  3  4  0 -4 -3  1
S  3 -3 -2 -3 -2 -2 -2 -2 -3  0 -1 -2 -1 -3  3  2 -1 -4 -3  3
L -2 -3 -4 -5 -2 -3 -4 -5 -4  1  5 -3  1  0 -4 -3 -2 -2 -2  0
G  2 -2 -1 -2 -2 -2 -2  5 -2 -4 -4 -2 -3 -4 -2  3 -1 -3 -3 -3
L -2 -2 -3 -3 -2  2 -2 -4 -2  0  3 -2  5 -1 -3 -2 -2 -3 -2  1
![Page 12: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/12.jpg)
Pros and cons of PSSMs
• Fast and simple to use
• Little modification to the BLAST algorithm required to use them
• PSSM replaces the query sequence and scoring matrix
• Statistical theory for scoring not as solid
• Not as detailed as HMMs
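To illustrate the point that a PSSM replaces both the query sequence and the scoring matrix, here is a small sketch of an ungapped scan. Instead of looking up `matrix[query[i]][residue]`, the score for position i is read directly from row i of the PSSM. The six rows are copied from the PSSM slide (query residues N, I, L, L, S, P); the function names and the sliding-window scan itself are illustrative assumptions, not the BLAST algorithm.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order used on the PSSM slide
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Rows 6-11 of the PSSM slide (query residues N, I, L, L, S, P).
PSSM_ROWS = [
    [-2, -1, 7, 1, -4, -1, -1, -1, 0, -4, -4, -1, -3, -4, -3, 0, -1, -5, -3, -4],
    [-1, -4, -4, -4, -2, -3, -4, -4, -4, 4, 0, -3, 0, -1, -3, -3, -1, -4, -2, 5],
    [1, -3, -4, -4, -2, -3, -3, -3, -4, 1, 2, -3, 0, -2, -3, -2, -1, -3, -2, 5],
    [-3, -3, -4, -4, -3, -4, -4, -4, -3, 2, 3, -4, 0, 6, -4, -3, -2, -1, 1, 0],
    [0, -2, 0, -1, -2, -1, -1, -1, -2, -3, -3, -1, -2, -3, -2, 5, 1, -4, -3, -2],
    [1, -3, -3, -2, -3, -2, -2, -2, -3, -3, -3, -2, -3, -4, 8, -1, -2, -4, -4, -3],
]


def pssm_score(rows, position, residue):
    """Score for placing `residue` at profile position `position`."""
    return rows[position][INDEX[residue]]


def best_ungapped_window(rows, subject):
    """Slide the PSSM along `subject`; return (best score, start offset)."""
    width = len(rows)
    best = None
    for start in range(len(subject) - width + 1):
        score = sum(pssm_score(rows, i, subject[start + i]) for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best


unknown = "ANPGQNVVLSAFSVLPPLGQLALASVGESHDELL"
print(best_ungapped_window(PSSM_ROWS, unknown))
```

On the UNKNOWN sequence from the earlier slides the best window is `NVVLSA` at offset 5 with a score of 26, the same region that the pairwise alignment matched.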
![Page 13: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/13.jpg)
Hidden Markov Models
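[Figure: two-state HMM (states 1 and 2) with transition probabilities 0.1, 0.2, 0.8 and 0.9]

Emission probabilities:
State 1: A 0.1, G 0.3, T 0.5, C 0.1
State 2: A 0.7, G 0.1, T 0.1, C 0.1

State sequence (hidden):
1 1 1 1 1 1 1 2 2 2 2 2 2 1 1
Symbol sequence:
T G T T C G T A A A C A A T G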
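The hidden state sequence can be recovered from the symbols with the Viterbi algorithm. The sketch below is a hedged illustration: the emission probabilities are those on the slide, but the start probabilities (0.5 each) and the assignment of 0.9/0.1 to state 1 and 0.8/0.2 to state 2 are assumptions, since the transcript does not say which arrow carries which value.

```python
import math

STATES = (1, 2)
START = {1: 0.5, 2: 0.5}         # assumption: start probabilities are not on the slide
TRANS = {1: {1: 0.9, 2: 0.1},    # assumption: which transition carries which value
         2: {1: 0.2, 2: 0.8}}
EMIT = {1: {"A": 0.1, "G": 0.3, "T": 0.5, "C": 0.1},
        2: {"A": 0.7, "G": 0.1, "T": 0.1, "C": 0.1}}


def viterbi(symbols):
    """Return the most probable hidden state path for the observed symbols."""
    # Work in log space so long sequences do not underflow.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][symbols[0]]) for s in STATES}
    backpointers = []
    for sym in symbols[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (scores[prev] + math.log(TRANS[prev][s])
                             + math.log(EMIT[s][sym]))
            pointers[s] = prev
        backpointers.append(pointers)
        scores = new_scores
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi("TGTTCGTAAACAATG"))
```

Under these assumed parameters the decoded path is seven 1s, six 2s and two 1s, which matches the state sequence shown on the slide.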
![Page 14: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/14.jpg)
Profile HMMs
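[Figure: profile HMM with match states M1–M4, insert states I1–I3 and delete states D1–D4]

Emission probabilities shown for the four match states:
M1: A 0.01, C 0.04, D 0.31, …
M2: A 0.03, C 0.15, D 0.02, …
M3: A 0.22, C 0.02, D 0.13, …
M4: A 0.05, C 0.03, D 0.09, …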
![Page 15: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/15.jpg)
Pros and cons of HMMs
• Strong statistical foundation
• Can be trained on aligned and non-aligned data
• More detailed
• Alignment is more computationally expensive
![Page 16: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/16.jpg)
Using profiles
• There exist many databases containing profiles of known families:
  • e.g. the Pfam database
• Tools exist for searching a profile database with a sequence query and vice versa:
  • HMMER (Durbin et al. 1998)
  • IMPALA (Schaffer et al. 1999)
• Profiles form the basis of iterative search
![Page 17: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/17.jpg)
Iterative search
• Search database with query sequence
• Construct multiple alignment from high-scoring aligned sequences
• Construct a profile using the multiple alignment
• Search database with profile. Repeat.

• Popular tools for iterative search:
  • PSI-BLAST (Altschul et al. 1997) uses PSSMs
  • SAM (Karplus et al. 1998) uses HMMs
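The search, align, build-profile, search-again loop can be sketched on toy data. This is a deliberately simplified illustration, not how PSI-BLAST or SAM work internally: all sequences are assumed to be the same length and already aligned, hits are selected by a raw score threshold rather than an E-value, and the profile uses add-one pseudocounts. The eight-residue fragments are taken from columns 6-13 of the multiple alignment shown earlier; `KTRRQIQA` is an unrelated decoy. All function names are hypothetical.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def build_profile(sequences, pseudocount=1.0):
    """Log-odds profile (bits) from equal-length, pre-aligned sequences."""
    profile = []
    for column in zip(*sequences):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((column.count(aa) + pseudocount) / total * len(AMINO_ACIDS))
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, sequence):
    """Ungapped score of a sequence against the profile."""
    return sum(col[res] for col, res in zip(profile, sequence))


def iterative_search(query, database, threshold, max_iterations=5):
    """Repeat search and profile construction until the hit set stops changing."""
    included = [query]
    hits = []
    for iteration in range(1, max_iterations + 1):
        profile = build_profile(included)
        hits = [seq for seq in database if score(profile, seq) >= threshold]
        new_included = [query] + hits
        if set(new_included) == set(included):
            return hits, iteration  # converged: nothing new was found
        included = new_included
    return hits, max_iterations


database = ["NIVISPHG", "NVAFSPLS", "NILLSPVV", "KTRRQIQA"]
hits, rounds = iterative_search("NVVFSPYG", database, threshold=4.0)
print(rounds, hits)
```

On this toy data the first pass finds two sequences, the second pass also picks up `NILLSPVV` (which scored below the threshold against the query alone), and the third pass finds nothing new, so the loop stops.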
![Page 18: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/18.jpg)
Iterative process flow chart
[Flow chart. Boxes: Query sequence, SEARCH, Database, Results, Multiple alignment, PSSM. Decision: Converged? No: search again; Yes: End]
![Page 19: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/19.jpg)
PSI-BLAST
• Iterative version of BLAST that uses PSSMs
• Each iteration takes about as long as a regular BLAST search
• Maximum number of iterations
  • Typically between 5 and 20
• Threshold for inclusion in multiple alignment and PSSM construction
  • Typically E-value of 0.001 or less
• A PSI-BLAST search takes considerably longer than BLAST but is much more sensitive
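As a hedged illustration of how the two parameters above map onto a command line, the sketch below builds an argument list for the `psiblast` program from NCBI BLAST+ (a later toolkit than the one these slides were written for). The file name `query.fasta` and database name `nr` are placeholders, and the helper `psiblast_command` is hypothetical.

```python
import shlex


def psiblast_command(query_fasta, database, max_iterations=5, inclusion_evalue=0.001):
    """Build a BLAST+ psiblast argument list (nothing is executed here)."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-num_iterations", str(max_iterations),        # maximum number of iterations
        "-inclusion_ethresh", str(inclusion_evalue),   # E-value cutoff for the PSSM
    ]


cmd = psiblast_command("query.fasta", "nr")
print(shlex.join(cmd))
```

With BLAST+ installed and a formatted database available, the list could be passed to `subprocess.run(cmd, check=True)`.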
![Page 20: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/20.jpg)
SAM
• Iterative tool that uses Hidden Markov Models instead of Position-Specific Score Matrices
• Only uses 4 iterations
• Also uses BLAST for searching
  • Refines alignment scoring using an HMM
• Better accuracy than BLAST
• About 3 times slower than BLAST
![Page 21: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/21.jpg)
Profile corruption
• False positives are high-scoring alignments to sequences that are not in fact related to the query
• A single false positive can corrupt the profile
  • The profile now includes information about an alignment with an unrelated sequence
• Decreasing the E-value threshold reduces the likelihood of false positives, but decreases sensitivity
  • Selectivity / sensitivity tradeoff
![Page 22: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/22.jpg)
Searching....done
Results from round 1

                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

gi|33240976|ref|NP_875918.1| Kef-type K+ transport system predic...   82   4e-17
gi|16264095|ref|NP_436887.1| putative ionic voltage-gated channe...   33   0.020
gi|17536613|ref|NP_494333.1| TWiK family of potassium channels (...   33   0.022
gi|12232625|emb|CAC21575.2| MHC class I antigen [Homo sapiens]        28   0.90
gi|32420293|ref|XP_330590.1| hypothetical protein [Neurospora cr...   27   1.5
gi|482884|gb|AAC46500.1| circumsporozoite protein                     27   1.6
gi|6723566|emb|CAB66363.1| immunoglobulin mu heavy chain variabl...   27   1.9
gi|29836862|emb|CAD88668.1| immunoglobulin heavy chain [Homo sap...   27   1.9
gi|2127462|pir||S72598 sulfate permease T protein - Mycobacteriu...   26   2.5
gi|231413|sp|P30490|1B52_HUMAN HLA class I histocompatibility an...   26   2.6
gi|231350|sp|P30377|1A03_GORGO CLASS I HISTOCOMPATIBILITY ANTIGE...   26   3.1
gi|3522980|dbj|BAA32614.1| MHC class I antigen [Homo sapiens]         26   3.3
gi|32420913|ref|XP_330900.1| hypothetical protein [Neurospora cr...   25   7.2
gi|21356701|ref|NP_652739.1| Pp1-Y2 [Drosophila melanogaster] >g...   25   7.5

>gi|33240976|ref|NP_875918.1| Kef-type K+ transport system predicted NAD-binding component [Prochlorococcus marinus subsp.

 Score = 82.0 bits (201), Expect = 4e-17
 Identities = 52/194 (26%), Positives = 96/194 (49%), Gaps = 29/194 (14%)

Query: 256 FTFEFLMRVVFCPNKVEFIK----------NSLNIIDFVAILPFYLEVGLSGLSSKAAKD 305
           F  E+L R+   P + ++ K          + + IID +AI+P ++ V          +
Sbjct: 66  FCIEYLCRLWVAPLQEKYGKGLKGIFRYVLSPMAIIDVIAIIPSFIGV----------RA 115

Query: 306 VLGFLRVVRFVRILRIFKLTRHFVGLRVLGHTLRASTNEFLLLIIFLALGVLIFATMIYY 365
            L  LRV+R +RIL+I +  +    +    + LR+ + E  +  ++  L +LI +T++Y
Sbjct: 116 ELKILRVIRLLRILKIGRSEKFKKSIFHFNYALRSKSQELQISTVYTVLLLLISSTLMYL 175
![Page 23: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/23.jpg)
Searching....done
Results from round 2

                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

Sequences used in model and found again:

gi|33240976|ref|NP_875918.1| Kef-type K+ transport system predic...  273   7e-75

Sequences not found previously or not previously below threshold:

gi|16264095|ref|NP_436887.1| putative ionic voltage-gated channe...   50   2e-07
gi|17536613|ref|NP_494333.1| TWiK family of potassium channels (...   39   5e-04
gi|12232625|emb|CAC21575.2| MHC class I antigen [Homo sapiens]        28   0.64
gi|15922436|ref|NP_378105.1| 266aa long conserved hypothetical p...   28   0.83
gi|482884|gb|AAC46500.1| circumsporozoite protein                     28   1.1
gi|32420293|ref|XP_330590.1| hypothetical protein [Neurospora cr...   28   1.2
gi|6723566|emb|CAB66363.1| immunoglobulin mu heavy chain variabl...   27   1.7
gi|29836862|emb|CAD88668.1| immunoglobulin heavy chain [Homo sap...   27   1.8
gi|231413|sp|P30490|1B52_HUMAN HLA class I histocompatibility an...   27   1.9
gi|3522980|dbj|BAA32614.1| MHC class I antigen [Homo sapiens]         26   2.4
gi|2127462|pir||S72598 sulfate permease T protein - Mycobacteriu...   26   2.5
gi|231350|sp|P30377|1A03_GORGO CLASS I HISTOCOMPATIBILITY ANTIGE...   26   2.6
gi|15216247|dbj|BAB63254.1| PER3 [Homo sapiens]                       25   4.7
gi|32420913|ref|XP_330900.1| hypothetical protein [Neurospora cr...   25   5.1
gi|21356701|ref|NP_652739.1| Pp1-Y2 [Drosophila melanogaster] >g...   25   7.7
gi|32417378|ref|XP_329167.1| predicted protein [Neurospora crass...   25   7.8

>gi|33240976|ref|NP_875918.1| Kef-type K+ transport system predicted NAD-binding component [Prochlorococcus marinus subsp.

 Score = 273 bits (699), Expect = 7e-75
 Identities = 52/194 (26%), Positives = 96/194 (49%), Gaps = 29/194 (14%)

Query: 256 FTFEFLMRVVFCPNKVEFIK----------NSLNIIDFVAILPFYLEVGLSGLSSKAAKD 305
           F  E+L R+   P + ++ K          + + IID +AI+P ++ V          +
Sbjct: 66  FCIEYLCRLWVAPLQEKYGKGLKGIFRYVLSPMAIIDVIAIIPSFIGV----------RA 115

Query: 306 VLGFLRVVRFVRILRIFKLTRHFVGLRVLGHTLRASTNEFLLLIIFLALGVLIFATMIYY 365
            L  LRV+R +RIL+I +  +    +    + LR+ + E  +  ++  L +LI +T++Y
Sbjct: 116 ELKILRVIRLLRILKIGRSEKFKKSIFHFNYALRSKSQELQISTVYTVLLLLISSTLMYL 175

Query: 366 AERIGAQPNDPSASEHTHFKNIPIGFWWAVVTMTTLGYGDMYPQTWSGMLVGALCALAGV 425
           AE         S+ +     +IP   WW+V T++ +GYGD  P T  G ++ ++ +L G+
Sbjct: 176 AE---------SSIQPELLGSIPRCLWWSVTTVSAVGYGDSIPVTAIGKIIASVTSLLGI 226

Query: 426 LTIAMPVPVIVNNF 439
             IA+P  ++   F
Sbjct: 227 GAIAIPTGILAAGF 240
![Page 24: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/24.jpg)
Searching....done
Results from round 3

                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

Sequences used in model and found again:

gi|33240976|ref|NP_875918.1| Kef-type K+ transport system predic...  251   3e-68
gi|16264095|ref|NP_436887.1| putative ionic voltage-gated channe...  235   3e-63
gi|17536613|ref|NP_494333.1| TWiK family of potassium channels (...   83   2e-17

Sequences not found previously or not previously below threshold:

gi|482884|gb|AAC46500.1| circumsporozoite protein                     28   1.1
gi|32420293|ref|XP_330590.1| hypothetical protein [Neurospora cr...   28   1.1
gi|6723566|emb|CAB66363.1| immunoglobulin mu heavy chain variabl...   27   1.6
gi|29836862|emb|CAD88668.1| immunoglobulin heavy chain [Homo sap...   27   1.7
gi|2127462|pir||S72598 sulfate permease T protein - Mycobacteriu...   26   2.5
gi|12232625|emb|CAC21575.2| MHC class I antigen [Homo sapiens]        26   4.0
gi|15805398|ref|NP_294092.1| hypothetical protein [Deinococcus r...   26   4.3
gi|15216247|dbj|BAB63254.1| PER3 [Homo sapiens]                       25   4.7
gi|32420913|ref|XP_330900.1| hypothetical protein [Neurospora cr...   25   5.1
gi|21356701|ref|NP_652739.1| Pp1-Y2 [Drosophila melanogaster] >g...   25   7.6
gi|15889580|ref|NP_355261.1| AGR_C_4191p [Agrobacterium tumefaci...   25   8.4
gi|15598989|ref|NP_252483.1| hypothetical protein [Pseudomonas a...   25   9.8

CONVERGED!

>gi|33240976|ref|NP_875918.1| Kef-type K+ transport system predicted NAD-binding component [Prochlorococcus marinus subsp.

 Score = 251 bits (642), Expect = 3e-68
 Identities = 53/205 (25%), Positives = 101/205 (49%), Gaps = 29/205 (14%)

Query: 245 LTYIEGVCVVWFTFEFLMRVVFCPNKVEFIK----------NSLNIIDFVAILPFYLEVG 294
           + +++ V    F  E+L R+   P + ++ K          + + IID +AI+P ++ V
Sbjct: 55  IDFLDWVIGGLFCIEYLCRLWVAPLQEKYGKGLKGIFRYVLSPMAIIDVIAIIPSFIGV- 113
![Page 25: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/25.jpg)
Summary
• Multiple alignment
  – Advantages over using a single sequence
• Constructing and using profiles
  – Position-Specific Score Matrices (PSSMs)
  – Hidden Markov Models (HMMs)
• Iterative search
  – PSI-BLAST
  – SAM
![Page 26: 1 BIOL2119 Computational Biology Iterative search with Michael Cameron](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d7e5503460f94a61536/html5/thumbnails/26.jpg)
Practical exercise
• Build a simple bioinformatics search tool:
  – Perform Smith-Waterman search between query and each database sequence
  – Calculate E-value for each alignment, and if below cutoff then display alignment
• For code and detailed instructions go to:
  http://www.cs.rmit.edu.au/~mcam/prac
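A rough sketch of what such a tool might look like; it is not the code distributed with the practical. It assumes a simple match/mismatch score with a linear gap penalty instead of a substitution matrix with affine gaps, and it estimates the E-value with the Karlin-Altschul form E = K·m·n·exp(−λS), where the values of `lam` and `k` are illustrative placeholders rather than parameters fitted to this scoring scheme. All function names are hypothetical.

```python
import math


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def e_value(score, query_len, db_len, lam=0.3, k=0.1):
    """Karlin-Altschul style estimate; lam and k are placeholder values."""
    return k * query_len * db_len * math.exp(-lam * score)


def search(query, database, cutoff=1.0):
    """Yield (subject, score, E-value, alignment) for hits below the cutoff."""
    db_len = sum(len(s) for s in database)
    for subject in database:
        score, top, bottom = smith_waterman(query, subject)
        e = e_value(score, len(query), db_len)
        if e < cutoff:
            yield subject, score, e, top, bottom


query = "DKNMENILLSPVVVASSLGLVSLGGKATTASQAK"
database = ["ANPGQNVVLSAFSVLPPLGQLALASVGESHDELL",
            "ASKDRNVVFSPYGVSSVLAMLQMTTKTRRQIQ"]
for subject, score, e, top, bottom in search(query, database, cutoff=10.0):
    print(f"{subject}  score={score}  E={e:.2g}")
    print(top)
    print(bottom)
```

Lowering `cutoff` reports fewer, more confident alignments; raising it reports more alignments at the cost of more chance matches.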