Post on 20-Dec-2015
Expect value (E-value)
• Expected number of hits, of equivalent or better score, found by random chance in a database of the size searched.
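The dependence on database size can be made concrete with the standard Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the database length. The sketch below is a simplification: the function name and the example lengths are illustrative assumptions, and real BLAST uses edge-corrected "effective" lengths rather than raw ones.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S'), with raw lengths standing in for the
    effective lengths that BLAST actually uses (an assumption here).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Same alignment score, two hypothetical database sizes.
e_small = expect_value(50.0, query_len=300, db_len=1_000_000)
e_large = expect_value(50.0, query_len=300, db_len=100_000_000)

print(f"E (1 Mb database):   {e_small:.3g}")
print(f"E (100 Mb database): {e_large:.3g}")
```

The same score is 100 times less significant in the database that is 100 times larger, which is why an E-value is only meaningful for the size of the database searched.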
Conserved domains
Domain: a sequence of amino acids that typically folds into a stable tertiary structure. Many proteins are multi-domain.
BLAST to PSI-BLAST
• BLAST uses a scoring matrix derived from a large number of proteins.
• What if you want to find homologs based upon a specific gene product?
• Develop a position specific scoring matrix (PSSM).
PSSM
     M  F  W  Y  G  A  P  V  I  L  C  R  K  E  N  D  Q  S  T  H
M    5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
G    1  0  0  0  1  0  0  0  0  1  0  0  0  1  0  0  1  0  0  0
A    1  0  0  0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0
S    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  2  0
F    0  4  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Determine the frequency of each substitution at every position, then convert the frequencies to log-odds scores.
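The conversion from counts to log-odds scores can be sketched as follows. The counts are the ones in the PSSM table; the add-one pseudocount and the uniform background frequency (1/20) are assumptions made for simplicity and are not taken from the slides. PSI-BLAST itself uses sequence weighting and data-dependent pseudocounts.

```python
import math

AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"  # column order used in the table

# Non-zero counts per query position (rows M, G, A, S, F of the table).
COUNTS = [
    {"M": 5},
    {"M": 1, "G": 1, "L": 1, "E": 1, "Q": 1},
    {"M": 1, "A": 4},
    {"S": 3, "T": 2},
    {"F": 4, "Y": 1},
]


def log_odds_pssm(counts, pseudocount=1.0):
    """Turn per-position residue counts into log2-odds scores.

    Assumptions: a flat pseudocount added to every residue and a
    uniform background frequency of 1/20.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(AMINO_ACIDS)
        row = {}
        for aa in AMINO_ACIDS:
            freq = (column.get(aa, 0) + pseudocount) / total
            row[aa] = math.log2(freq / background)
        pssm.append(row)
    return pssm


pssm = log_odds_pssm(COUNTS)
for query_residue, row in zip("MGASF", pssm):
    best = max(row, key=row.get)
    print(query_residue, "best-scoring residue:", best, round(row[best], 2))
```

Residues seen more often than the background expects get positive scores; residues never observed at a position get negative scores.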
A PSSM can also include a position-specific score for permitting insertions and deletions (INDELs). For example, a position may lie at a turn, where INDELs are common.
INDEL
Indel 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
PSSM
• In evaluating (scoring) alignments, PSSM approaches typically:
– Reward matches to columns that have conserved amino acids
– Penalize mismatches to columns with conserved amino acids more than mismatches in a variable column
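A small sketch can illustrate both behaviours, using the fully conserved column (M observed 5 times) and the variable column (M, G, L, E, Q observed once each) from the count table. A flat pseudocount would give an unobserved residue the same score in any two columns of equal depth, so this sketch assumes a pseudocount total that grows with the number of distinct residue types in the column. That diversity-based weighting, the factor of 1.0 per residue type, and the uniform background are illustrative assumptions, not PSI-BLAST's actual procedure.

```python
import math

AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background


def column_scores(column_counts, per_type_pseudocount=1.0):
    """Log2-odds scores for one alignment column.

    Assumption: the total pseudocount scales with the number of distinct
    residue types observed, so variable columns are smoothed more.
    """
    observed = sum(column_counts.values())
    distinct = sum(1 for c in column_counts.values() if c > 0)
    pseudo_total = per_type_pseudocount * distinct
    scores = {}
    for aa in AMINO_ACIDS:
        freq = (column_counts.get(aa, 0) + pseudo_total * BACKGROUND) / (
            observed + pseudo_total
        )
        scores[aa] = math.log2(freq / BACKGROUND)
    return scores


def score_sequence(pssm, sequence):
    """Sum per-position scores for an ungapped sequence aligned to the PSSM."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")
    return sum(column[aa] for column, aa in zip(pssm, sequence))


conserved = column_scores({"M": 5})
variable = column_scores({"M": 1, "G": 1, "L": 1, "E": 1, "Q": 1})

print("match:    conserved", round(conserved["M"], 2), "variable", round(variable["G"], 2))
print("mismatch: conserved", round(conserved["W"], 2), "variable", round(variable["W"], 2))
print("MG vs WG:", round(score_sequence([conserved, variable], "MG"), 2),
      round(score_sequence([conserved, variable], "WG"), 2))
```

Under these assumptions a match to the conserved column earns more than a match to the variable column, and an unobserved residue (W here) is penalized more heavily in the conserved column.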
PSI-BLAST
• Input a single query sequence.
• Executes a BLAST run.
• Program takes significant hits, incorporates matches into a PSSM.
• Sequences >98% similar not included (avoid biasing the PSSM).
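The filtering step can be sketched on toy data. The five-residue sequences below are hypothetical, constructed only so that the surviving set reproduces the counts in the PSSM table. Percent identity is used as a stand-in for "similar", and each hit is compared against the query and against every hit already kept; both choices are assumptions of this sketch. No real BLAST search is run; the hits are assumed to be already aligned to the query without gaps.

```python
from collections import Counter


def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def build_counts(query, aligned_hits, max_identity=98.0):
    """Drop near-duplicate hits, then count residues per column.

    A hit is skipped when it is more than `max_identity` percent identical
    to the query or to a hit that was already kept (assumed rule).
    """
    kept = [query]
    for hit in aligned_hits:
        if all(percent_identity(hit, k) <= max_identity for k in kept):
            kept.append(hit)
    columns = [Counter(column) for column in zip(*kept)]
    return kept, columns


# Hypothetical hits; the first is identical to the query and is skipped.
query = "MGASF"
hits = ["MGASF", "MMATF", "MLMSF", "MEATY", "MQASF"]

kept, columns = build_counts(query, hits)
print("kept:", kept)
for residue, column in zip(query, columns):
    print(residue, dict(column))
```

The identical hit adds no new information, so skipping it keeps the query's own residues from being counted twice in every column.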