
Post on 20-Dec-2015


Expect value (E-value)

• Expected number of hits, of equivalent or better score, found by random chance in a database of the size searched.
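As a rough illustration of this definition, the sketch below uses the standard relation between a normalized bit score S' and the expected number of chance hits, E = m × n × 2^(-S'), where m is the query length and n is the database length. The function name and the example numbers are hypothetical, and real BLAST applies effective-length corrections that are ignored here.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least bit_score.

    Simplified form E = m * n * 2**(-S'); edge-effect (effective length)
    corrections used by real BLAST are deliberately left out.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical search: a 250-residue query against a 1e9-residue database.
for score in (30, 40, 50):
    print(score, expect_value(score, 250, 1_000_000_000))
```

Two properties follow directly: doubling the database size doubles E, and each additional bit of score halves it, which is why the same alignment can be significant in a small database and not in a large one.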

Conserved domains

Domain: a sequence of amino acids that typically folds into a stable tertiary structure. Many proteins are multi-domain.

BLAST to PSI-BLAST

• BLAST uses a general scoring matrix derived from a large number of proteins.

• What if you want to find homologs based upon a specific gene product?

• Develop a position specific scoring matrix (PSSM).

PSSM

Rows are alignment positions; columns are the amino acids observed at each position.

     M  F  W  Y  G  A  P  V  I  L  C  R  K  E  N  D  Q  S  T  H
M    5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
G    1  0  0  0  1  0  0  0  0  1  0  0  0  1  0  0  1  0  0  0
A    1  0  0  0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0
S    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  2  0
F    0  4  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Determine the frequency of each substitution at each position, then convert it to a log-odds score.
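The conversion from counts to log-odds scores can be sketched as follows. This is a minimal illustration, not PSI-BLAST's actual procedure: it assumes a uniform background frequency (1/20 per amino acid) and a flat pseudocount of 1 so that unobserved residues do not produce log(0). The names `counts`, `log_odds_pssm`, and `pseudocount` are hypothetical; the counts themselves are the nonzero entries of the table in this section.

```python
import math

AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"  # column order used in the table

# Nonzero counts per alignment position (5 aligned sequences per column).
counts = [
    {"M": 5},
    {"M": 1, "G": 1, "L": 1, "E": 1, "Q": 1},
    {"M": 1, "A": 4},
    {"S": 3, "T": 2},
    {"F": 4, "Y": 1},
]


def log_odds_pssm(counts, pseudocount=1.0, background=None):
    """Turn per-position counts into log2(observed / expected) scores."""
    if background is None:
        # Assumption: every amino acid is equally likely by chance.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(AMINO_ACIDS)
        row = {}
        for aa in AMINO_ACIDS:
            freq = (column.get(aa, 0) + pseudocount) / total
            row[aa] = math.log2(freq / background[aa])
        pssm.append(row)
    return pssm


pssm = log_odds_pssm(counts)
for position, row in enumerate(pssm, start=1):
    best = max(row, key=row.get)
    print(position, best, round(row[best], 2))
```

Residues seen more often than the background predicts get positive scores, and residues seen less often get negative ones. A production PSSM would use measured background frequencies and sequence weighting rather than these simplifications.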

PSSM

     M  F  W  Y  G  A  P  V  I  L  C  R  K  E  N  D  Q  S  T  H
M    5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
G    1  0  0  0  1  0  0  0  0  1  0  0  0  1  0  0  1  0  0  0
A    1  0  0  0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0
S    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  2  0
F    0  4  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

A PSSM can also include a position-specific score for permitting insertions and deletions (indels). Perhaps a given position is at a turn, where indels are common.

INDEL

Indel 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

PSSM

• In evaluating (scoring) alignments, PSSM approaches typically:

– Reward matches to columns that have conserved amino acids

– Penalize mismatches to columns with conserved amino acids more than mismatches in a variable column
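The effect described above can be seen in a small sketch that scores candidate segments against a PSSM built from the example counts in this transcript. It assumes a uniform background and a flat pseudocount of 1, which is much simpler than what PSI-BLAST does; all function names are hypothetical. Under this simple scheme an unobserved residue gets the same raw score in every column, so the asymmetry appears as a larger drop from the best-matching residue in the conserved column.

```python
import math

AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"


def column_scores(column_counts, pseudocount=1.0):
    """Log-odds scores for one column (assumes uniform background of 1/20)."""
    total = sum(column_counts.values()) + pseudocount * len(AMINO_ACIDS)
    background = 1.0 / len(AMINO_ACIDS)
    return {
        aa: math.log2((column_counts.get(aa, 0) + pseudocount) / total / background)
        for aa in AMINO_ACIDS
    }


# Position 1 is fully conserved (M x5); position 2 is variable (5 residues x1).
pssm = [
    column_scores({"M": 5}),
    column_scores({"M": 1, "G": 1, "L": 1, "E": 1, "Q": 1}),
    column_scores({"M": 1, "A": 4}),
    column_scores({"S": 3, "T": 2}),
    column_scores({"F": 4, "Y": 1}),
]


def score_segment(segment, pssm):
    """Sum the column score of each residue in an ungapped segment."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must equal the number of PSSM columns")
    return sum(column[aa] for aa, column in zip(segment, pssm))


for segment in ("MGASF", "MWASF", "WGASF"):
    print(segment, round(score_segment(segment, pssm), 2))
```

Replacing M with W at the conserved first position lowers the total more than replacing G with W at the variable second position, which is the behavior the bullets describe.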

PSI-BLAST

• Input a single query sequence.

• Executes a BLAST run.

• Program takes significant hits, incorporates matches into a PSSM.

• Sequences >98% similar are not included, to avoid biasing the PSSM toward near-identical sequences.
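The selection step can be sketched as below. This is an illustration under stated assumptions, not PSI-BLAST's implementation: hits are represented as hypothetical (name, aligned sequence, E-value) tuples of equal length, similarity is approximated by percent identity, the 0.005 inclusion threshold is an assumed value for illustration, and the function names are invented for this example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


def select_for_pssm(query, hits, e_threshold=0.005, max_identity=98.0):
    """Return names of hits that would contribute to the PSSM.

    A hit is skipped if its E-value exceeds e_threshold (not significant)
    or if it is more than max_identity percent identical to the query or
    to a hit already kept (redundant, would bias the matrix).
    """
    kept = [("query", query)]
    for name, seq, e_value in hits:
        if e_value > e_threshold:
            continue
        if any(percent_identity(seq, other) > max_identity for _, other in kept):
            continue
        kept.append((name, seq))
    return [name for name, _ in kept[1:]]


# Hypothetical toy hits against the query MGASF.
hits = [
    ("hit1", "MGASF", 1e-30),  # identical to the query
    ("hit2", "MLATF", 1e-10),
    ("hit3", "MLATF", 1e-9),   # identical to hit2
    ("hit4", "MEASY", 2.0),    # not significant
    ("hit5", "MQASF", 1e-4),
]
print(select_for_pssm("MGASF", hits))
```

In an iterative search, the kept sequences would be used to rebuild the PSSM, and the next round of searching would use that matrix in place of the general one.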

Power of approach:

• PSI-BLAST is iterative.

• Takes best hits and improves the scoring matrix.

The original BLAST search had 84 hits.

The PSSM will skew towards this region.