Introduction to
Bioinformatics
Mini Exam 3
Mini Exam
Take a pencil and a piece of paper
Please, not too close to your neighbour
There are three questions. You have 15 minutes in total to write down short but clear answers
When you are ready please submit your answers to the desk in front
Mini Exam 3
ANSWERS
Introduction to Bioinformatics.
LECTURE 4: Hidden Markov Models
* Chapter 4: The boulevard of broken dreams
4.1 The nose knows
* In 2004 Richard Axel and Linda Buck received the Nobel Prize for elucidating the olfactory system.
* Odorant Receptors (ORs): sense certain molecules outside the cell and signal inside the cell
* ORs contain 7 transmembrane domains
* ORs form the single largest gene family in the human genome, with about 1000 genes, as in mouse, rat and dog
* In humans most have become pseudogenes: we lost smell as we came to rely on vision
4.2 Hidden Markov models
In 1989 Gary Churchill introduced the use of HMMs for DNA segmentation.
CENTRAL IDEAS:
* The string is generated by a system
* The system can be in one of a number of distinct states
* The system can change between states with probability T
* In each state the system emits symbols to the string with probability E
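The central ideas can be sketched as a small generative program. This is an illustration only, not code from the lecture: the function name `sample_hmm`, the dictionary layout, and the two-state parameter values are all assumptions made up for the example.

```python
import random

def sample_hmm(n, init, T, E, seed=0):
    """Generate a hidden state path h and a symbol string s of length n."""
    rng = random.Random(seed)
    states = list(init)
    h, s = [], []
    # Draw the first state from the initial distribution.
    state = rng.choices(states, weights=[init[k] for k in states])[0]
    for _ in range(n):
        h.append(state)
        # Emit one symbol according to the emission probabilities E of this state.
        symbols = list(E[state])
        s.append(rng.choices(symbols, weights=[E[state][b] for b in symbols])[0])
        # Move to the next state according to the transition probabilities T.
        state = rng.choices(states, weights=[T[state][l] for l in states])[0]
    return h, "".join(s)

# Hypothetical two-state model (values invented for illustration).
init = {1: 0.5, 2: 0.5}
T = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.1, 2: 0.9}}
E = {1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

h, s = sample_hmm(20, init, T, E)
print("s =", s)
print("h =", "".join(str(k) for k in h))
```

Only `s` would be observed in practice; `h` is the hidden part that the algorithms later in the lecture try to recover.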
4.2 Hidden Markov models
[Figure: a three-state HMM. STATE 1 moves to STATE 2 with probability T(1,2), and STATE 2 moves to STATE 3 with probability T(2,3). Each state emits A, T, C, G with its own probabilities pA, pT, pC, pG.]
s = TTCACTGTGAACGATCCGACCAGTACTACG ACGTTGCCAAAGCGCTTAT
h = 1111111111111111111111112222222222222333333333333333333333333
HMM essentials
TRANSITION MATRIX = the probability of a state change:
$$T(k,l) = P(h_i = l \mid h_{i-1} = k)$$
EMISSION PROBABILITY = symbol probability distribution in a certain state:
$$E(k,b) = P(s_i = b \mid h_i = k)$$
HMM essentials
INITIAL PROBABILITY of a state:
$$T(0,k) = P(h_1 = k)$$
Sequence of the states visited: $\mathbf{h}$
Sequence of the generated symbols: $\mathbf{s}$
HMM essentials
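Probability of the hidden states $\mathbf{h}$:
$$P(\mathbf{h}) = T(0,h_1)\,T(h_1,h_2)\cdots T(h_{n-1},h_n)$$
Probability of the generated symbol string $\mathbf{s}$ given the hidden states $\mathbf{h}$:
$$P(\mathbf{s}\mid\mathbf{h}) = E(h_1,s_1)\,E(h_2,s_2)\cdots E(h_n,s_n)$$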
HMM essentials
Joint probability of symbol string $\mathbf{s}$ and hidden states $\mathbf{h}$:
$$P(\mathbf{s},\mathbf{h}) = P(\mathbf{s}\mid\mathbf{h})\,P(\mathbf{h})$$
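As a worked instance of this factorization, the sketch below evaluates the joint probability for one short string and one state path. The function name `log_joint`, the dictionary layout, and the two-state parameter values are assumptions made up for the example; logarithms are used only to avoid underflow on longer strings, and the sketch assumes every probability it touches is nonzero.

```python
import math

def log_joint(s, h, init, T, E):
    """Return log P(s, h) = log P(s | h) + log P(h)."""
    # log P(h): the initial probability T(0, h_1) times the chain of transitions.
    log_p_h = math.log(init[h[0]])
    for k, l in zip(h, h[1:]):
        log_p_h += math.log(T[k][l])
    # log P(s | h): one emission term E(h_i, s_i) per position.
    log_p_s_given_h = sum(math.log(E[k][b]) for k, b in zip(h, s))
    return log_p_s_given_h + log_p_h

# Hypothetical two-state model (values invented for illustration).
init = {1: 0.5, 2: 0.5}
T = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.1, 2: 0.9}}
E = {1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

s = "ATG"
h = [1, 1, 2]
# Here P(h) = 0.5 * 0.9 * 0.1 and P(s | h) = 0.4 * 0.4 * 0.4.
p = math.exp(log_joint(s, h, init, T, E))
print(p)
```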
HMM essentials
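Theorem of total probability:
$$P(\mathbf{s}) = \sum_{\mathbf{h}_j \in H^n} P(\mathbf{s},\mathbf{h}_j) = \sum_{\mathbf{h}_j \in H^n} P(\mathbf{s}\mid\mathbf{h}_j)\,P(\mathbf{h}_j)$$
Most likely sequence:
$$\mathbf{h}^* = \arg\max_{\mathbf{h} \in H^n} P(\mathbf{s},\mathbf{h})$$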
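Both quantities can be computed literally from these definitions by enumerating every state path, which makes the cost visible: there are |H|^n paths, so this only works for very short strings. The sketch below is illustrative; the function names `joint` and `brute_force` and the two-state parameter values are assumptions made up for the example.

```python
from itertools import product

def joint(s, h, init, T, E):
    """P(s, h) for one state path h."""
    p = init[h[0]] * E[h[0]][s[0]]
    for i in range(1, len(s)):
        p *= T[h[i - 1]][h[i]] * E[h[i]][s[i]]
    return p

def brute_force(s, init, T, E):
    """Return (P(s), h*, P(s, h*)) by enumerating all |H|^n state paths."""
    states = list(init)
    total, best_p, best_h = 0.0, -1.0, None
    for h in product(states, repeat=len(s)):
        p = joint(s, h, init, T, E)
        total += p                 # theorem of total probability
        if p > best_p:             # running arg max over paths
            best_p, best_h = p, h
    return total, best_h, best_p

# Hypothetical two-state model (values invented for illustration).
init = {1: 0.5, 2: 0.5}
T = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.1, 2: 0.9}}
E = {1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

p_s, h_star, p_best = brute_force("ATGC", init, T, E)  # 2**4 = 16 paths
print(p_s, h_star, p_best)
```

For a string of a few hundred symbols the number of paths is astronomically large, which is why dynamic programming is used instead.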
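EXAMPLE 4.2: Change points in lambda phage
[Figure: a two-state HMM with states CG RICH and AT RICH.]
CG RICH emits A: 0.2462, C: 0.2476, G: 0.2985, T: 0.2077
AT RICH emits A: 0.2700, C: 0.2084, G: 0.1981, T: 0.3236
Each state stays where it is with probability 0.9998 and switches to the other state with probability 0.0002.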
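The numbers in this model can be read quantitatively. The sketch below copies the slide's parameters into dictionaries (the names `E`, `T`, `row_sums` and `expected_len` are illustrative), checks that each emission row sums to 1 up to rounding, and computes the mean segment length implied by the transition probabilities. Under the model the time spent in a state is geometrically distributed, so the mean is 1 divided by the probability of leaving.

```python
# Parameters copied from the slide; dictionary names are illustrative.
E = {
    "CG": {"A": 0.2462, "C": 0.2476, "G": 0.2985, "T": 0.2077},
    "AT": {"A": 0.2700, "C": 0.2084, "G": 0.1981, "T": 0.3236},
}
T = {
    "CG": {"CG": 0.9998, "AT": 0.0002},
    "AT": {"AT": 0.9998, "CG": 0.0002},
}

# Each emission row should sum to 1, up to the rounding used on the slide.
row_sums = {k: sum(E[k].values()) for k in E}

# Mean number of consecutive positions spent in a state: 1 / P(leaving it).
expected_len = {k: 1.0 / (1.0 - T[k][k]) for k in T}

print(row_sums)
print(expected_len)
```

With a switching probability of 0.0002 the model expects segments of about 5000 bases, so it favours a small number of long CG-rich and AT-rich regions.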
4.3 Profile hidden Markov models
* Characterize sets of homologous genes and proteins based on common patterns in their sequence.
* Classic approach: multiple alignments of all elements in the family
* Position Specific Scoring Matrices (PSSMs)
* PSSMs cannot handle variable lengths or gaps
* Profile HMMs (pHMMs) can do this
4.3 Profile hidden Markov models
* See Figure 4.4 for a pHMM for a multiple alignment of:
VIVALASVEGAS
VIVADA-VI--S
VIVADALL--AS
4.3 Profile hidden Markov models
* Profile HMMs (pHMMs) summarize the salient features of a protein alignment in a single model
* pHMMs can also be used to produce multiple alignments
4.4 Finding genes with hidden Markov models
* HMMs are better at detecting genes than sequence alignment
* HMMs can detect introns and exons
* Downside: HMMs are computationally much more demanding!
4.5 Case study: odorant receptors
* The 7-transmembrane (7-TM) G-protein coupled receptors
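EXAMPLE 4.7: odorant receptors
[Figure: a two-state HMM with states IN and OUT. Each state has its own amino-acid emission table (shown on the slide as A: 15, R: 11, ..., V: 31), self-transitions P(IN-IN) and P(OUT-OUT), and cross-transitions P(IN-OUT) and P(OUT-IN).]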
4.6 Algorithms for HMM computations
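The probability of the sequence under the given model is:
$$P(\mathbf{s}) = \sum_{\mathbf{h}_j \in H^n} P(\mathbf{s},\mathbf{h}_j) = \sum_{\mathbf{h}_j \in H^n} P(\mathbf{s}\mid\mathbf{h}_j)\,P(\mathbf{h}_j)$$
The most probable sequence is:
$$\mathbf{h}^* = \arg\max_{\mathbf{h} \in H^n} P(\mathbf{s},\mathbf{h})$$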
The VITERBI Dynamic Programming algorithm
Given a sequence s of length n and an HMM with params (T,E):
1. Create table V of size |H|x(n+1);
2. Initialize i=0; V(0,0)=1; V(k,0)=0 for k>0;
3. For i=1:n, compute each entry using the recursive relation:
V(j,i) = E(j,s(i))*maxk {V(k,i-1)*T(k,j) }
pointer(i,j) = arg maxk {V(k,i-1)*T(k,j) }
4. OUTPUT: P(s,h*) = maxk {V(k,n)}
5. Start of trace-back: h*(n) = arg maxk {V(k,n)}
6. Trace-back: i=n:1, using: h*(i-1) = pointer(i, h*(i))
7. OUTPUT: h*
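A minimal Python sketch of the recursion is given below. It is not the lecture's code: the function names, the dictionary layout and the two-state parameter values are assumptions made up for the example. Two deliberate differences from the pseudocode: the begin state 0 is replaced by an explicit initial distribution `init` (playing the role of T(0,k)), and the table holds log-probabilities so that long sequences do not underflow.

```python
import math

def safe_log(p):
    """log p, with log 0 defined as minus infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(s, init, T, E):
    """Return (log P(s, h*), h*) for the symbol string s."""
    states = list(init)
    # V[i][j]: best log-probability of any path that ends in state j at position i.
    V = [{j: safe_log(init[j]) + safe_log(E[j][s[0]]) for j in states}]
    pointer = [{}]
    for i in range(1, len(s)):
        V.append({})
        pointer.append({})
        for j in states:
            # Best predecessor k for state j (the arg max in the recursion).
            best_k = max(states, key=lambda k: V[i - 1][k] + safe_log(T[k][j]))
            V[i][j] = safe_log(E[j][s[i]]) + V[i - 1][best_k] + safe_log(T[best_k][j])
            pointer[i][j] = best_k
    # Termination: the best final state, then trace back through the pointers.
    last = max(states, key=lambda k: V[-1][k])
    h = [last]
    for i in range(len(s) - 1, 0, -1):
        h.append(pointer[i][h[-1]])
    h.reverse()
    return V[-1][last], h

# Hypothetical two-state model (values invented for illustration).
init = {1: 0.5, 2: 0.5}
T = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.1, 2: 0.9}}
E = {1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

log_p, h_star = viterbi("AAAACCCC", init, T, E)
print(log_p, h_star)
```

The work is proportional to n times |H| squared, instead of the |H|^n paths an exhaustive search would visit.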
The FORWARD algorithm
Given a sequence s of length n and an HMM with params (T,E):
1. Create table F of size |H|x(n+1);
2. Initialize i=0; F(0,0)=1; F(k,0)=0 for k>0;
3. For i=1:n, compute each entry using the recursive relation:
F(j,i) = E(j,s(i))*∑k {F(k,i-1)*T(k,j) }
4. OUTPUT: P(s) = ∑k {F(k,n)}
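The forward recursion can be sketched in a few lines of Python. As before this is an illustration under assumptions: the function name `forward`, the dictionary layout and the two-state parameter values are invented, and the begin state 0 is replaced by an explicit initial distribution `init`. No pointers are kept, because the sum runs over all paths rather than selecting one.

```python
from itertools import product

def forward(s, init, T, E):
    """Return P(s), summed over all hidden state paths."""
    states = list(init)
    # F[i][j]: probability of emitting s[0..i] and being in state j at position i.
    F = [{j: init[j] * E[j][s[0]] for j in states}]
    for i in range(1, len(s)):
        F.append({j: E[j][s[i]] * sum(F[i - 1][k] * T[k][j] for k in states)
                  for j in states})
    return sum(F[-1].values())

# Hypothetical two-state model (values invented for illustration).
init = {1: 0.5, 2: 0.5}
T = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.1, 2: 0.9}}
E = {1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

p_s = forward("ATGC", init, T, E)

# Sanity check: the probabilities of all strings of one fixed length add up to 1.
total = sum(forward("".join(x), init, T, E) for x in product("ACGT", repeat=3))
print(p_s, total)
```

For long sequences the plain products underflow; practical implementations rescale each column or work with logarithms.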
The EM (Expectation Maximization) algorithm
Given a sequence s and an HMM with unknown (T,E):
1. Initialize h, E and T;
2. Given s and h, estimate E and T by counting the emitted symbols and the state transitions;
3. Given s, E and T, estimate h, e.g. with the Viterbi algorithm;
4. Repeat steps 2 and 3 until some criterion is met.
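The loop can be sketched as follows. With Viterbi in step 3 this scheme is often called Viterbi training, a "hard" variant of EM; the full EM procedure for HMMs (Baum-Welch) uses expected counts instead of a single best path. Everything here is an assumption made for illustration: the function names, the pseudocount of 1 that keeps all probabilities nonzero, the fixed initial distribution, the stopping rule (the path stops changing), and the starting values. The loop starts from an initial guess of T and E and derives the first h from it.

```python
import math

def viterbi(s, init, T, E):
    """Step 3: most probable state path for s (log-space)."""
    states = list(init)
    V = [{j: math.log(init[j]) + math.log(E[j][s[0]]) for j in states}]
    ptr = []
    for i in range(1, len(s)):
        row, back = {}, {}
        for j in states:
            k = max(states, key=lambda q: V[-1][q] + math.log(T[q][j]))
            row[j] = math.log(E[j][s[i]]) + V[-1][k] + math.log(T[k][j])
            back[j] = k
        V.append(row)
        ptr.append(back)
    h = [max(states, key=lambda q: V[-1][q])]
    for back in reversed(ptr):
        h.append(back[h[-1]])
    return h[::-1]

def count_params(s, h, states, alphabet, pseudo=1.0):
    """Step 2: re-estimate T and E by counting along the path h."""
    T = {k: {l: pseudo for l in states} for k in states}
    E = {k: {b: pseudo for b in alphabet} for k in states}
    for k, b in zip(h, s):
        E[k][b] += 1          # symbol b emitted in state k
    for k, l in zip(h, h[1:]):
        T[k][l] += 1          # transition from k to l
    for table in (T, E):
        for k in states:
            row_total = sum(table[k].values())
            for x in table[k]:
                table[k][x] /= row_total
    return T, E

def viterbi_training(s, init, T, E, max_iter=20):
    """Alternate steps 3 and 2 until the path h stops changing."""
    states = list(init)
    alphabet = list(next(iter(E.values())))
    h = None
    for _ in range(max_iter):
        new_h = viterbi(s, init, T, E)
        if new_h == h:
            break
        h = new_h
        T, E = count_params(s, h, states, alphabet)
    return h, T, E

# Hypothetical starting guess and toy sequence (invented for illustration).
init = {1: 0.5, 2: 0.5}
T0 = {1: {1: 0.8, 2: 0.2}, 2: {1: 0.2, 2: 0.8}}
E0 = {1: {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
      2: {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}}
s = "ATTATAATTAGCGCCGGCGCATATTAAT"

h, T, E = viterbi_training(s, init, T0, E0)
print("".join(str(k) for k in h))
```

The result depends on the starting guess, and the procedure is only guaranteed to reach a local optimum.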
EXAMPLE: finding genes with VEIL
• The Viterbi Exon-Intron Locator (VEIL) was developed by John Henderson, Steven Salzberg, and Ken Fasman at Johns Hopkins University.
• Gene finder with a modular structure:
• Uses an HMM made up of sub-HMMs, each describing a different part of the sequence: upstream noncoding DNA, exon, intron, …
• Assumes test data starts and ends with noncoding DNA and contains exactly one gene.
• Uses biological knowledge to “hardwire” parts of the HMM, e.g. start and stop codons, splice sites.
The exon sub-model
Other submodels
• The start codon model is very simple:
[Figure: Upstream, then a, t, g, then Exon]
• The splice junctions are also quite simple and can be hardwired (here is the 5’ splice site):
[Figure: 5’ splice site model]
The overall model
[Figure: the overall VEIL model, linking the sub-models labelled Upstream, Start codon, Exon, Stop codon, Downstream, 3’ splice site, 5’ splice site, intron, and 5’ polyA site.]
For more details, see J. Henderson, S.L. Salzberg, and K. Fasman (1997) Journal of Computational Biology 4:2, 127-141.
END of LECTURE 4