Multiple alignment using hidden Markov models
Multiple alignment using hidden Markov models
November 21, 2001
Kim Hye Jin
Intelligent Multimedia Lab
Outline
• Introduction
• Methods and algorithm
• Result
• Discussion
IM lab
Introduction
• Why HMM?
– Mathematically consistent description of insertions and deletions
– Theoretical insight into the difficulties of combining disparate forms of information (e.g. sequences / 3D structures)
– Possible to train models from initially unaligned sequences
Methods and algorithms
• State transitions
– the state sequence is a 1st-order Markov chain
– each state is hidden
– match / insert / delete states
• Symbol emission
[Figure: HMM architecture showing state transitions and symbol emissions across match, insert, and delete states]
• Replacing arbitrary scores with probabilities relative to consensus
• Model M consists of N states S1 … SN
• Observed sequence O consists of T symbols O1 … OT from an alphabet X
• aij : the probability of a transition from Si to Sj
• bj(x) : the probability of emitting symbol x from state Sj
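The notation above can be made concrete with a small sketch. All sizes and probabilities below are invented for illustration; only the roles of N, a, and b come from the slide:

```python
import numpy as np

# Model M with N hidden states S1..SN over an alphabet X (toy values).
N = 3
alphabet = ["A", "C", "G", "T"]

# a[i, j] = P(next state is Sj | current state is Si); each row sums to 1
a = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# b[j, x] = P(state Sj emits symbol x); each row sums to 1
b = np.array([[0.70, 0.10, 0.10, 0.10],
              [0.10, 0.70, 0.10, 0.10],
              [0.25, 0.25, 0.25, 0.25]])

# Both parameter tables must be proper (row-stochastic) distributions.
assert np.allclose(a.sum(axis=1), 1.0)
assert np.allclose(b.sum(axis=1), 1.0)
```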
• HMM model: example for the sequence ACCY
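As a toy illustration of how such a model scores ACCY (every state, transition value, and emission value below is invented, not taken from the slide's figure): the probability of one path through the model is the product of the transition and emission probabilities met along the way.

```python
# Emission probabilities b_Mj(x) at match states M1..M4 (invented numbers).
emit = {
    1: {"A": 0.80, "C": 0.10, "Y": 0.05},
    2: {"A": 0.10, "C": 0.70, "Y": 0.10},
    3: {"A": 0.10, "C": 0.70, "Y": 0.10},
    4: {"A": 0.05, "C": 0.10, "Y": 0.80},
}
# Transition probabilities along the begin -> M1 -> ... -> M4 -> end path.
trans = {(0, 1): 0.9, (1, 2): 0.9, (2, 3): 0.9, (3, 4): 0.9, (4, 5): 0.9}

def path_probability(seq):
    """Probability of emitting seq along the all-match path."""
    p = 1.0
    for j, sym in enumerate(seq, start=1):
        p *= trans[(j - 1, j)] * emit[j][sym]
    p *= trans[(len(seq), len(seq) + 1)]  # final transition to the end state
    return p

print(path_probability("ACCY"))
```

Summing this quantity over every possible path (including insert and delete detours) gives the total probability of the sequence, which is what the forward algorithm computes efficiently.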
• Forward algorithm
– computes a sum over all paths rather than a maximum
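A minimal sketch of the forward recursion on a toy two-state model (all numbers invented): alpha[j] accumulates the probability of all paths that emit the symbols seen so far and end in state Sj; summing, rather than maximizing, over predecessors is what distinguishes it from Viterbi.

```python
import numpy as np

a = np.array([[0.7, 0.3],    # a[i, j]: transition Si -> Sj (toy values)
              [0.4, 0.6]])
b = np.array([[0.9, 0.1],    # b[j, x]: emission of symbol x from Sj
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial state distribution

def forward(obs):
    """Return P(O | model) via the forward recursion (sum over all paths)."""
    alpha = pi * b[:, obs[0]]
    for x in obs[1:]:
        # alpha'[j] = sum_i alpha[i] * a[i, j], then scaled by the emission
        alpha = (alpha @ a) * b[:, x]
    return alpha.sum()

print(forward([0, 1, 0]))
```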
• Viterbi algorithm
– finds the most likely path through the model
– the path is recovered by following the back pointers
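The same toy model illustrates Viterbi (all numbers invented): delta tracks the single best path probability instead of a sum, and the back pointers stored at each step are followed in reverse to read off the most likely state path.

```python
import numpy as np

a = np.array([[0.7, 0.3],    # a[i, j]: transition Si -> Sj (toy values)
              [0.4, 0.6]])
b = np.array([[0.9, 0.1],    # b[j, x]: emission of symbol x from Sj
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial state distribution

def viterbi(obs):
    """Most likely state path, recovered via back pointers."""
    T, N = len(obs), len(pi)
    delta = pi * b[:, obs[0]]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] * a          # scores[i, j]: come from Si to Sj
        back[t] = scores.argmax(axis=0)      # best predecessor of each Sj
        delta = scores.max(axis=0) * b[:, obs[t]]
    path = [int(delta.argmax())]             # best final state
    for t in range(T - 1, 0, -1):            # follow the back pointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1, 1]))  # → [0, 0, 1, 1]
```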
• Baum-Welch algorithm
– a variation of the forward algorithm
– starts from a reasonable guess for the initial model, then calculates a score for each sequence in the training set using the EM algorithm
• Local optima problem:
– forward algorithm / Viterbi algorithm
– Baum-Welch algorithm
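One Baum-Welch (EM) re-estimation step can be sketched on the same kind of toy model (all numbers invented): forward and backward variables yield expected state-occupancy and transition counts, which are renormalized into new parameters. Iterating this climbs the likelihood only to a local optimum, which is the local-optima problem noted above.

```python
import numpy as np

a = np.array([[0.7, 0.3],    # toy transition matrix
              [0.4, 0.6]])
b = np.array([[0.9, 0.1],    # toy emission matrix
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # toy initial distribution

def baum_welch_step(obs, a, b, pi):
    """One EM re-estimation step; returns (new_a, new_b, new_pi)."""
    T, N = len(obs), len(pi)
    # E-step: forward (alpha) and backward (beta) variables
    alpha = np.zeros((T, N))
    alpha[0] = pi * b[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ a) * b[:, obs[t]]
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = a @ (b[:, obs[t + 1]] * beta[t + 1])
    evidence = alpha[-1].sum()                 # P(O | current model)
    gamma = alpha * beta / evidence            # P(state j at time t | O)
    xi = np.zeros((N, N))                      # expected transition counts
    for t in range(T - 1):
        xi += (alpha[t][:, None] * a * b[:, obs[t + 1]] * beta[t + 1]) / evidence
    # M-step: renormalize the expected counts into new parameters
    new_a = xi / xi.sum(axis=1, keepdims=True)
    new_b = np.zeros_like(b)
    for t, x in enumerate(obs):
        new_b[:, x] += gamma[t]
    new_b /= new_b.sum(axis=1, keepdims=True)
    return new_a, new_b, gamma[0]

new_a, new_b, new_pi = baum_welch_step([0, 1, 0, 0], a, b, pi)
```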
• Simulated annealing
– helps escape local optima in search of the global optimum
– kT = 0 : the standard Viterbi training procedure
– kT is lowered gradually during training
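The temperature idea can be sketched as follows (an illustrative toy, not the exact procedure from the talk): candidate scores are chosen with probability proportional to score^(1/kT), so high kT explores broadly, while kT = 0 collapses to the deterministic argmax of standard Viterbi training.

```python
import numpy as np

rng = np.random.default_rng(0)

def anneal_choice(scores, kT):
    """Pick an index with probability proportional to score**(1/kT)."""
    scores = np.asarray(scores, dtype=float)
    if kT == 0.0:
        return int(scores.argmax())   # kT = 0: deterministic Viterbi choice
    w = scores ** (1.0 / kT)          # higher kT flattens the distribution
    return int(rng.choice(len(scores), p=w / w.sum()))

scores = [0.5, 0.3, 0.2]
print(anneal_choice(scores, kT=0.0))  # always the best-scoring option
```

Lowering kT over the course of training moves the procedure smoothly from stochastic exploration toward the standard Viterbi choice.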
ClustalW
ClustalX
Results
• len : consensus length of the alignment
• ali : the number of structurally aligned sequences
• %id : the percentage sequence identity
• Homo : the number of homologues identified in and extracted from SwissProt 30
• %id : the average percentage sequence identity in the set of homologues
Discussion
• HMM
– a consistent theory for insertion and deletion penalties
– EGF : fairly difficult alignments are done well
• ClustalW
– progressive alignment
– disparities between the sequence identity of the structures and the sequence identity of the homologues
– large non-correlation between score and quality
• The ability of HMMs to perform sensitive fold recognition is apparent