Infinite Hierarchical Hidden Markov Models

Katherine A. Heller, Yee Whye Teh and Dilan Görür
Presented by Lu Ren, ECE@Duke University, Nov 23, 2009
AISTATS 2009


Page 1: Infinite Hierarchical Hidden Markov Models

Infinite Hierarchical Hidden Markov Models

Katherine A. Heller, Yee Whye Teh and Dilan Görür

Lu Ren ECE@Duke University

Nov 23, 2009

AISTATS 2009

Page 2: Infinite Hierarchical Hidden Markov Models

Outline

• Hierarchical structure learning for sequential data

• Hierarchical hidden Markov model (HHMM)

• Infinite hierarchical hidden Markov model (IHHMM)

• Inference and learning

• Experiment results and demonstrations

• Related work and extensions

Page 3: Infinite Hierarchical Hidden Markov Models

Multi-scale Structure

• Goal: infer correlations among observations that are far apart in a long observation sequence.
• Potential applications: multi-resolution structure learning in language, video structure discovery, activity detection, etc.

[Figure: the generated sequential data, and the sampled “states” used to generate the data]

Page 4: Infinite Hierarchical Hidden Markov Models

Hierarchical HMM (HHMM)

1. Hierarchical hidden Markov models (HHMMs) are multiscale models of sequences in which each level of the model is a separate HMM that emits lower-level HMMs in a recursive manner.

[Figure: the generative process of an example HHMM [2]]

Page 5: Infinite Hierarchical Hidden Markov Models

2. The full parameter set. With a fixed model structure, the model is characterized by the following parameters [1]:

• horizontal transition probabilities between the states within each level;
• vertical transition probabilities from a parent state to its child states at the level below;
• output (emission) probabilities attached to the production states.

3. Representing the HHMM as a DBN [2]
• Assume for simplicity that all production states are at the bottom, and let Q_t^l denote the state of the HMM at level l and time t.
• The vector (Q_t^1, ..., Q_t^L) specifies the complete “path” from the root to the leaf state.
• An indicator variable F_t^l controls completion of the HHMM at level l and time t.

Hierarchical HMM (HHMM)

Page 6: Infinite Hierarchical Hidden Markov Models

[Figure: an HHMM represented as a DBN [2]]

Hierarchical HMM (HHMM)

Page 7: Infinite Hierarchical Hidden Markov Models

Infinite Hierarchical HMM (IHHMM)

IHHMM: allows the HHMM hierarchy to have a potentially infinite number of levels.

Notation: y_t is the observation at time t and s_t^l is the state at level l and time t. A state-transition indicator variable z_t^l is also introduced:
• z_t^l = 1 indicates a completion of the HHMM at level l right before time t, i.e. the presence of a state transition from s_{t-1}^l to s_t^l.
• The conditional probability of z_t^l is P(z_t^l = 1 | z_t^{l-1} = 1) = β_l and P(z_t^l = 1 | z_t^{l-1} = 0) = 0, with z_t^0 = 1 always at the observation level.
• There is thus an opportunity to transition at level l only if there was a transition at level l-1.
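A minimal sketch of this indicator mechanism (the names z, beta, and the function below are illustrative assumptions, since the slide's original symbols were lost in transcription):

```python
import random

def sample_indicators(num_levels, T, beta, seed=0):
    """Sample transition indicators z[t][l] for an IHHMM-style hierarchy.

    z[t][l] = 1 means the chain at level l transitions right before time t.
    A transition at level l is possible only if level l-1 transitioned;
    level 0 (the observation level) always "transitions".
    """
    rng = random.Random(seed)
    z = []
    for t in range(T):
        row = [1]  # level 0 always transitions
        for l in range(1, num_levels + 1):
            # transition at level l with probability beta, but only if the
            # level below also transitioned at this time step
            if row[l - 1] == 1 and rng.random() < beta:
                row.append(1)
            else:
                row.append(0)
        z.append(row)
    return z
```

With beta near 1, transitions propagate high up the hierarchy; with beta near 0, higher levels almost never transition, so their states persist for long stretches.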

Page 8: Infinite Hierarchical Hidden Markov Models

Infinite Hierarchical HMM (IHHMM)

Properties implied by the structure:

1. The number of transitions at level l-1 before a transition at level l occurs is geometrically distributed with mean 1/β_l. This implies that, when all β_l equal a common β, the expected number of time steps for which a state at level l persists in its current value is β^{-l}. The states at higher levels persist longer.

2. The first non-transitioning level at time t, L_t = min{l : z_t^l = 0}, is geometrically distributed with parameter 1-β if all the β_l are equal to β.

The IHHMM allows for a potentially infinite number of levels.
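Under the simplifying assumption of a single transition parameter β shared across levels (notation reconstructed here, since the slide's symbols were lost in transcription), the persistence property follows directly:

```latex
P(z_t^{\ell} = 1)
  \;=\; \prod_{k=1}^{\ell} P\!\left(z_t^{k}=1 \mid z_t^{k-1}=1\right)
  \;=\; \beta^{\ell},
\qquad
\mathbb{E}[\text{time steps a level-}\ell\text{ state persists}]
  \;=\; \frac{1}{P(z_t^{\ell} = 1)}
  \;=\; \beta^{-\ell}.
```

Likewise, the first non-transitioning level L_t = min{ℓ : z_t^ℓ = 0} satisfies P(L_t = ℓ) = β^{ℓ-1}(1-β), a geometric distribution with parameter 1-β.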

Page 9: Infinite Hierarchical Hidden Markov Models

Infinite Hierarchical HMM (IHHMM)

The generative process for s_t^l given z_t^l is similar to the HHMM: for the levels with z_t^l = 1, proceeding from the highest transitioning level down to level 1, the state is generated according to the transition probabilities conditioned on the state at the level above; for the levels with z_t^l = 0, the state is copied, s_t^l = s_{t-1}^l.

The emission matrix then generates the observation y_t from the bottom-level state.
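One time step of this generative process can be sketched as follows (the function, the indexing convention with level 0 at the bottom, and the shape of the transition tables are illustrative assumptions):

```python
import random

def step_states(prev_states, z_row, trans, emit, rng):
    """One IHHMM time step (sketch): resample the state at every level that
    transitioned (z_row[l] == 1), conditioning on the state one level up;
    non-transitioning levels keep their previous state.

    trans[l][parent][old] is a distribution over new states at level l
    (a simplifying assumption about the parameterization).
    emit[state] is a distribution over observation symbols.
    """
    L = len(prev_states)            # number of levels, 0 = bottom
    states = list(prev_states)
    for l in range(L - 1, -1, -1):  # from the top level down to the bottom
        if z_row[l]:
            parent = states[l + 1] if l + 1 < L else 0
            probs = trans[l][parent][prev_states[l]]
            states[l] = rng.choices(range(len(probs)), weights=probs)[0]
    # emit an observation from the bottom-level state
    obs = rng.choices(range(len(emit[states[0]])), weights=emit[states[0]])[0]
    return states, obs
```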

Page 10: Infinite Hierarchical Hidden Markov Models

Inference and Learning

Inference in the IHHMM is performed using Gibbs sampling and a modified forward-backtrack algorithm. It iterates between the following two steps:

1. Sampling state values with fixed parameters for each level:

Compute forward messages from t = 1 to T, then resample the states s_t^l and indicators z_t^l along the backward pass from t = T to 1:
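The per-level building block of the forward pass is the standard HMM forward recursion, sketched below (the function name and per-step normalization are assumptions; the slide's message equations were lost in transcription):

```python
def forward_messages(obs, init, trans, emit):
    """Standard HMM forward recursion: alpha[t][j] ∝ P(state_t = j, y_1..t).

    init[j]     : initial state distribution
    trans[i][j] : transition probability i -> j
    emit[j][v]  : probability of observing symbol v in state j
    Messages are normalized at each step for numerical stability.
    """
    K = len(init)
    alpha = []
    prev = [init[j] * emit[j][obs[0]] for j in range(K)]
    norm = sum(prev)
    alpha.append([p / norm for p in prev])
    for t in range(1, len(obs)):
        cur = []
        for j in range(K):
            s = sum(alpha[-1][i] * trans[i][j] for i in range(K))
            cur.append(s * emit[j][obs[t]])
        norm = sum(cur)
        alpha.append([c / norm for c in cur])
    return alpha
```

A backward pass can then sample states in reverse order, weighting each alpha message by the transition into the already-sampled successor state.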

Page 11: Infinite Hierarchical Hidden Markov Models

Inference and Learning

When the top level is reached, a new level above it is created by setting all of its states to 1; if the level below the current top level has no state transitions, it becomes the new top level.

2. Sampling parameters given the current states:

Parameters are initialized as draws from their Dirichlet priors; posteriors are calculated from the counts of state transitions and emissions obtained in the previous step.
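Because the Dirichlet prior is conjugate to the multinomial transition counts, each row of a transition matrix can be resampled from its posterior as sketched here (the function name is an assumption):

```python
import random

def sample_transition_row(counts, alpha, rng=random):
    """Draw one transition-matrix row from its Dirichlet posterior.

    With a Dirichlet(alpha) prior and observed transition counts, the
    posterior is Dirichlet(alpha + counts); a draw is obtained by
    normalizing independent Gamma(alpha_k + counts_k, 1) variates.
    """
    g = [rng.gammavariate(a + c, 1.0) for a, c in zip(alpha, counts)]
    total = sum(g)
    return [x / total for x in g]
```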

Predicting new observations given the current state of the IHHMM:

1. Assume the top level learned by the IHHMM is L; then calculate the following recursions from level L down to level 1:

Page 12: Infinite Hierarchical Hidden Markov Models

Inference and Learning

2. Compute the probability of observing y_{T+1} from the resulting bottom-level messages:
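For a single HMM level, this prediction step reduces to propagating the final forward message one step and marginalizing over states, roughly as follows (a sketch under that one-level simplification; the multi-level recursion on the slide was lost in transcription):

```python
def predict_next(alpha_T, trans, emit):
    """Predictive distribution over the next observation symbol given the
    final normalized forward message alpha_T of a single-level HMM.

    P(y_{T+1} = v) = sum_j [ sum_i alpha_T[i] * trans[i][j] ] * emit[j][v]
    """
    K = len(alpha_T)
    V = len(emit[0])
    pred = [0.0] * V
    for j in range(K):
        pj = sum(alpha_T[i] * trans[i][j] for i in range(K))  # P(state_{T+1}=j)
        for v in range(V):
            pred[v] += pj * emit[j][v]
    return pred
```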

Page 13: Infinite Hierarchical Hidden Markov Models

Experiment Results

1. Generated data: [Figure: the generated sequential data and the sampled “states” used to generate them, for three samples]

Page 14: Infinite Hierarchical Hidden Markov Models

Experiment Results

2. Demonstrating that the model captures hierarchical structure:

The first data set consists of repeats of integers increasing from 1 to 7, followed by repetitions of integers decreasing from 5 to 1, repeated twice. The second data set is the first one concatenated with another series of repeated increasing and decreasing integer sequences. Seven states are used in the model at all levels.
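A generator for this kind of synthetic sequence can be sketched as follows (the exact repeat counts are not given on the slide, so they are left as parameters here):

```python
def make_sequence(n_inc, n_dec, n_outer=2):
    """Build a synthetic sequence of the kind described above: n_inc repeats
    of 1..7 increasing, then n_dec repeats of 5..1 decreasing, with the whole
    block repeated n_outer times. (Repeat counts are assumed, not from the
    slide.)
    """
    block = list(range(1, 8)) * n_inc + list(range(5, 0, -1)) * n_dec
    return block * n_outer
```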


Page 15: Infinite Hierarchical Hidden Markov Models

Experiment Results

The predictive log probability of the next integer: HMM 0.25; IHHMM 0.31; HHMM 0.30 (for 2-4 levels).

3. Spectral data from Handel’s Hallelujah chorus

Page 16: Infinite Hierarchical Hidden Markov Models

Experiment Results

4. Alice in Wonderland letters data set.

The difference in log predictive likelihood between the IHHMM and an HMM learned by EM

The difference in log predictive likelihood between the IHHMM and a one-level HMM learned by Gibbs sampling

• The mean differences in both plots are positive, showing that the IHHMM gives superior performance on this data.
• The long tails signify that some letters are better predicted with the higher hierarchical levels.

Page 17: Infinite Hierarchical Hidden Markov Models

Final Discussion

1. Relation to the HHMM: the IHHMM is a nonparametric extension of the HHMM to an unbounded hierarchy depth; the completion of an internal HHMM is governed by an independent process.

2. Other related work: probabilistic context-free grammars with multi-scale structure learning; the infinite HMM and the infinite factorial HMM.

3. Future work: make the number of states at each level infinite, as in the infinite HMM; higher-order Markov chains; more efficient inference algorithms.

Page 18: Infinite Hierarchical Hidden Markov Models

Cited References

[1] S. Fine, Y. Singer, and N. Tishby. The hierarchical hidden Markov model: Analysis and applications. Machine Learning, 32: 41-62, 1998.

[2] K. Murphy and M.A. Paskin. Linear time inference in hierarchical HMMs. In Neural Information Processing Systems, 2001.