
Modeling time series with hidden Markov models

Advanced Machine Learning, 2017

Nadia Figueroa, Jose Medina and Aude Billard

Time series data

Modeling time series with HMMs 2

[Figure: three example time series plotted against time]

What's going on here?

• Humidity
• Barometric pressure
• Temperature

Time series data

What's the problem setting?

• We don't care about time: treat the samples as an unordered dataset.

• We have several trajectories with identical duration: model an explicit time dependency.

• We have unstructured trajectory(ies): consider the dependency on the past. Too complex!

Unstructured time series data

How to simplify this problem? Instead of conditioning on the whole past, make the Markov assumption: the next state depends only on the current state.

[Diagram: states Rainy, Cloudy, Sunny]

Outline

First part (10:15 – 11:00):
• Recap on Markov chains
• Hidden Markov Model (HMM)
  - Recognition of time series
  - ML parameter estimation

Second part (11:15 – 12:00):
• Time series segmentation
• Bayesian nonparametrics for HMMs

https://github.com/epfl-lasa/ML_toolbox

Outline first part

[Diagram: states Rainy, Cloudy, Sunny]

Markov chains

[Diagram: state-transition graph over Sunny, Cloudy, Rainy]

Transition matrix A over {Sunny, Cloudy, Rainy}: entry A(i, j) = P(next state = j | current state = i); each row sums to 1.

Initial probabilities π over {Sunny, Cloudy, Rainy}: π(i) = P(first state = i).
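The transition structure can be made concrete with a small simulation. A minimal sketch in Python; the transition matrix and initial distribution below are assumed for illustration, since the numeric values in the slides' figure are not reproduced here:

```python
import numpy as np

STATES = ["Sunny", "Cloudy", "Rainy"]

# Illustrative values (assumed, not from the slides):
# A[i, j] = P(next state = j | current state = i); rows sum to 1.
A = np.array([
    [0.7, 0.2, 0.1],  # from Sunny
    [0.3, 0.4, 0.3],  # from Cloudy
    [0.2, 0.4, 0.4],  # from Rainy
])
# pi[i] = P(first state = i)
pi = np.array([0.5, 0.3, 0.2])

def sample_chain(pi, A, T, rng):
    """Draw a length-T state sequence from the Markov chain (pi, A)."""
    seq = [rng.choice(len(pi), p=pi)]
    for _ in range(T - 1):
        seq.append(rng.choice(len(pi), p=A[seq[-1]]))
    return seq

rng = np.random.default_rng(0)
weather = [STATES[s] for s in sample_chain(pi, A, T=7, rng=rng)]
print(" -> ".join(weather))
```

Each step only looks at the previous state, which is exactly the Markov assumption.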

Likelihood of a Markov chain

Example sequence: Sunny, Sunny, Cloudy.

Given the transition matrix A and the initial probabilities π over {Sunny, Cloudy, Rainy}, the likelihood of a state sequence s_1, …, s_T factorizes as

P(s_1, …, s_T) = π(s_1) · A(s_1, s_2) · … · A(s_{T-1}, s_T)
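Evaluating this factorization is a product of one initial probability and T−1 transition probabilities. A minimal sketch, with illustrative parameter values (assumed, not from the slides):

```python
import numpy as np

# Assumed (illustrative) parameters over states 0=Sunny, 1=Cloudy, 2=Rainy.
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.4, 0.4]])
pi = np.array([0.5, 0.3, 0.2])

def chain_likelihood(seq, pi, A):
    """P(s_1, ..., s_T) = pi[s_1] * prod_t A[s_{t-1}, s_t]."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev, cur]
    return p

# Sunny, Sunny, Cloudy  ->  0.5 * 0.7 * 0.2 = 0.07
print(chain_likelihood([0, 0, 1], pi, A))
```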

Learning Markov chains

From observed state sequences (e.g. Sunny, Sunny, Cloudy), estimate the parameters by counting: π(i) is the fraction of sequences that start in state i, and A(i, j) is the fraction of transitions out of state i that go to state j.

Topologies: ergodic (all transitions allowed), left-to-right (states traversed in order, no going back), periodic (states revisited in a cycle).
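The counting estimator above can be sketched in a few lines. The toy data here is invented for illustration, and the sketch assumes integer-coded state sequences in which every state occurs at least once as a transition source:

```python
import numpy as np

def fit_markov_chain(sequences, n_states):
    """MLE of (pi, A) by counting initial states and transitions.
    Assumes every state occurs at least once as a transition source."""
    starts = np.zeros(n_states)
    trans = np.zeros((n_states, n_states))
    for seq in sequences:
        starts[seq[0]] += 1
        for prev, cur in zip(seq, seq[1:]):
            trans[prev, cur] += 1
    pi = starts / starts.sum()
    A = trans / trans.sum(axis=1, keepdims=True)
    return pi, A

# Toy data: 0=Sunny, 1=Cloudy, 2=Rainy
data = [[0, 0, 1], [0, 1, 2], [1, 2, 2]]
pi, A = fit_markov_chain(data, n_states=3)
print(pi)  # fraction of sequences starting in each state
print(A)   # normalized transition counts
```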

Outline first part

[Diagram: states Rainy, Cloudy, Sunny]

Hidden Markov model

In a hidden Markov model the states are not observed directly: the chain over Sunny, Cloudy, Rainy evolves with transition matrix A and initial probabilities π as before, but at each time step the current hidden state emits an observation, and only the observations are seen.

Likelihood of an HMM

Given the model λ = (π, A, emission models), the likelihood of an observation sequence y_1, …, y_T sums over all possible hidden state sequences. It is computed efficiently with the forward variable

α_k(i) = P(y_1, …, y_k, s_k = i | λ),

initialized as α_1(i) = π(i) · p(y_1 | s_1 = i) and updated as

α_{k+1}(j) = p(y_{k+1} | s_{k+1} = j) · Σ_i α_k(i) · A(i, j),

so that P(y_1, …, y_T | λ) = Σ_i α_T(i).


Likelihood of an HMM

Similarly, the backward variable

β_k(i) = P(y_{k+1}, …, y_T | s_k = i, λ)

is initialized as β_T(i) = 1 and computed backwards as

β_k(i) = Σ_j A(i, j) · p(y_{k+1} | s_{k+1} = j) · β_{k+1}(j).
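The forward and backward recursions can be sketched for an HMM with discrete emissions, writing B[i, y] = P(observation y | state i). All parameter values below are invented for illustration:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward variable: alpha[k, i] = P(y_0, ..., y_k, s_k = i)."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]]
    for k in range(1, T):
        alpha[k] = B[:, obs[k]] * (alpha[k - 1] @ A)
    return alpha

def backward(A, B, obs):
    """Backward variable: beta[k, i] = P(y_{k+1}, ..., y_{T-1} | s_k = i)."""
    T, K = len(obs), A.shape[0]
    beta = np.ones((T, K))
    for k in range(T - 2, -1, -1):
        beta[k] = A @ (B[:, obs[k + 1]] * beta[k + 1])
    return beta

# Invented 2-state example with binary observations
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # emissions from state 0
              [0.2, 0.8]])  # emissions from state 1
obs = [0, 1, 0]

alpha = forward(pi, A, B, obs)
beta = backward(A, B, obs)
print(alpha[-1].sum())                      # P(Y): sum_i alpha_T(i)
print((pi * B[:, obs[0]] * beta[0]).sum())  # same likelihood via beta
```

Both expressions give the same sequence likelihood, which is a handy consistency check; in practice the recursions are usually scaled or run in log space to avoid numerical underflow.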

Learning an HMM

Baum-Welch algorithm (Expectation-Maximization for HMMs):
- Iterative solution
- Converges to a local optimum of the likelihood

Starting from an initial model λ, find an updated model λ' such that P(Y | λ') ≥ P(Y | λ).

• E-step: Given an observation sequence and a model, find the probabilities of the states to have produced those observations.

• M-step: Given the output of the E-step, update the model parameters to better fit the observations.

Learning an HMM

E-step: given the observations and the current model, compute
γ_k(i) = P(s_k = i | Y, λ): probability of being in state i at time k, with γ_k(i) ∝ α_k(i) · β_k(i);
ξ_k(i, j) = P(s_k = i, s_{k+1} = j | Y, λ): probability of being in state i at time k and transitioning to state j.

Learning an HMM

M-step: using γ_k(i), the probability of being in state i at time k, and ξ_k(i, j), the probability of being in state i at time k and transitioning to state j, re-estimate the parameters:
• π(i) = γ_1(i)
• A(i, j) = Σ_k ξ_k(i, j) / Σ_k γ_k(i)
• emission-model parameters are updated with the γ_k(i) as weights.
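Putting the E-step and M-step together, one EM iteration for a discrete-emission HMM can be sketched as follows. This is a minimal illustration assuming a single observation sequence and an emission matrix B[i, y] = P(y | state i); the numerical values are invented:

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch (EM) iteration for a discrete-emission HMM."""
    obs = np.asarray(obs)
    T, K = len(obs), len(pi)

    # E-step: forward/backward variables
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    alpha[0] = pi * B[:, obs[0]]
    for k in range(1, T):
        alpha[k] = B[:, obs[k]] * (alpha[k - 1] @ A)
    for k in range(T - 2, -1, -1):
        beta[k] = A @ (B[:, obs[k + 1]] * beta[k + 1])
    lik = alpha[-1].sum()  # P(Y | current model)

    gamma = alpha * beta / lik  # gamma[k, i] = P(s_k = i | Y)
    # xi[k, i, j] = P(s_k = i, s_{k+1} = j | Y)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / lik

    # M-step: re-estimate parameters from the expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for y in range(B.shape[1]):
        new_B[:, y] = gamma[obs == y].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, lik

pi = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
new_pi, new_A, new_B, lik = baum_welch_step(pi, A, B, obs=[0, 0, 1, 0, 1, 1])
print(lik)
```

Iterating this update increases the likelihood monotonically until convergence to a local optimum; for long sequences the recursions should be scaled or run in log space.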

Learning an HMM


• HMM is a parametric technique (fixed number of states, fixed topology).
• Heuristics for determining the optimal number of states:

X: dataset; N: number of datapoints; K: number of free parameters; L: maximum likelihood of the model given K parameters.

- Akaike Information Criterion: AIC = −2 ln L + 2K
- Bayesian Information Criterion: BIC = −2 ln L + K ln N

Lower BIC implies either fewer explanatory variables, a better fit, or both. As the number of datapoints (observations) increases, BIC penalizes complex models more heavily than AIC, so it assigns more weight to simpler models.

Choosing AIC versus BIC depends on the application: is the purpose of the analysis to make predictions, or to decide which model best represents reality? AIC may have better predictive ability, while BIC tends to select the simpler, more parsimonious model.
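As formulas, the two criteria are one-liners. The sketch below scores a set of candidate state counts; the log-likelihood values and the free-parameter count are invented for illustration (a full HMM count would also include the emission parameters):

```python
import numpy as np

def aic(log_lik, k):
    """Akaike Information Criterion: AIC = -2 ln L + 2K."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """Bayesian Information Criterion: BIC = -2 ln L + K ln N."""
    return -2.0 * log_lik + k * np.log(n)

# Hypothetical model-selection sweep over the number of hidden states
N = 200  # number of datapoints
for n_states, log_lik in [(2, -350.0), (3, -312.0), (4, -308.0)]:
    k = (n_states - 1) + n_states * (n_states - 1)  # initial + transition probs only
    print(n_states, round(aic(log_lik, k), 1), round(bic(log_lik, k, N), 1))
```

The model with the lowest criterion value is selected; BIC's K ln N penalty grows with N, which is why it favors smaller models on large datasets.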

Applications of HMMs


State estimation:What is the most probable state/state sequence of the system?

Prediction:What are the most probable next observations/state of the system?

Model selection:What is the most likely model that represents these observations?

Examples


Speech recognition:

• Left-to-right model

• States are phonemes

• Observations in frequency domain

D. B. Paul, Speech Recognition Using Hidden Markov Models, The Lincoln Laboratory Journal, 1990.

Examples


Motion prediction:

• Periodic model

• Observations are joint measurements

• Simulate/predict walking patterns

Karg, Michelle, et al., "Human Movement Analysis: Extension of the F-statistic to Time Series Using HMM," 2013 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE, 2013.

Examples


Motion prediction:

• Left-to-right models

• Autonomous segmentation

• Recognition + prediction


Examples


Motion prediction:

• Left-to-right model

• Each state is a dynamical system

Examples


Motion recognition:

• Recognition of most likely motion and prediction of next step.

MATLAB demo

Toy training set:
• 1 player
• 7 actions
• 1 Hidden Markov model per action

Outline

First part (10:15 – 11:00):
• Recap on Markov chains
• Hidden Markov Model (HMM)
  - Recognition of time series
  - ML parameter estimation

Second part (11:15 – 12:00):
• Time series segmentation
• Bayesian non-parametrics for HMMs

https://github.com/epfl-lasa/ML_toolbox

Time series Segmentation

Time series = sequence of discrete segments.

Why is this an important problem?

Segmentation of Speech Signals


Segmenting a continuous speech signal into sets of distinct words.

Segmentation of Speech Signals

Segmenting a continuous speech signal into sets of distinct words. The same sounds can be segmented in different ways:

"I am on a seafood diet."
"I see food and I eat it!"

Segmentation of Human Motion Data

Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009

Segmentation of continuous motion-capture data from exercise routines into motion categories: jumping jacks, arm circles, squats, knee raises.

Segmentation in Human Motion Data

Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009

12 variables:
- Torso position
- Waist angles (2)
- Neck angle
- Shoulder angles
- …

Segmentation in Robotics


Learning Complex Sequential Tasks from Demonstration

7 variables:
- Position
- Orientation

Segments: Reach, Grate, Trash

HMM for Time series Segmentation


Assumptions:

• The time series has been generated by a system that transitions between a set of hidden states: s_k ∈ {1, …, K}, with P(s_{k+1} = j | s_k = i) = A(i, j).

• At each time step, a sample is drawn from an emission model associated with the current hidden state: y_k ~ p(y | θ_{s_k}), e.g. a Gaussian for each state.
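These two assumptions define a generative model that can be sketched directly. The sketch below samples a 1-D time series from a 3-state HMM with Gaussian emissions; all parameter values (the sticky transition matrix, means, and standard deviations) are invented for illustration:

```python
import numpy as np

def sample_hmm(pi, A, means, stds, T, rng):
    """Generate hidden states and Gaussian observations from an HMM."""
    s = rng.choice(len(pi), p=pi)
    states, obs = [], []
    for _ in range(T):
        states.append(s)
        obs.append(rng.normal(means[s], stds[s]))  # emission from current state
        s = rng.choice(len(pi), p=A[s])            # Markov transition
    return np.array(states), np.array(obs)

rng = np.random.default_rng(1)
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.95, 0.05, 0.00],   # "sticky" transitions yield long segments
              [0.00, 0.95, 0.05],
              [0.05, 0.00, 0.95]])
states, obs = sample_hmm(pi, A, means=[0.0, 5.0, -5.0],
                         stds=[1.0, 1.0, 1.0], T=300, rng=rng)
```

Because the transition matrix is nearly diagonal, the hidden state persists for many steps, producing exactly the kind of piecewise segments that segmentation tries to recover.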

HMM for Time series Segmentation


How do we find these segments?

Presenter notes: In other words, the HMM describes time-series data with a mixture model that has temporal dependence in its components, through a first-order Markov chain.

HMM for Time series Segmentation

The HMM parameters λ consist of the initial state probabilities π, the transition matrix A, and the emission model parameters θ.

Steps for segmentation with an HMM:
1. Learn the HMM parameters through a Maximum Likelihood Estimate (MLE), i.e. maximize the HMM likelihood P(Y | λ) with the Baum-Welch algorithm (Expectation-Maximization for HMMs): an iterative solution that converges to a local optimum.

Hyper-parameter: number of possible states K.

HMM for Time series Segmentation

Steps for segmentation with an HMM:
2. Find the most probable sequence of states generating the observations through the Viterbi algorithm, which maximizes the HMM joint probability distribution:

s*_{1:T} = argmax over s_{1:T} of P(s_{1:T}, y_{1:T} | λ)
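The Viterbi algorithm performs this argmax with dynamic programming in O(T·K²) time. A minimal sketch for discrete emissions, in log space to avoid underflow; the emission matrix B and all numeric values are invented for illustration:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most probable hidden state sequence: argmax_s P(s, y | pi, A, B)."""
    T, K = len(obs), len(pi)
    log_A = np.log(A)
    back = np.zeros((T, K), dtype=int)         # backpointers
    log_d = np.log(pi) + np.log(B[:, obs[0]])  # delta_1(i)
    for k in range(1, T):
        scores = log_d[:, None] + log_A        # scores[i, j]: come from i, go to j
        back[k] = scores.argmax(axis=0)
        log_d = scores.max(axis=0) + np.log(B[:, obs[k]])
    path = [int(log_d.argmax())]               # best final state, then backtrack
    for k in range(T - 1, 0, -1):
        path.append(int(back[k, path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi(pi, A, B, obs=[0, 0, 1, 1]))
```

The returned state sequence is the segmentation: contiguous runs of the same state are the segments.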


Model Selection for HMMs

How many hidden states K best explain the data?

Limitations of classical finite HMMs for Segmentation

Cardinality: the choice of the number of hidden states is based on model-selection heuristics, and there is little understanding of the strengths and weaknesses of such methods in this setting [1]. The number of hidden states is left undefined.

Topology: we assume that all time series share the same set of emission models and switch among them in exactly the same manner, i.e. a fixed transition matrix [2].

[1] Emily Fox et al., An HDP-HMM for Systems with State Persistence, ICML, 2008
[2] Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009

Solution: Bayesian Non-Parametrics

Bayesian Non-Parametrics

• Bayesian: use Bayesian inference to estimate the parameters, i.e. place priors on the model parameters!

• Non-parametric: does NOT mean methods with "no parameters"; rather, models whose complexity (# of states, # of Gaussians) is inferred from the data:
1. The number of parameters grows with the sample size.
2. Infinite-dimensional parameter space!

BNP for HMMs: HDP-HMM

Cardinality

• Hierarchical Dirichlet Process (HDP) prior on the transition matrix!

• Normal Inverse Wishart (NIW) prior on the emission parameters!

Hyper-parameters!

Emily Fox et al., An HDP-HMM for Systems with State Persistence, ICML, 2008

BNP for HMMs: HDP-HMM

Cardinality

• The Dirichlet Process (DP) is a prior distribution over distributions.

• Used for clustering with infinite mixture models; i.e. instead of setting K in a GMM, K is learned from the data.

This only gives us an estimate of the K clusters! It cannot be used directly on the transition matrix:

• The Hierarchical Dirichlet Process (HDP) is a hierarchy of DPs!
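The DP's "distribution over distributions" can be made concrete through its stick-breaking construction: a unit-length stick is broken into infinitely many pieces whose lengths are the mixture weights. A truncated sketch (the truncation level and concentration value are chosen arbitrarily for illustration):

```python
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha.
    v_k ~ Beta(1, alpha);  w_k = v_k * prod_{l<k} (1 - v_l)."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
weights = stick_breaking(alpha=2.0, n_atoms=50, rng=rng)
print(weights[:5], weights.sum())  # weights approach a sum of 1 as n_atoms grows
```

Small alpha concentrates mass on a few atoms (few effective states), while large alpha spreads it over many, which is how the number of clusters or HMM states can grow with the data instead of being fixed in advance.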

Segmentation with HDP-HMM


Emily Fox et al., An HDP-HMM for Systems with State Persistence, ICML, 2008

Compare to Model Selection with Classical HMMs


BNP for HMMs: BP-HMM

Cardinality + Topology

• Beta Process (BP) prior on the transition matrix!

• Normal Inverse Wishart (NIW) prior on the emission parameters!

Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009

BNP for HMMs: BP-HMM

Cardinality + Topology

• Beta Process (BP) prior on the transition matrix!

[Diagram: binary feature matrix; rows are time series, columns are features (i.e. shared HMM states)]

Segmentation in Human Motion Data

Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009

12 variables:
- Torso position
- Waist angles (2)
- Neck angle
- Shoulder angles
- …

Applications in Robotics


Learning Complex Sequential Tasks from Demonstration
