time series approaches to understanding protein dynamicscheongsa/talks/bii.pdf · time series...

30
Time Series Approaches to Understanding Protein Dynamics CHEONG Siew Ann Division of Physics and Applied Physics School of Physical and Mathematical Sciences Email: [email protected] Homepage: http://www1.spms.ntu.edu.sg/~cheongsa/

Upload: vuongnhi

Post on 01-Sep-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

Time Series Approaches to Understanding Protein Dynamics

CHEONG Siew Ann

Division of Physics and Applied PhysicsSchool of Physical and Mathematical Sciences

Email: [email protected]: http://www1.spms.ntu.edu.sg/~cheongsa/

20 Jan 2011 Bioinformatics Institute 2

Macroscopic Thermal Physics

• Macroscopic order parameters differentiate– Solid (S)– Liquid (L)– Gas (G)

S

G

L

20 Jan 2011 Bioinformatics Institute 3

Microscopic Statistical Physics

• S, L, G time series distinguishable• S, L, G phase within single time

series distinguishable

δr fluctuates about 0,δr2 = αT time-independent

diffusive trajectories,δr2 increases with time

ballistic trajectories,infrequent collisions

20 Jan 2011 Bioinformatics Institute 4

From Micro to Macro

• Group statistically similar time series• Discover presence of different phases

20 Jan 2011 Bioinformatics Institute 5

Dynamics of a Small Protein

• ACE-(ALA)5-NME– 5 alanine repeated units– Capped by acetyl and methylamide groups– 62 atoms in all

• Simulation in water– MU Yuguang, School of Biological Sciences, NTU

• Fold into α-helix twice during simulation

20 Jan 2011 Bioinformatics Institute 6

Cross Correlations Between Time Series

• Pearson• Spearman• Digital

( )

Dow Jones US economic sector indices

( )⎟⎟⎠

⎞⎜⎜⎝

⎛ −⎟⎟⎠

⎞⎜⎜⎝

⎛ −=

j

jj

i

iiij

xtxxtxCσσ

20 Jan 2011 Bioinformatics Institute 7

Vector Pearson Correlations

• Velocity time series– Means– Covariances

• Basis-independent square deviations

• Scaled deviations

• Vector correlations

( ) ( )tt ji vv ,

jiijC ξξrr

⋅=

ji vv ,

ji ΣΣ ,

( )[ ] ( )[ ] ( )[ ] ( )[ ]jjjT

jjiiiT

ii tttt vvvvvvvv −Σ−−Σ− −− 11 ,

( )[ ]( )[ ]jjjj

iiii

t

t

vv

vv

−Σ=

−Σ=−

2/1

2/1 ,

ξ

ξr

r

20 Jan 2011 Bioinformatics Institute 8

Sliding Windows

20 Jan 2011 Bioinformatics Institute 9

Correlation Matrix(1250, 1500)

20 Jan 2011 Bioinformatics Institute 10

Hierarchical Clustering

• Complete linkage algorithm• Pairwise distance

• Determine as function of time– Number of clusters– Composition of clusters– Thresholds of clusters

( )ijij Cd −= 12

20 Jan 2011 Bioinformatics Institute 11

Complete Linkage Dendrogram(1250, 1500)

20 Jan 2011 Bioinformatics Institute 12

Reordered Correlation Matrix(1250, 1500)

20 Jan 2011 Bioinformatics Institute 13

Effective Variables(2250, 2500)

20 Jan 2011 Bioinformatics Institute 14

Effective Interactions(0, 500)

20 Jan 2011 Bioinformatics Institute 15

Effective Interactions(1250, 1500)

20 Jan 2011 Bioinformatics Institute 16

Effective Dynamics

TA TB

Ta

TA

TB

Ta

Tb

unfold?

unfold?

unfold?

fold?

fold? fold?

20 Jan 2011 Bioinformatics Institute 17

Segmentation vs Clustering

• Time Series Clustering– Discover effective mesoscopic variables in given time window– Discover slow time evolution of effective variables by sliding time

window• Time Series Segmentation

– Discover number/type of macroscopic phases– Discover lifetimes of macroscopic phases– Discover time scales of transitions between macroscopic phases

20 Jan 2011 Bioinformatics Institute 18

Nonstationarity in Time Series

20 Jan 2011 Bioinformatics Institute 19

Modeling Nonstationary Time Series

• Assume non-stationary time series– (v(1), v(2), …, v(t), …, v(N))– M stationary segments– In segment m, data points drawn from (μm, Σm) Gaussian

distribution• Recursive segmentation

– One time series → two segments– Each segment → two subsegments– Iterate + optimize– Terminate

20 Jan 2011 Bioinformatics Institute 20

Jensen-Shannon Divergence

• Single-segment likelihood for (v(1), v(2), …, v(N))

• Two-segment likelihood for (v(1), …, v(t), v(t+1), …, v(N))

• ML estimates• Jensen-Shannon divergence

( )[ ] ( )[ ]{ }∏=

− −Σ−−Σ

=N

i

T iiL1

11 exp

21 μμπ

vv

( ) ( )[ ] ( )[ ]{ } ( )[ ] ( )[ ]{ }∏∏+=

=

− −Σ−−Σ

−Σ−−Σ

=N

tiRR

TR

R

t

iLL

TL

L

iiiitL1

1

1

12 exp

21exp

21 μμ

πμμ

πvvvv

RLRL ΣΣΣ ˆ,ˆ,ˆ,ˆ,ˆ,ˆ μμμ( ) ( ) 0ln

1

2 ≥=ΔL

tLt

20 Jan 2011 Bioinformatics Institute 21

Recursive Segmentation

20 Jan 2011 Bioinformatics Institute 22

Start of Segmentation

20 Jan 2011 Bioinformatics Institute 23

End of Segmentation

20 Jan 2011 Bioinformatics Institute 24

Final Segmentations

few segments

many segments

20 Jan 2011 Bioinformatics Institute 25

Fluctuations Ellipsoid and Color Map

• Velocity fluctuations in v(t) characterized by covariance matrix Σ

• Eigenvalues λ1, λ2, λ3, semi-axes of fluctuations ellipsoid• Color map for segments

– Gray = spherical– Reddish/purplish = prolate– Greenish = oblate

( ) ⎟⎟⎠

⎞⎜⎜⎝

⎛=

max,1

3

max,1

2

max,1

1 ,,,,λλ

λλ

λλBGR

20 Jan 2011 Bioinformatics Institute 26

Temporal Distributions of Eccentricity

prolate → oblate + oblate → prolate

20 Jan 2011 Bioinformatics Institute 27

Temporal Distributions of Eccentricity

global transformations: not all H’s are involved!

20 Jan 2011 Bioinformatics Institute 28

Conclusions

• Dynamics of small protein from microscopic time series• Time series clustering

– Two synchronized clusters• Interaction clusters

– Effective dynamics from thresholds• Identify folding & unfolding events

– Nucleation from midpoint of protein• Time series segmentation

– Precisely identified global vs local events– Changes in fluctuations ellipsoid

• Potential to understand mechanisms– Nucleation from middle of protein

20 Jan 2011 Bioinformatics Institute 29

Acknowledgments

• Time Series Clustering– Mikhail FILIPPOV (PhD)

• Time Series Segmentation– Jeremy HADIDJOJO (PAP/4)

20 Jan 2011 Bioinformatics Institute 30

Thank You!