time series approaches to understanding protein dynamicscheongsa/talks/bii.pdf · time series...
TRANSCRIPT
Time Series Approaches to Understanding Protein Dynamics
CHEONG Siew Ann
Division of Physics and Applied PhysicsSchool of Physical and Mathematical Sciences
Email: [email protected]: http://www1.spms.ntu.edu.sg/~cheongsa/
20 Jan 2011 Bioinformatics Institute 2
Macroscopic Thermal Physics
• Macroscopic order parameters differentiate– Solid (S)– Liquid (L)– Gas (G)
S
G
L
20 Jan 2011 Bioinformatics Institute 3
Microscopic Statistical Physics
• S, L, G time series distinguishable• S, L, G phase within single time
series distinguishable
δr fluctuates about 0,δr2 = αT time-independent
diffusive trajectories,δr2 increases with time
ballistic trajectories,infrequent collisions
20 Jan 2011 Bioinformatics Institute 4
From Micro to Macro
• Group statistically similar time series• Discover presence of different phases
20 Jan 2011 Bioinformatics Institute 5
Dynamics of a Small Protein
• ACE-(ALA)5-NME– 5 alanine repeated units– Capped by acetyl and methylamide groups– 62 atoms in all
• Simulation in water– MU Yuguang, School of Biological Sciences, NTU
• Fold into α-helix twice during simulation
20 Jan 2011 Bioinformatics Institute 6
Cross Correlations Between Time Series
• Pearson• Spearman• Digital
( )
Dow Jones US economic sector indices
( )⎟⎟⎠
⎞⎜⎜⎝
⎛ −⎟⎟⎠
⎞⎜⎜⎝
⎛ −=
j
jj
i
iiij
xtxxtxCσσ
20 Jan 2011 Bioinformatics Institute 7
Vector Pearson Correlations
• Velocity time series– Means– Covariances
• Basis-independent square deviations
• Scaled deviations
• Vector correlations
( ) ( )tt ji vv ,
jiijC ξξrr
⋅=
ji vv ,
ji ΣΣ ,
( )[ ] ( )[ ] ( )[ ] ( )[ ]jjjT
jjiiiT
ii tttt vvvvvvvv −Σ−−Σ− −− 11 ,
( )[ ]( )[ ]jjjj
iiii
t
t
vv
vv
−Σ=
−Σ=−
−
2/1
2/1 ,
ξ
ξr
r
20 Jan 2011 Bioinformatics Institute 10
Hierarchical Clustering
• Complete linkage algorithm• Pairwise distance
• Determine as function of time– Number of clusters– Composition of clusters– Thresholds of clusters
( )ijij Cd −= 12
20 Jan 2011 Bioinformatics Institute 16
Effective Dynamics
TA TB
Ta
TA
TB
Ta
Tb
unfold?
unfold?
unfold?
fold?
fold? fold?
20 Jan 2011 Bioinformatics Institute 17
Segmentation vs Clustering
• Time Series Clustering– Discover effective mesoscopic variables in given time window– Discover slow time evolution of effective variables by sliding time
window• Time Series Segmentation
– Discover number/type of macroscopic phases– Discover lifetimes of macroscopic phases– Discover time scales of transitions between macroscopic phases
20 Jan 2011 Bioinformatics Institute 19
Modeling Nonstationary Time Series
• Assume non-stationary time series– (v(1), v(2), …, v(t), …, v(N))– M stationary segments– In segment m, data points drawn from (μm, Σm) Gaussian
distribution• Recursive segmentation
– One time series → two segments– Each segment → two subsegments– Iterate + optimize– Terminate
20 Jan 2011 Bioinformatics Institute 20
Jensen-Shannon Divergence
• Single-segment likelihood for (v(1), v(2), …, v(N))
• Two-segment likelihood for (v(1), …, v(t), v(t+1), …, v(N))
• ML estimates• Jensen-Shannon divergence
( )[ ] ( )[ ]{ }∏=
− −Σ−−Σ
=N
i
T iiL1
11 exp
21 μμπ
vv
( ) ( )[ ] ( )[ ]{ } ( )[ ] ( )[ ]{ }∏∏+=
−
=
− −Σ−−Σ
−Σ−−Σ
=N
tiRR
TR
R
t
iLL
TL
L
iiiitL1
1
1
12 exp
21exp
21 μμ
πμμ
πvvvv
RLRL ΣΣΣ ˆ,ˆ,ˆ,ˆ,ˆ,ˆ μμμ( ) ( ) 0ln
1
2 ≥=ΔL
tLt
20 Jan 2011 Bioinformatics Institute 25
Fluctuations Ellipsoid and Color Map
• Velocity fluctuations in v(t) characterized by covariance matrix Σ
• Eigenvalues λ1, λ2, λ3, semi-axes of fluctuations ellipsoid• Color map for segments
– Gray = spherical– Reddish/purplish = prolate– Greenish = oblate
( ) ⎟⎟⎠
⎞⎜⎜⎝
⎛=
max,1
3
max,1
2
max,1
1 ,,,,λλ
λλ
λλBGR
20 Jan 2011 Bioinformatics Institute 26
Temporal Distributions of Eccentricity
prolate → oblate + oblate → prolate
20 Jan 2011 Bioinformatics Institute 27
Temporal Distributions of Eccentricity
global transformations: not all H’s are involved!
20 Jan 2011 Bioinformatics Institute 28
Conclusions
• Dynamics of small protein from microscopic time series• Time series clustering
– Two synchronized clusters• Interaction clusters
– Effective dynamics from thresholds• Identify folding & unfolding events
– Nucleation from midpoint of protein• Time series segmentation
– Precisely identified global vs local events– Changes in fluctuations ellipsoid
• Potential to understand mechanisms– Nucleation from middle of protein
20 Jan 2011 Bioinformatics Institute 29
Acknowledgments
• Time Series Clustering– Mikhail FILIPPOV (PhD)
• Time Series Segmentation– Jeremy HADIDJOJO (PAP/4)