


Structured Learning Team: Yoshinobu Kawahara (構造的学習チーム 河原吉伸)

References
[1] N. Takeishi, Y. Kawahara, Y. Tabei & T. Yairi, "Bayesian dynamic mode decomposition," in Proc. of IJCAI'17, pp. 2814-2821, 2017.
[2] K. Fujii, Y. Inaba & Y. Kawahara, "Koopman spectral kernels for comparing complex dynamics: Application to multiagent sport plays," in Proc. of ECML-PKDD'17, pp. 127-139, 2017.
[3] K. Takeuchi, Y. Kawahara & T. Iwata, "Structurally regularized non-negative tensor factorization for spatio-temporal pattern discoveries," in Proc. of ECML-PKDD'17, pp. 582-598, 2017.
[4] Y. Kawahara, "Dynamic mode decomposition with reproducing kernels for Koopman spectral analysis," in Advances in Neural Information Processing Systems 29, pp. 911-919, 2016.
[5] N. Takeishi, Y. Kawahara & T. Yairi, "Learning Koopman invariant subspaces for dynamic mode decomposition," in Advances in Neural Information Processing Systems 30, pp. 1130-1140, 2017.
[6] K. Takeuchi, Y. Kawahara & T. Iwata, "Higher order fused regularization for supervised learning with grouped parameters," in Proc. of ECML-PKDD'15, pp. 577-593, 2015.

Main results: learning methods for the analysis of data with spatio-temporal structure
(1) A Bayesian formulation of dynamic mode decomposition and its estimation algorithm [1]
(2) Methods for analyzing nonlinear dynamics using Koopman spectra [2] ([4], [5])

(3) Spatio-temporal data analysis by structurally regularized non-negative tensor factorization [3]

Learning methods for the analysis of data with spatio-temporal structure (details of (1)-(2))

Spatio-temporal data analysis by structurally regularized non-negative tensor factorization (details of (3))

A Bayesian formulation of dynamic mode decomposition and its estimation algorithm (result (1), [1])

Setting: a nonlinear dynamical system \dot{x} = f(x) (in discrete time, x_{t+1} = F(x_t)).
The Koopman operator K acts linearly on observables g : S \to \mathbb{C} of the state:

    (K g)(x) = g(F(x))

Analysis via the spectrum of K estimated from observed data => dynamic mode decomposition and its variants

Analysis of nonlinear dynamics using Koopman spectra (result (2), [2])
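As a concrete (if simplified) illustration of estimating the spectrum of K from snapshot data, here is a minimal NumPy sketch of standard least-squares DMD on a toy linear system; the system and all variable names are ours, not from the poster:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear system x_{t+1} = A x_t with eigenvalues 0.9 and 0.5.
A_true = np.array([[0.9, 0.0],
                   [0.3, 0.5]])

# Snapshot pairs: columns of X0 are states x_t, columns of X1 the states x_{t+1}.
X0 = rng.standard_normal((2, 50))
X1 = A_true @ X0

# DMD: least-squares fit of the one-step linear map, then its eigendecomposition.
A_hat = X1 @ np.linalg.pinv(X0)
eigvals, modes = np.linalg.eig(A_hat)  # DMD (Koopman) eigenvalues and modes
```

For nonlinear systems, the same least-squares step is applied to kernelized or learned observables of the state rather than the raw state, which is where methods like [4] and [5] come in.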

(Graphical model of the Bayesian DMD: observations y_{0,j}, y_{1,j}, eigenfunction values \varphi_j, modes w_i, eigenvalues \lambda_i, and noise variance \sigma^2.)

- Proposed a generative model corresponding to dynamic mode decomposition => proved that, in the limit of infinitely many samples, it coincides with total-least-squares DMD (TLS-DMD) => a framework in which prior knowledge about the dynamics can be exploited via Bayesian inference; proposed an estimation algorithm based on Gibbs sampling.

- Proposed a method that compares the dynamical characteristics of nonlinear systems using the Koopman spectra estimated by DMD, and constructed a classification framework on top of it => enables comparison of nonlinear dynamical characteristics, and analyses such as classification, directly from spatio-temporal data.

(Figure: data from a system with known dynamics; estimated continuous-time eigenvalues, Re(log(\lambda)/\Delta t) vs. Im(log(\lambda)/\Delta t), for DMD, TLS-DMD, and BDMD (average).)

(Figure: a three-way tensor X for spatio-temporal data with missing values, with modes for days, time stamps over 24 hours, and sensor locations, approximated by the latent factor matrices U^(1), U^(2), U^(3); example latent factors (Factor 2) for the day mode (02.Aug-27.Sep) and the time mode (00:00-18:00).)

- Exploiting the fact that the regularizer is given as the Lovász extension of a submodular function, parameter estimation is performed by efficient optimization (ADMM).
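To illustrate ADMM with a structured regularizer, here is a sketch of the simplest special case, 1-D fused (total-variation) denoising, min_b 0.5*||y - b||^2 + lam*||D b||_1; this is a stand-in chosen for exposition, not the actual higher-order algorithm of [6]:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fused_denoise(y, lam=0.5, rho=1.0, n_iter=200):
    """ADMM for min_b 0.5*||y - b||^2 + lam*||D b||_1, D = 1-D differences."""
    n = len(y)
    D = np.diff(np.eye(n), axis=0)                 # (n-1) x n difference matrix
    z = np.zeros(n - 1)
    u = np.zeros(n - 1)                            # scaled dual variable
    M = np.linalg.inv(np.eye(n) + rho * D.T @ D)   # quadratic step, precomputed
    for _ in range(n_iter):
        b = M @ (y + rho * D.T @ (z - u))          # smooth subproblem
        z = soft(D @ b + u, lam / rho)             # l1 subproblem (prox step)
        u += D @ b - z                             # dual update
    return b

# Noisy piecewise-constant signal: the fused penalty recovers the two levels.
rng = np.random.default_rng(0)
y = np.r_[np.zeros(20), np.ones(20)] + 0.05 * rng.standard_normal(40)
b = fused_denoise(y)
```

The same splitting applies when D encodes a higher-order or graph-structured penalty; only the two subproblems change.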

Application of the Koopman operator: comparison of complex dynamics
(Figure: two multivariate time series, t = 1, ..., 5, whose dynamics are compared.)

(Embedded caption from [2], Figure 5: prediction performance of DMD with reproducing kernels; the error of a naive Bayes classifier for shot success/failure using kernels from either the original DMD or DMD with reproducing kernels over various input distance series; the Koopman spectral kernel built from four relevant inter-player distances achieved the minimum classification error of 35.9%, and the Koopman spectral kernels overall classified better than the original DMD kernels.)

DMD modes V1, data set 1 (temporal part / spatial part)


DMD modes V2, data set 2 (temporal part / spatial part)


Example: a kernel between data sets D1 and D2 via the subspaces spanned by their estimated Koopman modes V1, V2: k(D1, D2) = P_A(V1, V2).
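One standard way such a subspace similarity can be computed: the singular values of Q1^H Q2, with Q1, Q2 orthonormal bases of span(V1) and span(V2), are the cosines of the principal angles, and a Binet-Cauchy-type kernel multiplies their squares. The exact form of P_A used in [2] may differ; this sketch is illustrative:

```python
import numpy as np

def principal_angle_kernel(V1, V2):
    """Binet-Cauchy-type similarity between span(V1) and span(V2).

    The singular values of Q1^H Q2, with Q1, Q2 orthonormal bases, are the
    cosines of the principal angles between the two subspaces.
    """
    Q1, _ = np.linalg.qr(V1)
    Q2, _ = np.linalg.qr(V2)
    cosines = np.linalg.svd(Q1.conj().T @ Q2, compute_uv=False)
    return float(np.prod(cosines ** 2))

rng = np.random.default_rng(0)
V = rng.standard_normal((6, 2))              # e.g. 2 estimated modes in 6 dims
W = rng.standard_normal((6, 2))
mix = np.array([[2.0, 1.0], [0.5, 1.5]])     # invertible: preserves column span

k_same = principal_angle_kernel(V, V @ mix)  # identical subspaces -> 1
k_diff = principal_angle_kernel(V, W)        # generic subspaces -> below 1
```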

- Proposed a non-negative tensor factorization algorithm based on the structured regularization of [6] => applicable to discovering latent structure in spatio-temporal data.
- Proposed an efficient parameter estimation method that exploits submodularity => enables application at larger scales.
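To make the factorization step concrete, here is a minimal non-negative factorization sketch (matrix rather than tensor, plain multiplicative updates, no structured regularizer); the actual method of [3] adds the structured penalty and ADMM-based optimization on top of this kind of update:

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-negative data with exact rank-2 structure.
W_true = rng.random((30, 2))
H_true = rng.random((2, 20))
X = W_true @ H_true

# Multiplicative updates for min ||X - W H||_F^2 subject to W, H >= 0.
k = 2
W = rng.random((30, k)) + 0.1
H = rng.random((k, 20)) + 0.1
for _ in range(500):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-12)   # update keeps H non-negative
    W *= (X @ H.T) / (W @ H @ H.T + 1e-12)   # update keeps W non-negative

rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```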

(Eigenvalue problem)

    K \varphi_j = \lambda_j \varphi_j

Dynamic mode decomposition (DMD) (Rowley+ 09, Schmid 10)

- Conventional DMD models assume a deterministic dynamical system => formulated a generative (probabilistic) model, giving a basis for applying Bayesian estimation to mode decomposition (result (1)(a)).
- Constructed a framework usable not only for data decomposition but also for probabilistic inference => result (1)(b).
- Frameworks that use neural networks to raise the expressive power of dynamic mode decomposition itself were also studied (=> [5], etc.).

Example 1: the proposed method combined with [4], applied to sports (basketball) play data.

Data used
(Figure: each play embedded and visualized in two dimensions; label: success/failure of the shot.)
Classification of shot success/failure using the extracted features (comparison: kernel DMD vs. DMD).

Proposed model 1 (generative model for DMD):

    y_{0,t} ~ CN( \sum_{i=1}^{k} \varphi_{t,i} w_i, \sigma^2 I )
    y_{1,t} ~ CN( \sum_{i=1}^{k} \lambda_i \varphi_{t,i} w_i, \sigma^2 I )
    \varphi_{t,i} ~ CN(0, 1)

Putting prior distributions on the parameters also allows fully Bayesian estimation.
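The generative model above can be sampled directly; a sketch assuming CN(0, 1) denotes the standard circularly-symmetric complex normal (the dimensions, eigenvalues, and noise level below are illustrative choices, not the poster's):

```python
import numpy as np

rng = np.random.default_rng(0)

def c_normal(size):
    """Standard circularly-symmetric complex normal CN(0, 1) samples."""
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

n, k, T = 4, 2, 100            # observation dim., number of modes, snapshot pairs
sigma = 0.05                   # observation noise level
w = c_normal((n, k))           # modes w_i as columns
lam = np.array([0.9 * np.exp(0.3j), 0.7 * np.exp(-0.5j)])   # eigenvalues lambda_i

phi = c_normal((T, k))                             # phi_{t,i} ~ CN(0, 1)
y0 = phi @ w.T + sigma * c_normal((T, n))          # y_{0,t}: modes mixed by phi
y1 = (phi * lam) @ w.T + sigma * c_normal((T, n))  # y_{1,t}: lambda_i applied
```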

Example verifying that estimates by the proposed Bayesian model (BDMD) from observed samples agree with those of standard DMD (Stuart-Landau equation with noise).

Example 2 (synthetic data):

    z_n = \begin{bmatrix} 0 & -5 & 0 & 0 \\ 2 & -4 & 0 & 0 \\ 3 & -3 & 0 & 0 \\ 4 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0.9^n \\ 0.7^n \\ 0 \\ 0 \end{bmatrix} + e_n

(Figure: amplitudes of modes k = 1, ..., 4 estimated by sparse BDMD and by standard DMD, compared with the ground truth.)

Formulation of the structurally regularized non-negative tensor factorization

Results when a sparse prior is used

Model used to generate the data

Objective function of the non-negative tensor factorization

Application to complex dynamics

    g(x_{t+c}) = \sum_{j=1}^{q} \lambda_j^c \, \varphi_j(x_t) \, v_j
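For the full-state observable of a linear system this decomposition reduces to x_{t+c} = Phi Lam^c Phi^{-1} x_t, which can be checked numerically (the matrix below is a toy choice of ours):

```python
import numpy as np

# Toy linear dynamics x_{t+1} = A x_t; its eigenpairs play the role of the
# Koopman eigenvalues lambda_j and modes v_j for linear observables.
A = np.array([[0.9, -0.2],
              [0.1,  0.6]])
lam, Phi = np.linalg.eig(A)

x0 = np.array([1.0, -1.0])
c = 5

# c-step prediction via the spectral decomposition: x_{t+c} = Phi Lam^c Phi^{-1} x_t.
x_pred = (Phi @ np.diag(lam ** c) @ np.linalg.inv(Phi) @ x0).real

# Reference: iterate the dynamics c times.
x_true = x0.copy()
for _ in range(c):
    x_true = A @ x_true
```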

4 B.W. Brunton et al. / Journal of Neuroscience Methods 258 (2016) 1–15

Fig. 1. A comparison of DMD with common modal decomposition algorithms on a synthetic timeseries dataset. (a) The dataset is a movie with 6400 pixels in each frame, andthis noisy, high-dimensional time series dataset has two underlying, overlapping patterns, a Gaussian oval and a square. Each mode also has a distinct temporal evolutionthat includes both growth/decay and oscillation. The magnitude of the noise is 0.75× the magnitude of the signals. (b) PCA derives modes that mix the underlying modes. (c)ICA modes more closely resemble the generative modes, but the two underlying modes are still mixed. (d) DMD extracts the spatial–temporal coherent modes in the movie.These DMD modes closely resemble the underlying spatial modes and provide an estimate of the temporal evolution of these patterns.

2.1.2. DMD, PCA and ICA on a synthetic datasetTo build some intuition of DMD modes, Fig. 1 compares results of

modal decomposition by PCA, independence components analysis(ICA, Hyvärinen and Oja (2000)), and DMD on a synthetic timeseriesdataset. The timeseries dataset is a movie 10 s in duration sampledat 50 frames per second, where each frame is a 80 × 80 = 6400 pixelimage (so n = 6400 and m = 500). The dataset is constructed to bethe sum of two generative modes, each with a spatial pattern thatevolves according to some coherent temporal dynamics (Fig. 1a).Mode 1 is a Gaussian oval in space that oscillates and decays in time;Mode 2 is a square that oscillates at a lower frequency than theoval does. The two modes are spatially overlapping. Each mode’smagnitude is of range [−1, 1], and independent noise drawn froma Gaussian distribution N(0, 0.75) was added at each pixel.

Fig. 1b–d shows the first two modes computed by each method. As shown in Fig. 1b, PCA derives modes that mix the two generative modes in Fig. 1a. These PCA modes are vectors in R^n, ordered by their ability to explain the greatest fraction of variance in the data; PCA assumes the data is distributed as a multi-dimensional Gaussian. Fig. 1c shows that ICA can potentially do better than PCA, but the two generative modes are still mixed. ICA mode 1 (top of Fig. 1c) contains the Gaussian oval with a shadow of the square. Unlike PCA, ICA modes are computed assuming the underlying signals are non-Gaussian and statistically independent.

In contrast, DMD is an explicitly temporal decomposition and takes the sequences of snapshots into account, deriving spatial–temporal coherent patterns in the movie. DMD modes are closely related to PCA modes and also assume that variance in the data is Gaussian. The two largest DMD modes not only closely resemble the two generative modes, but they also contain an estimate of the temporal dynamics of the two modes, including an estimate of their frequencies of oscillation and time constants of exponential growth/decay. These temporal parameters are computed from the DMD eigenvalues by Eq. (6) as explained in Section 2.4. Further, the computational complexity of DMD is within the same order of magnitude as that of PCA.
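Eq. (6) is not reproduced in this excerpt, but the standard mapping from a discrete-time DMD eigenvalue to an oscillation frequency and a growth/decay rate can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def eig_to_temporal(lmbda, dt):
    """Map a discrete-time DMD eigenvalue to (frequency in Hz,
    exponential growth/decay rate in 1/s) via omega = log(lambda)/dt."""
    omega = np.log(lmbda) / dt
    freq_hz = omega.imag / (2.0 * np.pi)
    growth_rate = omega.real      # > 0 grows, < 0 decays
    return freq_hz, growth_rate

# A mode that decays (rate -1/s) while oscillating at 5 Hz,
# sampled at 50 frames per second as in the synthetic example.
dt = 1.0 / 50.0
lam = np.exp((-1.0 + 2j * np.pi * 5.0) * dt)
print(eig_to_temporal(lam, dt))  # ≈ (5.0, -1.0)
```

This also makes the DFT comparison below concrete: an eigenvalue with |λ| = 1 gives growth_rate = 0, i.e., pure oscillation.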

2.1.3. Connections to related methods

DMD has deep mathematical connections to Koopman spectral analysis. The Koopman operator is an infinite-dimensional, linear operator that represents finite-dimensional, nonlinear dynamics. The eigenvalues and modes of the Koopman operator capture the evolution of data measuring the nonlinear dynamical system (Budisic et al., 2012; Mezic, 2005). DMD is an approximation of Koopman spectral analysis (Rowley et al., 2009), so that DMD modes are able to describe even nonlinear systems.

As an algorithm, it is convenient to think of spatial–temporal decomposition by DMD as a hybrid of static mode extraction by principal components analysis (PCA) in the spatial domain and discrete Fourier transform (DFT) in the time domain. In fact, DMD modes are a rotation of PCA space such that each basis vector has coherent dynamics. The DMD algorithm in Section 2.1 starts with an SVD of the data matrix X = UΣV* as the first step, where U are identical to PCA modes. DMD modes are eigenvectors of Ã = U*AU, so that we can think of Ã as the correlation between PCA modes U and PCA modes in one time step AU. Like the DFT, DMD extracts frequencies of oscillations observed in the measurements. In addition, DMD goes beyond DFT to also estimate rates of growth/decay, whereas the DFT eigenvalues always have magnitudes of exactly one.
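The two steps described above (an SVD of X, then the eigendecomposition of the projected operator Ã = U*AU) can be sketched as follows. This is a minimal projected-DMD variant, assuming A = Y X⁺ and returning Φ = UW as modes rather than the exact DMD modes:

```python
import numpy as np

def dmd(X, Y, r):
    """Minimal DMD sketch: X holds snapshots x_0..x_{m-1} as columns,
    Y the time-shifted snapshots x_1..x_m, r the truncation rank."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    # A_tilde = U* A U, computed without ever forming A = Y X^+.
    A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)
    lam, W = np.linalg.eig(A_tilde)
    Phi = U @ W   # projected DMD modes, one column per eigenvalue
    return lam, Phi

# Sanity check on an explicit linear system x_{t+1} = A x_t.
rng = np.random.default_rng(1)
A = np.diag([0.9, 0.5])
X = rng.standard_normal((2, 50))
Y = A @ X
lam, Phi = dmd(X, Y, r=2)
print(sorted(lam.real))  # ≈ [0.5, 0.9]
```

Because Ã is similar to A restricted to the span of U, the recovered eigenvalues match the true dynamics exactly in this linear, noise-free case.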

The general formulation of the high-dimensional timeseries problem is related to several methods in the statistics literature, including vector autoregression (VAR; Charemza and Deadman (1992)). DMD differs from VAR in that the A matrix in Eq. (2) is never explicitly estimated; rather, we seek its eigendecomposition by computing Ã in step 2 of the DMD algorithm. The resultant modes are interpreted as a low-rank dynamical system expressed in Eq. (5). Further, these modes represent separable spatiotemporal features of the data. Interestingly, this approach of computing Ã is mathematically related to Principal Components Regression (PCR; Jolliffe (2005)).
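The contrast with VAR can be made concrete. On a problem small enough to form A explicitly, the eigenvalues of the VAR-style least-squares estimate and of the much smaller projection Ã coincide (a sketch, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Small enough to form A explicitly, for comparison only.
n, m = 6, 200
A_true = np.diag(np.linspace(0.2, 0.9, n))
X = rng.standard_normal((n, m))
Y = A_true @ X

# VAR-style: estimate the full n x n matrix A by least squares.
A_ls = Y @ np.linalg.pinv(X)

# DMD-style: only the r x r projection A_tilde is ever formed.
U, s, Vh = np.linalg.svd(X, full_matrices=False)
A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)

ev_ls = np.sort_complex(np.linalg.eigvals(A_ls))
ev_dmd = np.sort_complex(np.linalg.eigvals(A_tilde))
print(np.allclose(ev_ls, ev_dmd))  # True: same spectrum, smaller matrix
```

For the n = 6400-pixel movie above, the difference matters: A would be 6400 × 6400, while Ã is only r × r.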

2.1.4. Additional properties and practical limitations

A few general properties of DMD are interesting to note. The data X may be real or complex valued; in the case of recordings from electrode arrays, we will proceed assuming X are real-valued measurements of voltage. Further, the decomposition is unique (Chen et al., 2012), and it is also possible to compute the DMD of non-uniformly sampled data (Tu et al., 2013).

The relationship of DMD to PCA and DFT points to a few limitations of the technique that guide its application. DMD spatial



[Diagram labels, partially recoverable: (temporal direction) / (spatial direction); observed time-series data; DMD mode v1; correspondence between the DMD modes of individual players; temporal and spatial periodic components; temporal and spatial group relations; structured regularization extracts diverse spatio-temporal patterns]

D1, D2: the dynamics to be compared. A kernel that accounts for dynamical similarity (a Koopman spectral kernel [2]) is designed from the subspaces V1, V2 spanned by the respective DMD modes, via their principal angles:

k(D_1, D_2) = P_A(V_1, V_2)

[Further labels, partially recoverable: a similarity measure reflecting complex dynamics is designed; spatio-temporal data (tensor)]
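The kernel k(D1, D2) compares two dynamics through the subspaces V1, V2 spanned by their DMD modes. A minimal sketch of a principal-angle-based similarity follows; the function name and the product-of-cosines aggregation are illustrative assumptions, not the exact kernel of [2]:

```python
import numpy as np

def principal_angle_similarity(V1, V2):
    """Similarity between the column spans of V1 and V2: the singular
    values of Q1* Q2 are the cosines of the principal angles."""
    Q1, _ = np.linalg.qr(V1)   # orthonormal basis for span(V1)
    Q2, _ = np.linalg.qr(V2)   # orthonormal basis for span(V2)
    cosines = np.linalg.svd(Q1.conj().T @ Q2, compute_uv=False)
    return float(np.prod(np.clip(cosines, 0.0, 1.0)))

# Identical subspaces give similarity ≈ 1; orthogonal ones give 0.
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
W = np.array([[0.0], [0.0], [1.0]])
print(principal_angle_similarity(V, V[:, ::-1]))  # ≈ 1.0
print(principal_angle_similarity(V, W))           # ≈ 0.0
```

Because only the spans matter, the measure is invariant to the ordering and scaling of the mode vectors, which is exactly what one wants when comparing DMD mode sets from different trials or players.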