
Second-order networks in PyTorch

Daniel Brooks1,2, Olivier Schwander2, Frederic Barbaresco1, Jean-Yves Schneider1, and Matthieu Cord2

1 Thales Land and Air Systems, Advanced Radar Concepts (Limours, FRANCE)
2 Sorbonne Universite, CNRS, LIP6 - Laboratoire d'Informatique de Paris 6, F-75005 (Paris, FRANCE)

Abstract. Classification of Symmetric Positive Definite (SPD) matrices is gaining momentum in a variety of machine learning application fields. In this work we propose a Python library which implements neural networks on SPD matrices, based on the popular deep learning framework PyTorch.

Keywords: SPD matrix, covariance, second-order neural network, Riemannian machine learning

1 Introduction

Information geometry-based machine learning has recently been rapidly emerging in a broad spectrum of learning scenarios, and deep learning has been no exception. Notably, works such as [14], [15] and [13] introduce neural networks respectively operating on Lie groups, Grassmann spaces, and SPD matrices. The natural representation of any temporally or spatially structured signal as a Gaussian process allows for a near-universal interpretation of the signal as its temporal or spatial covariance, which is an SPD matrix, i.e. which belongs to the SPD Riemannian manifold, which we note $S^+_*$. Previous works make use of the SPD representation in other contexts than deep learning: for instance, Riemannian metric learning on $S^+_*$ is developed in [24], while [23] reviews kernel methods on $S^+_*$, with a primary applicative focus on electro-encephalogram/cardiogram (EEG/ECG) classification. In a similar vein, [4] and [2] extend barycenter-based classification methods to the SPD Riemannian framework. On the other hand, [9] proposes the usage of SPD matrices as a region descriptor in images, with applications in image segmentation. The work in [17] pushed the idea further by allowing the region covariance descriptor to be appended to a deep neural representation of an image, and by doing so introduced the first hints of automatic backpropagation in a Riemannian setting. Finally, the older theoretical developments in [7] notably allowed the extension of optimization methods to manifold-valued neural networks, as later utilized in [13], [10] and [8]. Even more recent works, namely [1] and [25], have appended SPD neural networks to classical, Euclidean ones, by considering the second-order moments of the learnt feature representations as a suitable representation for the data.

In this environment of popularization of deep learning on SPD matrices, we propose torchspdnet, a Python library featuring the modules necessary to build a neural network operating on SPD matrices. We do so in the popular PyTorch framework [21]. While other libraries have been proposed for general learning on manifolds (Geomstats [20]), deep learning on manifolds (McTorch [19]), optimization on manifolds (Manopt [5]) and SPD matrix manipulation (PyRiemann [3]), ours focuses exclusively on deep learning architectures for SPD matrices, providing seamless integration with any PyTorch development workflow. In the following section we describe the core components of an SPD neural network, which we call SPDNet. The third section deals with the optimization of a manifold-valued network. Finally, we show some use cases.

2 Second-order networks

Here we describe the architecture of an SPDNet. We begin with the core building blocks, then show how to build a network using these blocks in various scenarios. Following the logic of most modern deep learning frameworks, including PyTorch, the core building blocks, or layers, of the network are implemented as individual modules.

2.1 SPD layers

Similarly to a classical neural network, an SPDNet aims at building a hierarchical sequence of more compact and discriminative manifolds, as illustrated in figure 1. Three main layers are introduced in [13], described below.

Fig. 1. Illustration of a generic SPD neural network. Successive bilinear layers followed by activations build a feature SPD manifold, which is then transformed to a Euclidean space to allow for classification.

BiMap The bilinear mapping (BiMap) layer transforms an input matrix $X^{(l-1)}$ of size $n^{(l-1)}$ at layer $(l-1)$ into an SPD matrix $X^{(l)}$ of size $n^{(l)}$ at layer $(l)$ using a basis change matrix $W^{(l)}$, required to be full-rank, which in turn constrains $n^{(l)} \leq n^{(l-1)}$. In practice $W^{(l)}$ is in fact constrained to be semi-orthogonal:

$$X^{(l)} = W^{(l)T} X^{(l-1)} W^{(l)} \quad \text{with } W^{(l)} \in O(n^{(l-1)}, n^{(l)}) \qquad (1)$$

In the equation above, $O(n^{(l-1)}, n^{(l)})$ is the manifold of semi-orthogonal rectangular matrices, also called the Stiefel manifold, and $X^{(l-1)} = U^{(l-1)} \Sigma^{(l-1)} U^{(l-1)T}$ designates the eigenvalue decomposition of $X^{(l-1)}$.
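For illustration, eq. (1) can be sketched in a few lines of plain PyTorch. This is our own minimal sketch, not the library's BiMap module; the semi-orthogonal weight is produced here by a QR decomposition purely for the example:

import torch

def bimap(X, W):
    # Bilinear mapping of eq. (1): X -> W^T X W.
    # X: batch of SPD matrices (N, n_in, n_in); W: semi-orthogonal (n_in, n_out).
    return W.t() @ X @ W

n_in, n_out = 20, 15
W = torch.linalg.qr(torch.randn(n_in, n_out))[0]  # orthonormal columns: W^T W = I
A = torch.randn(4, n_in, n_in)
X = A @ A.transpose(-1, -2) + 1e-3 * torch.eye(n_in)  # random SPD inputs
print(bimap(X, W).shape)  # torch.Size([4, 15, 15])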

ReEig The transformation layer is followed by an activation, in this case a rectified eigenvalues (ReEig) layer:

$$X^{(l)} = U^{(l-1)} \max(\Sigma^{(l-1)}, \varepsilon I_{n^{(l-1)}}) U^{(l-1)T} \quad \text{with } P^{(l-1)} = U^{(l-1)} \Sigma^{(l-1)} U^{(l-1)T} \qquad (2)$$

The ReEig layer also makes use of an eigenvalue decomposition, as it operates directly on the eigenvalues, with $\varepsilon$ being a fixed threshold set to a default value of $10^{-4}$.
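The forward pass of eq. (2) is equally simple to sketch in plain PyTorch (again an illustration under our own conventions, not the library module):

import torch

def reeig(X, eps=1e-4):
    # Rectified-eigenvalue activation of eq. (2): clamp eigenvalues at eps.
    sigma, U = torch.linalg.eigh(X)  # X = U diag(sigma) U^T, batched
    return U @ torch.diag_embed(sigma.clamp(min=eps)) @ U.transpose(-1, -2)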

LogEig After a succession of transformations and activations, the final feature manifold is transformed via a logarithmic mapping to a Euclidean space (LogEig layer) to perform the actual classification:

$$X^{(l)} = \mathrm{vec}\left( U^{(l)} \log(\Sigma^{(l)}) U^{(l)T} \right) \quad \text{with } P^{(l)} = U^{(l)} \Sigma^{(l)} U^{(l)T} \qquad (3)$$

The LogEig layer is justified in the Log-Euclidean Metric (LEM) framework, independently introduced in [22] and [11], which shows a correspondence from the manifold $S^+_*$ to the Euclidean space of symmetric matrices through the matrix logarithm. The vec operator denotes matrix vectorization.
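A sketch of the mapping of eq. (3), following the same eigenvalue-function pattern (illustrative only, not the library module):

import torch

def logeig(X):
    # Log-Euclidean mapping of eq. (3), followed by vectorization.
    sigma, U = torch.linalg.eigh(X)
    log_X = U @ torch.diag_embed(sigma.log()) @ U.transpose(-1, -2)
    return log_X.flatten(start_dim=-2)  # vec(.): one n^2 feature vector per matrix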

3 Training

The main difficulties of learning an SPDNet lie both in the backpropagation through structured Riemannian functions [16], [6], and in the manifold-constrained optimization [7].

3.1 Structured derivatives

Manifold-valued functions, such as the LogEig and ReEig layers, require a generalization of the chain rule, key to the backpropagation algorithm. Both these layers can be represented in a unified fashion as a non-linear function $f$ acting directly on the eigenvalues of the input matrix $X^{(l-1)} = U^{(l-1)} \Sigma^{(l-1)} U^{(l-1)T}$. The backpropagation then goes as follows: given the succeeding gradient $\frac{\partial L^{(l)}}{\partial X^{(l)}}$, the output gradient $\frac{\partial L^{(l-1)}}{\partial X^{(l-1)}}$ is:

$$\frac{\partial L^{(l-1)}}{\partial X^{(l-1)}} = U \left( L \odot \left( U^T \frac{\partial L^{(l)}}{\partial X^{(l)}} U \right) \right) U^T \qquad (4)$$

In the previous equation, the Loewner matrix of finite differences $L$ is defined as:

$$L_{ij} = \begin{cases} \dfrac{f(\sigma_i) - f(\sigma_j)}{\sigma_i - \sigma_j} & \text{if } \sigma_i \neq \sigma_j \\ f'(\sigma_i) & \text{otherwise} \end{cases} \qquad (5)$$
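Under our reading of eqs. (4) and (5), backpropagation through any such eigenvalue-wise function $f$ can be sketched as follows (unbatched and illustrative, not the library's implementation). For ReEig, $f$ clamps at $\varepsilon$ and $f'$ is the indicator of $\sigma > \varepsilon$; for LogEig, $f$ is the logarithm:

import torch

def eig_fn_backward(grad_out, U, sigma, f, f_prime):
    # Gradient of a function f of the eigenvalues, eqs. (4)-(5).
    # grad_out: dL/dX^(l) of shape (n, n); U, sigma: eigendecomposition of X^(l-1).
    diff = sigma.unsqueeze(-1) - sigma.unsqueeze(-2)       # sigma_i - sigma_j
    num = f(sigma).unsqueeze(-1) - f(sigma).unsqueeze(-2)  # f(sigma_i) - f(sigma_j)
    same = diff.abs() < 1e-12
    safe = torch.where(same, torch.ones_like(diff), diff)
    L = torch.where(same, f_prime(sigma).unsqueeze(-1).expand_as(diff), num / safe)  # Loewner matrix
    return U @ (L * (U.t() @ grad_out @ U)) @ U.t()        # eq. (4), elementwise product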


3.2 Constrained optimization

In the specific case of the BiMap layer, the transformation matrix $W$ is constrained to the Stiefel manifold. The Euclidean gradient $G = \frac{\partial L}{\partial W}$ of the loss function $L$ does not respect the geometry of the manifold; as such, gradient descent is ill-defined. The correct Riemannian gradient is obtained by the tangent projection $\Pi_{T_W}$ on the manifold at $W$. The update is then obtained by computing the geodesic on the manifold from $W$ towards the Riemannian gradient, also called the exponential mapping $\mathrm{Exp}_W(X)$. We illustrate this process in figure 2. Both the tangent projection $\Pi_{T_W}$ and the exponential mapping $\mathrm{Exp}_W$ have a closed form on the Stiefel manifold [7]:

Fig. 2. Illustration of a manifold-constrained gradient update. The Euclidean gradient is projected to the tangent space, then mapped to the manifold.

$$\Pi_{T_W}(X) = X - W W^T X \qquad \mathrm{Exp}_W(X) = \mathrm{Orth}(W + X) \qquad (6)$$

The operator Orth represents the orthonormalization of a free family of vectors, i.e. the $Q$ matrix in the QR decomposition.
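A minimal sketch of this update in plain PyTorch, assuming a step size lr (our own illustration; the library wraps this logic in its optimizer):

import torch

def stiefel_step(W, grad, lr):
    # One constrained update following eq. (6) and figure 2: project the
    # Euclidean gradient to the tangent space at W, step, then retract
    # onto the Stiefel manifold via QR orthonormalization.
    riem_grad = grad - W @ W.t() @ grad            # tangent projection Pi_{T_W}
    return torch.linalg.qr(W - lr * riem_grad)[0]  # Exp_W: Orth(W + X)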

3.3 Summary

The library we propose seamlessly integrates orthogonally-constrained optimization on $S^+_*$: the code for setting up the learning of a model in PyTorch is only modified in the usage of the MixOptimizer class, which mixes a conventional optimizer with the Riemannian ones:


import torch.nn as nn
from mixoptimizer import MixOptimizer
...
model = ...  # define the model
...
criterion = nn.CrossEntropyLoss()  # define the loss function
opt = MixOptimizer(model.parameters(), lr=lr,
                   momentum=0.9, weight_decay=5e-4)  # mixed optimizer
...
# in the training loop, compute gradients and update weights as usually done
loss = criterion(model(x), y)
loss.backward()
opt.step()
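The paper does not detail MixOptimizer's internals; as a rough illustration under our own assumptions, a mixed step could apply the Stiefel update of eq. (6) to semi-orthogonal parameters and a plain SGD step to all others (the is_stiefel flag below is hypothetical, not the library's API):

import torch

def mixed_step(params, lr):
    # Illustrative sketch only, not the library's MixOptimizer.
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            if getattr(p, 'is_stiefel', False):              # hypothetical flag
                riem = p.grad - p @ p.t() @ p.grad           # tangent projection
                p.copy_(torch.linalg.qr(p - lr * riem)[0])   # retraction, eq. (6)
            else:
                p.add_(p.grad, alpha=-lr)                    # standard SGD step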

4 Use cases

Here we show how to use the library in practice. Following the PyTorch logic, elementary functions are defined in torchspdnet.functional and high-level modules in torchspdnet.nn.

4.1 Basic SPDNet model

Here we give the most basic use case scenario: given input covariance data of size 20 × 20, we build an SPDNet which reduces its size to 15, then 10, through two BiMaps and a ReEig activation, followed by the LogEig and vectorization. Finally, a standard fully-connected layer allows for classification over the 3 classes.

import torch.nn as nn
import torchspdnet.nn as nn_spd

model = nn.Sequential(
    nn_spd.BiMap(1, 1, 20, 15),
    nn_spd.ReEig(),
    nn_spd.BiMap(1, 1, 15, 10),
    nn_spd.LogEig(),
    nn_spd.Vectorize(),
    nn.Linear(10**2, 3)
)

Note that our implementation of the BiMap module supports an arbitrary number of channels, represented by the additional parameters, all set to 1 in this example.
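Assuming the modules expect batches of shape (batch, channels, n, n) — an input layout we infer from the channel parameters, not one stated by the library — a quick smoke test of this model might look like:

import torch

A = torch.randn(8, 1, 20, 20)
X = A @ A.transpose(-1, -2) + 1e-3 * torch.eye(20)  # 8 random 20x20 SPD matrices
print(model(X).shape)  # expected: torch.Size([8, 3])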

4.2 First-order and second-order combined

In a more complex example, an SPDNet acts upon the feature maps of a convolutional network. For an image recognition task, these features may come from a pre-trained deep network, but nothing keeps us from training the whole network in an end-to-end fashion or from fine-tuning the parameters. Here we describe the combination of a ResNet-18 [12] pre-trained on the CIFAR10 [18] challenge with SPDNet layers. We call such a model a second-order convolutional neural network (SOCNN).

import torch as th
import torch.nn as nn
import torchspdnet.nn as nn_spd
from resnet import ResNet18

class SOCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # first-order model
        self.model_fo = ResNet18()
        self.model_fo.load_state_dict(th.load('pretrained/ResNet18.pth')['state_dict'])
        # convolutional connection
        self.connection = nn.Conv2d(512, 256, kernel_size=(1, 1))
        # second-order model
        self.model_so = nn.Sequential(
            nn_spd.BiMap(1, 1, 256, 128),
            nn_spd.ReEig(),
            nn_spd.BiMap(1, 1, 128, 64),
        )
        self.dense = nn.Sequential(
            nn.Linear(64**2, 1024),
            nn.Linear(1024, 10)
        )

    def forward(self, x):
        x_fo = self.model_fo(x)
        x_co = self.connection(x_fo)
        x_sym = nn_spd.CovPool()(x_co.view(x_co.shape[0], x_co.shape[1], -1))
        x_so = self.model_so(x_sym)
        x_vec = nn_spd.LogEig()(x_so).view(x_so.shape[0], x_so.shape[-1]**2)
        y = self.dense(x_vec)
        return y
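As a quick smoke test (assuming CIFAR-10-sized inputs and that the pretrained checkpoint exists at the path above):

model = SOCNN()
x = th.randn(4, 3, 32, 32)  # batch of 4 RGB CIFAR-10 images
print(model(x).shape)       # expected: torch.Size([4, 10])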

5 Conclusion

We have proposed a PyTorch library for deep learning on SPD matrices. We hope its versatility and natural integration in any PyTorch workflow will allow future projects to more readily exploit the covariance structure in data at any level.


References

1. Acharya, D., Huang, Z., Paudel, D.P., Gool, L.V.: Covariance Pooling for Facial Expression Recognition. p. 8

2. Barachant, A., Bonnet, S., Congedo, M., Jutten, C.: Multi-class Brain Computer Interface Classification by Riemannian Geometry. IEEE Transactions on Biomedical Engineering 59(4), 920–928 (Apr 2012). https://doi.org/10.1109/TBME.2011.2172210, http://ieeexplore.ieee.org/document/6046114/

3. Barachant, A.: Python package for covariance matrices manipulation and Biosignal classification with application in Brain Computer interface: alexandrebarachant/pyRiemann (Feb 2019), https://github.com/alexandrebarachant/pyRiemann

4. Barachant, A., Bonnet, S., Congedo, M., Jutten, C.: Classification of covariance matrices using a Riemannian-based kernel for BCI applications. Neurocomputing 112, 172–178 (Jul 2013). https://doi.org/10.1016/j.neucom.2012.12.039, https://hal.archives-ouvertes.fr/hal-00820475

5. Boumal, N., Mishra, B., Absil, P.A., Sepulchre, R.: Manopt, a Matlab Toolbox for Optimization on Manifolds. Journal of Machine Learning Research 15, 1455–1459 (2014), http://jmlr.org/papers/v15/boumal14a.html

6. Brodski, M., Dalecki, J., Eidus, O., Iohvidov, I., Krein, M., Ladyzenskaja, O., Lidski, V., Ljubic, J., Macaev, V., Povzner, A., Sahnovic, L., Smuljan, J., Suharevski, I., Uralceva, N.: Thirteen Papers on Functional Analysis and Partial Differential Equations, American Mathematical Society Translations: Series 2, vol. 47. American Mathematical Society (Dec 1965). https://doi.org/10.1090/trans2/047

7. Edelman, A., Arias, T., Smith, S.: The Geometry of Algorithms with Orthogonality Constraints. SIAM Journal on Matrix Analysis and Applications 20(2), 303–353 (Jan 1998). https://doi.org/10.1137/S0895479895290954, https://epubs.siam.org/doi/abs/10.1137/S0895479895290954

8. Engin, M., Wang, L., Zhou, L., Liu, X.: DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition. arXiv:1711.04047 [cs] (Nov 2017), http://arxiv.org/abs/1711.04047

9. Faulkner, H., Shehu, E., Szpak, Z.L., Chojnacki, W., Tapamo, J.R., Dick, A., Hengel, A.v.d.: A Study of the Region Covariance Descriptor: Impact of Feature Selection and Image Transformations. In: 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA). pp. 1–8 (Nov 2015). https://doi.org/10.1109/DICTA.2015.7371222

10. Gao, Z., Wu, Y., Bu, X., Jia, Y.: Learning a Robust Representation via a Deep Network on Symmetric Positive Definite Manifolds. arXiv:1711.06540 [cs] (Nov 2017), http://arxiv.org/abs/1711.06540

11. Harris, W.F.: The average eye. Ophthalmic and Physiological Optics 24(6), 580–585 (2004). https://doi.org/10.1111/j.1475-1313.2004.00239.x, https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1475-1313.2004.00239.x

12. He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778. IEEE, Las Vegas, NV, USA (Jun 2016). https://doi.org/10.1109/CVPR.2016.90, http://ieeexplore.ieee.org/document/7780459/

13. Huang, Z., Van Gool, L.J.: A Riemannian Network for SPD Matrix Learning. In: AAAI. vol. 1, p. 3 (2017)

14. Huang, Z., Wan, C., Probst, T., Van Gool, L.: Deep Learning on Lie Groups for Skeleton-based Action Recognition. arXiv:1612.05877 [cs] (Dec 2016), http://arxiv.org/abs/1612.05877

15. Huang, Z., Wu, J., Van Gool, L.: Building Deep Networks on Grassmann Manifolds. arXiv:1611.05742 [cs] (Nov 2016), http://arxiv.org/abs/1611.05742

16. Ionescu, C., Vantzos, O., Sminchisescu, C.: Matrix Backpropagation for Deep Networks with Structured Layers. In: 2015 IEEE International Conference on Computer Vision (ICCV). pp. 2965–2973. IEEE, Santiago, Chile (Dec 2015). https://doi.org/10.1109/ICCV.2015.339, http://ieeexplore.ieee.org/document/7410696/

17. Ionescu, C., Vantzos, O., Sminchisescu, C.: Training Deep Networks with Structured Layers by Matrix Backpropagation. arXiv:1509.07838 [cs] (Sep 2015), http://arxiv.org/abs/1509.07838

18. Krizhevsky, A.: Learning Multiple Layers of Features from Tiny Images. p. 60

19. Meghwanshi, M., Jawanpuria, P., Kunchukuttan, A., Kasai, H., Mishra, B.: McTorch, a manifold optimization library for deep learning. arXiv:1810.01811 [cs, stat] (Oct 2018), http://arxiv.org/abs/1810.01811

20. Miolane, N., Mathe, J., Donnat, C., Jorda, M., Pennec, X.: geomstats: a Python Package for Riemannian Geometry in Machine Learning. arXiv:1805.08308 [cs, stat] (May 2018), http://arxiv.org/abs/1805.08308

21. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (Oct 2017), https://openreview.net/forum?id=BJJsrmfCZ

22. Pennec, X., Fillard, P., Ayache, N.: A Riemannian Framework for Tensor Computing. International Journal of Computer Vision 66(1), 41–66 (Jan 2006). https://doi.org/10.1007/s11263-005-3222-z, http://link.springer.com/10.1007/s11263-005-3222-z

23. Yger, F.: A review of kernels on covariance matrices for BCI applications. In: 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP). pp. 1–6 (Sep 2013). https://doi.org/10.1109/MLSP.2013.6661972

24. Yger, F., Sugiyama, M.: Supervised LogEuclidean Metric Learning for Symmetric Positive Definite Matrices. arXiv:1502.03505 [cs] (Feb 2015), http://arxiv.org/abs/1502.03505

25. Yu, K., Salzmann, M.: Second-order Convolutional Neural Networks. arXiv:1703.06817 [cs] (Mar 2017), http://arxiv.org/abs/1703.06817