
Shallow neural networks for fluid flow reconstruction with limited sensors

N. Benjamin Erichson¹, Lionel Mathelin³, Zhewei Yao², Steven L. Brunton⁴, Michael W. Mahoney¹ and J. Nathan Kutz⁵

¹ICSI and Department of Statistics, and ²Department of Mathematics, University of California, Berkeley, Berkeley, CA, USA
³Université Paris-Saclay, CNRS, LIMSI, 91400 Orsay, France
⁴Department of Mechanical Engineering, and ⁵Department of Applied Mathematics, University of Washington, Seattle, WA, USA

ORCID (NBE): 0000-0003-0667-3516

Cite this article: Erichson NB, Mathelin L, Yao Z, Brunton SL, Mahoney MW, Kutz JN. 2020 Shallow neural networks for fluid flow reconstruction with limited sensors. Proc. R. Soc. A 476: 20200097. http://dx.doi.org/10.1098/rspa.2020.0097

Received: 14 February 2020. Accepted: 13 May 2020.

Subject areas: artificial intelligence, computational physics, mechanical engineering.

Keywords: neural networks, sensors, flow field estimation, fluid dynamics, machine learning.

Author for correspondence: N. Benjamin Erichson, e-mail: [email protected]

© 2020 The Author(s). Published by the Royal Society. All rights reserved.

Abstract. In many applications, it is important to reconstruct a fluid flow field, or some other high-dimensional state, from limited measurements and limited data. In this work, we propose a shallow neural network-based learning methodology for such fluid flow reconstruction. Our approach learns an end-to-end mapping between the sensor measurements and the high-dimensional fluid flow field, without any heavy preprocessing on the raw data. No prior knowledge is assumed to be available, and the estimation method is purely data-driven. We demonstrate the performance on three examples in fluid mechanics and oceanography, showing that this modern data-driven approach outperforms traditional modal approximation techniques which are commonly used for flow reconstruction. Not only does the proposed method show superior performance characteristics, it can also produce a comparable level of performance to traditional methods in the area, using significantly fewer sensors. Thus, the mathematical architecture is ideal for emerging global monitoring technologies where measurement data are often limited.

1. Introduction

The ability to reconstruct coherent flow features from limited observation can be critically enabling for

applications across the physical and engineering sciences [1–5]. For example, efficient and accurate fluid flow estimation is critical for active flow control, and it may help to craft more fuel-efficient automobiles as well as high-efficiency turbines. The ability to reconstruct important fluid flow features from limited observation is also central in applications as diverse as cardiac blood flow modelling and climate science [6]. All of these applications rely on estimating the structure of fluid flows based on limited sensor measurements.

More concretely, the objective is to estimate the flow field x ∈ R^m from sensor measurements s ∈ R^p, that is, to learn the relationship s ↦ x. The restriction of limited sensors gives p ≪ m. The sensor measurements s are collected via a sampling process from the high-dimensional field x. We can describe this process as

s = H(x),    (1.1)

where H : R^m → R^p denotes a measurement operator. Now, the task of flow reconstruction requires the construction of an inverse model that produces the field x in response to the observations s, which we may describe as

x = G(s),    (1.2)

where G : R^p → R^m denotes a nonlinear forward operator. However, the measurement operator H may be unknown or highly nonlinear in practice. Hence, the problem is often ill-posed, and we cannot directly invert the measurement operator H to obtain the forward operator G.

Fortunately, given a set of training examples {x_i, s_i}_i, we may learn a function F to approximate the forward operator G. Specifically, we aim to learn a function F : s ↦ x̂, which maps a limited number of measurements to the estimated state x̂,

x̂ = F(s),    (1.3)

so that the misfit is small, e.g. in a Euclidean sense over all sensor measurements,

‖F(s) − G(s)‖_2^2 < ε,

where ε is a small positive number. Neural network (NN)-based inversion is common practice in machine learning [7], dating back to the late 1980s [8]. This powerful learning paradigm is also increasingly used for flow reconstruction [9–11], prediction [12–16] and simulations [17]. In particular, deep inverse transform learning is an emerging concept [18–21], which has been shown to outperform traditional methods in applications such as denoising, deconvolution and super-resolution.

Here, we explore shallow neural networks (SNNs) to learn the input-to-output mapping between the sensor measurements and the flow field. Figure 1 shows a design sketch of the proposed framework for fluid flow reconstruction. We can express the network architecture (henceforth called SHALLOW DECODER (SD)) more concisely as follows:

s ↦ first hidden layer ↦ second hidden layer ↦ output layer ↦ x̂.

SNNs are considered to be networks with very few hidden layers. We favour shallow over deep architectures, because the simplicity of SNNs allows faster training, less tuning and easier interpretation (and also since it works, and thus there is no need to consider deeper architectures).

There are several advantages of this mathematical approach over traditional scientific computing methods for fluid flow reconstruction [4,22–25]. First, the SD considered here features a linear last layer and provides a supervised joint learning framework for the low-dimensional approximation space of the flow field and the map from the measurements to this low-dimensional space. This allows the approximation basis to be tailored not only to the state space but also to the associated measurements, preventing observability issues. In contrast, these two steps are disconnected in standard methods (discussed in more detail in §2). Second, the method allows for flexibility in the measurements, which do not necessarily have to be linearly related to the state, as in many standard methods. Finally, the SD network produces interpretable features of the dynamics, potentially improving on classical proper orthogonal decomposition (POD), also known as principal component analysis, low-rank features. For instance, figure 2 shows that the basis learned via an SNN exhibits elements resembling physically consistent quantities, in contrast to alternative POD-based modal approximation methods that enforce orthogonality. The interpretation of the last (linear) layer is as follows: a given mode is constituted by the value of each spatially localized weight connecting the associated node in the last hidden layer to the nodes of the output layer.

Figure 1. Illustration of our SD, which maps a few sensor measurements s ∈ R^5 to the estimated field x̂ ∈ R^78,406. In other words, this NN-based learning methodology provides an end-to-end mapping between the sensor measurements and the fluid flow field.

Figure 2. Dominant modes learned by the SD in contrast to the POD modes. These dominant features show that the SD constructs a reasonable characterization of the flow behind a cylinder. Indeed, by not constraining the modes to be linear and orthogonal, as is enforced with POD, a potentially more interpretable feature space can be extracted from data. Such modes can be exploited for reconstruction of the state space from limited measurements and limited data. (a) Modes of POD and (b) modes of the learned output layer of the SD.

Limitations of our approach are standard to data-driven methods, in that the training data should be as representative as possible of the system, in the sense that they should comprise samples drawn from the same statistical distribution as the testing data.

The paper is organized as follows. Section 2 discusses traditional modal approximation techniques. Then, in §3, the specific implementation and architecture of our SD is described. Results are presented in §4 for various applications of interest. We aim to reconstruct (i) the vorticity field of a flow behind a cylinder from a handful of sensors on the cylinder surface, (ii) the mean sea surface temperature (SST) from weekly SSTs for the last 26 years, and (iii) the velocity field of a turbulent isotropic flow. We show that a very small number of sensor measurements is indeed sufficient for flow reconstruction in these applications. Further, we show that the SD can handle nonlinear measurements and is robust to measurement noise. The results show significantly improved performance compared with traditional modal approximation techniques.


The paper concludes in §5 with a discussion and outlook of the use of SNNs for more general flow field reconstructions.

2. Background on high-dimensional state estimation

The task of reconstructing a high-dimensional state from a limited number of measurements is made possible by the fact that the dynamics of many complex systems, or datasets, exhibit some sort of low-dimensional structure. This fact has been exploited for state estimation using (i) a tailored basis, such as POD, or (ii) a general basis in which the signal is sparse; typically a Fourier or wavelet basis will suffice. In the former case, gappy POD methods [26] have been developed for principled reconstruction strategies [4,22–25]. In the latter, compressive sensing methods [27–29] serve as a principled technique for reconstruction. Both techniques exploit the fact that there exists a basis in which the high-dimensional state vector has a sparse, or compressible, representation. In [30], a basis is learned such that it leads to a sparse approximation of the high-dimensional state while enforcing observability from the sensors.

Next, we describe standard techniques for the estimation of a state x from observations s, and we discuss observability issues. Established techniques for state reconstruction are based on the idea that a field x can be expressed in terms of a rank-k approximation

x ≈ x̂ = Σ_{j=1}^{k} φ_j ν_j = Φν,    (2.1)

where {φ_j}_j are the modes of the approximation and {ν_j}_j are the associated coefficients. The approximation space is derived from a given training set using unsupervised learning techniques. A typical approach to determine the approximation modes is POD [4,22,23,31]. Randomized methods for linear algebra enable the fast computation of such approximation modes [32–37]. Given the approximation modes Φ, estimating the state x reduces to determining the coefficients ν from the sensor measurements s using supervised techniques. These typically aim to find the minimum-energy or minimum-norm solution that is consistent in a least-squares sense with the measured data.

(a) Standard approach: estimation via POD-based methods

Two POD-based methods are discussed, which we will refer to as POD and POD PLUS in the following. Both approaches reconstruct the state with POD modes, by estimating the coefficients from sensor information. The POD modes Φ are obtained via the singular value decomposition of the mean-centred training set X = (x_1 … x_n), with typically n ≤ m,

X = UΣV^T,    (2.2)

where the columns of U ∈ R^{m×n} are the left singular vectors and the columns of V ∈ R^{n×n} are the right singular vectors. The corresponding singular values are the diagonal elements of Σ ∈ R^{n×n}. Now, we define the approximation modes as Φ := U_k, by selecting k left singular vectors, with k ≤ p. Typically, we select the dominant k singular vectors as approximation modes; however, there are exceptions to this rule, as discussed below.
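To make this step concrete, a minimal NumPy sketch of computing the POD modes via the SVD of the mean-centred snapshot matrix might look as follows (our illustration, not code from the paper; the function name is ours):

```python
import numpy as np

def pod_modes(X, k):
    """Compute k POD modes from a snapshot matrix X of shape (m, n).

    The snapshots are mean-centred before the SVD, as in equation (2.2),
    and the modes are the k dominant left singular vectors, Phi = U_k.
    """
    x_mean = X.mean(axis=1, keepdims=True)
    U, Sigma, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    return U[:, :k], x_mean          # Phi of shape (m, k), plus the mean
```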

    (i) Standard POD-based method

Let a linear measurement operator H : R^m → R^p describe the relationship between the field and the associated observations, s = Hx. The approximation of the field x with the approximation modes {φ_j}_j is obtained by solving the following equation for ν ∈ R^n:

s = Hx ≈ HΦν.    (2.3)

A standard approach is to simply solve the following least-squares problem:

ν ∈ arg min_ν̃ ‖s − HΦν̃‖_2^2.    (2.4)

The solution with the minimum L2-norm is given by

ν = (HΦ)⁺ s,    (2.5)

with the superscript ⁺ denoting the Moore–Penrose pseudo-inverse. In this situation, the high-dimensional state is then estimated as

x ≈ x̂ = Φν.    (2.6)

This approach is hereafter referred to as POD and has been used in previous efforts (e.g. [38,39]).
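As an illustration, a sketch under these definitions (`H`, `Phi` and `s` assumed given as NumPy arrays) shows that equations (2.4)–(2.6) amount to a single pseudo-inverse:

```python
import numpy as np

def pod_estimate(s, H, Phi):
    """Minimum-norm least-squares estimate of the field from measurements s.

    Solves nu = (H Phi)^+ s, equation (2.5), and lifts the coefficients back
    to the high-dimensional state via x_hat = Phi nu, equation (2.6).
    """
    nu = np.linalg.pinv(H @ Phi) @ s
    return Phi @ nu
```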

With a nonlinear measurement operator H, the problem is formulated similarly, as a nonlinear least-squares problem,

ν ∈ arg min_ν̃ ‖s − H(Φν̃)‖_2^2.    (2.7)

In this case, no closed-form solution is available in general, and a nonlinear optimization problem must be solved, whose computational burden limits the online (real-time) field reconstruction capability. Further, the solution of the (often ill-posed) problem is not necessarily unique and does not allow for a reliable estimate. In contrast, the SD is trained end-to-end and essentially learns to associate measurements with the right solution (see §3 for details).

    (ii) Improved POD-based method

The standard POD-based method has several shortcomings. First, the least-squares problem formulated in equation (2.4) can be underspecified. Thus, it is favourable to introduce some bias in order to reduce the variance by means of regularization. Ridge regularization is the most popular technique for reducing the variance of the estimator,

ν ∈ arg min_ν̃ ‖s − HΦν̃‖_2^2 + α‖ν̃‖_2^2,    (2.8)

where α > 0 is the penalization parameter. Typically, this parameter is determined by k-fold cross-validation. An alternative approach to reduce the variance is to select a subset of the POD modes, i.e. only a few of the estimated coefficients are non-zero. The so-called least absolute shrinkage and selection operator (LASSO) for least-squares [40,41] can be formulated as

ν ∈ arg min_ν̃ ‖s − HΦν̃‖_2^2 + β‖ν̃‖_1,    (2.9)

where β > 0 controls the amount of sparsity. One can also combine LASSO and ridge regularization, resulting in the so-called ElasticNet [41,42] regularizer,

ν ∈ arg min_ν̃ ‖s − HΦν̃‖_2^2 + α‖ν̃‖_2^2 + β‖ν̃‖_1.    (2.10)

This regularization scheme often shows improved predictive performance in practice; however, it requires the user to tune two parameters, α and β.
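For illustration, the regularized variants (2.8)–(2.10) map directly onto off-the-shelf solvers. The following scikit-learn sketch (our choice of library, with synthetic stand-ins for H, Φ and s) is one way to set them up; note that scikit-learn's penalty scaling differs from equations (2.8)–(2.10) by constant factors:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
p, m, k = 10, 500, 8                                  # sensors, field size, modes
H = np.eye(m)[rng.choice(m, size=p, replace=False)]   # point-wise measurements
Phi = np.linalg.qr(rng.standard_normal((m, k)))[0]    # stand-in orthonormal modes
s = H @ (Phi @ rng.standard_normal(k))                # synthetic measurement vector

A = H @ Phi   # design matrix shared by problems (2.8)-(2.10)

# Ridge regularization, equation (2.8).
nu_ridge = Ridge(alpha=1e-8, fit_intercept=False).fit(A, s).coef_

# LASSO, equation (2.9); scikit-learn scales the data-misfit term by 1/(2p),
# so its `alpha` corresponds to beta in (2.9) only up to a constant factor.
nu_lasso = Lasso(alpha=1e-4, fit_intercept=False).fit(A, s).coef_

# ElasticNet, equation (2.10), parameterized by an overall penalty strength
# and an L1 ratio rather than by (alpha, beta) directly.
nu_enet = ElasticNet(alpha=1e-4, l1_ratio=0.5, fit_intercept=False).fit(A, s).coef_
```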

Yet another approach is to use a shrinkage estimator that retains only the high-variance POD modes, i.e. an estimator that selects a subset of all the POD modes to be used for solving the least-squares problem. More concretely, we formulate the following constrained problem:

ν ∈ arg min_ν̃ ‖s − HΦν̃‖_2^2  s.t.  Φ_(n−k) ν̃ = 0,    (2.11)

where Φ_(n−k) = {φ_{k+1}, …, φ_n}. Here, k ≤ n refers to the number of selected POD modes, reordered with indices {1, 2, …, k}. This hard-threshold regularizer constrains the solution to the column space of the selected POD modes and is also known as principal component regression [41]. In contrast to the smooth shrinkage effect of ridge regularization, the hard-threshold regularizer has a discrete shrinkage effect that nullifies the contributions of some of the low-variance modes completely. However, based on our experiments, both ridge regression and the hard-threshold shrinkage estimator perform on par for the task of flow field reconstruction. This said, the ElasticNet regularizer might lead to better predictive accuracy, since it can select the POD modes that are most useful for prediction, rather than only selecting the high-variance POD modes. It is known that POD modes with low variances may also be important for predictive tasks [43,44] and could help to further improve the performance of the POD-based methods.

Another shortcoming of the POD-based approach is that it requires explicit knowledge of the observation operator H and is subject to ill-conditioning of the least-squares problem. These limitations often render this 'vanilla-flavoured' approach impractical in many situations, and they motivate an alternative formulation. The idea is to learn the map between coefficients and observations without explicitly referring to H. It can be implicitly described by a, possibly nonlinear, operator P : R^k → R^p that is typically determined offline by minimizing the Bayes risk, defined as the misfit in the L2-sense,

P ∈ arg min_P̃ E_{μ_{s,ν}} [ ‖s − P̃ν‖_2^2 ],    (2.12)

where μ_{s,ν} is the joint probability measure of the observations s and the coefficients ν obtained by projecting the field onto the (orthonormal) POD modes, ν = Φ^T x. This step relies only on information from the training set and is thus performed offline.

We assume that the training set is representative of the underlying system, in the sense that it should contain independent samples drawn from the stationary distribution of the physical system at hand. The Bayes risk is then approximated by an empirical estimate, and the operator P is determined as

P ∈ arg min_P̃ Σ_{i=1}^{n} ‖s_i − P̃ν_i‖_2^2.    (2.13)

When the measurement operator H is linear, P is then an empirical estimate of HΦ, the contribution of the basis modes {φ_j}_j to the measurements s. This formulation was already considered in our previous work (e.g. [30]), and brings flexibility in the properties of the map P compared with the closed-form solution in equation (2.5). For instance, regularization by sparsity can be enforced in P, via L0- or L1-penalization. Expressing equation (2.13) in matrix form yields

P ∈ arg min_{P̃ ∈ R^{p×k}} ‖S − P̃N‖_F^2,    (2.14)

where S ∈ R^{p×n} and N ∈ R^{k×n}, respectively, refer to the training data measurements {s_i}_i and coefficients {ν_i}_i. It immediately follows that

P = SN⁺ = S(Φ⁺X)⁺ = SVΣ⁺,    (2.15)

and the online approximation obtained by POD PLUS is finally given by the solution to the following least-squares problem:

ν ∈ arg min_ν̃ ‖s − Pν̃‖_2^2.    (2.16)

However, ν ∈ R^k is typically of higher dimension than s ∈ R^p, and thus the problem is ill-posed. We then make use of the popular Tikhonov regularization, selecting the solution with the minimum L2-norm. This results in a ridge regression problem formulated as

ν ∈ arg min_ν̃ ‖s − Pν̃‖_2^2 + λ‖ν̃‖_2^2,    (2.17)

with λ > 0. As will be seen in the examples below, penalization of the magnitude of the coefficients can significantly improve the performance of the POD approach.
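Under these definitions, the POD PLUS pipeline reduces to a few lines of NumPy. The sketch below (function names are ours) separates the offline estimate of P, equation (2.15), from the online ridge solve, equation (2.17):

```python
import numpy as np

def pod_plus_fit(S, X, Phi):
    """Offline stage: P = S N^+ with N = Phi^+ X, cf. equation (2.15).

    S is the (p, n) matrix of training measurements, X the (m, n) matrix of
    training fields and Phi the (m, k) matrix of POD modes.
    """
    N = np.linalg.pinv(Phi) @ X            # training coefficients, shape (k, n)
    return S @ np.linalg.pinv(N)           # P, shape (p, k)

def pod_plus_estimate(s, P, Phi, lam=1e-6):
    """Online stage: closed-form ridge solve of equation (2.17)."""
    k = P.shape[1]
    nu = np.linalg.solve(P.T @ P + lam * np.eye(k), P.T @ s)
    return Phi @ nu
```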


(b) Observability issue

The above techniques are standard in the scientific computing literature for flow reconstruction, but they bear a severe limitation. Indeed, since it is derived in an unsupervised fashion from the set of instances {x_i}_i, the approximation basis {φ_j}_j is agnostic to the measurements s. In other words, the approximation basis is determined with no supervision by the measurements. To illustrate the impact of this situation, let ν* = Φ⁺x be the least-squares estimate of the approximation coefficients for a given field x. The difference between the least-squares estimate ν* and the coefficients ν obtained from the linear sensor measurements s is

ν* − ν = (Φ⁺ − (HΦ)⁺H) x,    (2.18)

and the error in the reconstructed field follows immediately,

‖x − x̂‖ = ‖(I − Φ(HΦ)⁺H) x‖,    (2.19)

where I is the identity matrix of suitable dimension.

The error in the reconstructed field is seen to depend on both the approximation basis Φ and the measurement operator H. The measurement operator is entirely defined by the sensor locations, and it does not depend on the basis considered to approximate the field. Hence, to reduce (the expectation of) the reconstruction error, the approximation basis must be informed by both the dataset {x_i}_i and the sensors available, through H. For example, poorly located sensors will cause a large set of the x_i to lie in the nullspace of H, preventing their estimation, while the coefficients of certain approximation modes may be affected by the observations Hx_i of certain realizations x_i being severely amplified by (HΦ)⁺, if the approximation basis is not carefully chosen.

This remark can be interpreted in terms of the control-theoretic concept of observability of the basis modes by the sensors. Most papers in the literature focus their attention on deriving an approximation basis leading to a good representation [4,24,25], i.e. such that the training set is well approximated in the k-dimensional basis {φ_j}_j, x ≈ Φν. But how well the associated coefficients ν = ν(s) are informed by the measurements is usually overlooked when deriving the basis. In practice, the decoupling between learning an approximation basis and learning a map to the associated coefficients often leads to a performance bottleneck in the estimation procedure. Enforcing observability of the approximation basis by the sensors is key to good recovery performance and can dramatically improve upon unsupervised methods, as shown in [30].

3. Shallow neural networks for flow reconstruction

Shallow learning techniques are widely used for flow reconstruction. For instance, the approximation-based approach for flow reconstruction, outlined in §2, can be considered to have two levels of complexity. The first level is concerned with computing an approximation basis, while the second level performs a linear-weighted combination of the basis elements to estimate the high-dimensional flow field. Such shallow learning techniques are easy to train and tune. In addition, the levels are often physically meaningful, and they may provide some interesting insights into the underlying mechanics of the system under consideration.

In the following, we propose a simple SNN as an alternative to the traditional methods for flow reconstruction problems, which are typically very shallow. Our proposed SD adds only one or two additional layers of complexity to the problem.

(a) A SHALLOW DECODER for flow reconstruction

We can define a fully connected (FC) NN with K layers as a nested set of functions

F(s; W) := R(W_K R(W_{K−1} ··· R(W_1 s))),    (3.1)


where R(·) : R → R denotes a coordinate-wise scalar (nonlinear) activation function and W denotes a set of weight matrices {W_k}_k, k = 1, …, K, with appropriate dimensions. NN-based learning provides a flexible framework for estimating the relationship between quantities from a collection of samples. Here, we consider SNNs, which are considered to be networks with very few (often only one, or even no) hidden layers, i.e. K is very small.

In the following, an estimate of a vector y is denoted as ŷ, while ỹ denotes a dummy vector over which one optimizes. Relying on a training set {x_i, s_i}_{i=1}^n, with n examples x_i and corresponding sensor measurements s_i, we aim to learn a function F : s ↦ x̂, belonging to a class of NNs 𝓕, which minimizes the misfit in a Euclidean sense over all sensor measurements:

F ∈ arg min_{F̃ ∈ 𝓕} Σ_{i=1}^{n} ‖x_i − F̃(s_i)‖_2^2.    (3.2)

We assume that only a small number of training examples are available. Further, no prior information is assumed to be available, and the estimation method is purely data-driven. Importantly, we assume no knowledge of the underlying measurement operator which is used to collect the sensor measurements. Further, unlike traditional methods for flow reconstruction, this NN-based learning methodology allows the joint learning of both the modes and the coefficients.

(b) Architecture

We now discuss some general principles guiding the design of a good network architecture for flow reconstruction. These considerations lead to the following nested nonlinear function:

F(s) = Ω(ν(ψ(s))).    (3.3)

The architecture design is guided by the paradigm of simplicity. Indeed, the architecture should enable fast training and little tuning, and offer an intuitive interpretation.

Recall that the interpretability of the flow field estimate is favoured by representing it in a basis of moderate size, whose modes can be identified with spatial structures of the field. This means that the estimate can be represented as a linear combination of k modes {φ_j}_j, weighted by coefficients {ν_j}_j; see equation (2.1). These modes are a function of the inputs. This naturally leads us to consider a network in which the output x̂ is given by a linear, FC, last layer of k inputs, interpreted as ν. These coefficients are informed by the sensor measurements s in a nonlinear way.

The nonlinear map s ↦ ν can be described by a hidden layer, whose outputs ψ are hereafter termed measurement features, in analogy with kernel-based methods, where raw measurements s are nonlinearly lifted as extended measurements to a higher-dimensional space. In this architecture, the measurement features ψ essentially describe nonlinear combinations of the input measurements s. These nonlinear combinations are then mapped to the coefficients ν associated with the modes φ. While the size of the output layer is that of the discrete field x, the size of the last hidden layer (ν) is chosen and defines the size k of the dictionary Φ. This size can be estimated from the data {x_i}_i by dimensionality estimation techniques [45,46]. Restricting the description of the training data to a low-dimensional space is of potential interest to practitioners, who may interpret the elements of the resulting basis in a physically meaningful way. The additional structure allows one to express the field of interest in terms of modes that practitioners may interpret, i.e. relate to physical phenomena such as travelling waves, instability patterns (e.g. Kelvin–Helmholtz), etc.

In contrast, the size of the first hidden layer, describing ψ, is essentially driven by the size of the input layer (s) and the number of nonlinear combinations used to nonlinearly inform the coefficients ν. The general shape of the network then bears flexibility in the hidden layers. A popular architecture for decoders consists of non-decreasing layer sizes, so as to continuously increase the size of the representation from the low-dimensional observations to the high-dimensional field. We can model F as an SNN with two hidden layers, ψ and ν, followed by a linear output layer Ω.


Two types of hidden layers, namely FC and convolution layers, can be considered. The power of convolution layers is key to the success of recent deep learning architectures in computer vision. However, in our problem, we favour FC layers. The reasons are as follows: (i) our sensor measurements have no spatial ordering; (ii) depending on the number of filters, convolution layers require a large number of examples for training, while we assume that only a small number of examples is available; and (iii) the potential dynamical systems that we consider evolve on a curved domain, which is typically represented using an unstructured grid. Thus, the first and second hidden layers take the form

z_ψ = ψ(s) := R(W_ψ s + b_ψ)

and

z_ν = ν(z_ψ) := R(W_ν z_ψ + b_ν),

where W denotes a dense weight matrix and b is a bias term. The function R(·) denotes an activation function used to introduce nonlinearity into the model, as discussed below. The final linear output layer simply takes the form

x̂ = Ω(z_ν) := Φ z_ν + b_Φ,

where we interpret the columns of the weight matrix Φ as modes. In summary, the architecture of our SD can be outlined as

s ↦ ψ(s) ↦ ν(z_ψ) ↦ Ω(z_ν) ≡ x̂.

Depending on the dataset, we need to adjust the size of each layer. Here, we use narrow rather than wide layers. Prescribing the size of the output layer restricts the dimension of the space in which the estimate lies, and it effectively regularizes the problem, e.g. filtering out most of the noise, which does not live in a low-dimensional space.

The rectified linear unit (ReLU) activation function is among the most popular choices in computer vision applications, owing to its favourable properties [47]. The ReLU activation is defined as the positive part of a signal z,

R(z) := max(z, 0).    (3.4)

The transformed input signal is also called the activation. While the ReLU activation function performs best on average in our experiments, there are other choices. For instance, we have also considered the Swish [48] activation function.

(c) Regularization

Overfitting is a common problem in machine learning and occurs if a function fits a limited set of data points too closely. In particular, this is a problem for deep neural networks, which often have more neurons (trainable parameters) than can be justified by the limited amount of training examples which are available. There is increasing interest in characterizing and understanding generalization and overfitting in NNs [49,50]. Hence, additional constraints are required to learn a function which generalizes to new observations that have not been used for training. Standard strategies to avoid overfitting include early stopping rules and weight penalties (L2 regularization) to regularize the complexity of the function (network). In addition to these two strategies, we also use batch normalization (BN) [51] and dropout layers (DLs) [52] to improve the convergence and robustness of the SD. This yields the following architecture:

s ↦ ψ(s) ↦ BN ↦ DL ↦ ν(z_ψ) ↦ BN ↦ Ω(z_ν) ≡ x̂.

Regularization, in its various forms, requires one to 'fiddle' with a large number of knobs (i.e. hyper-parameters). However, we have found that SNNs are less sensitive to the particular choice of parameters; hence, SNNs are easier to tune.

Batch normalization. BN is a technique to normalize (to zero mean and unit standard deviation) the activations. From a statistical perspective, BN eases the effect of internal covariate shift [51]. In other words, BN accounts for the change of the distribution of the output signals (activations) across different mini-batches during training. Each BN layer has two parameters, which are learned during the training stage. This simple, yet effective, preprocessing step allows one to use higher learning rates for training the network. In addition, it also reduces overfitting owing to its regularization effect.

Dropout layers. DLs help to improve the robustness of a NN. The idea is to switch off (drop) a small fraction of randomly chosen hidden units (neurons) during the training stage. This strategy can be seen as a form of regularization which also helps to reduce interdependent learning between the units of an FC layer. In our experiments, the drop ratio is set to 10%.
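To fix ideas, a minimal PyTorch sketch of this architecture is given below. The hidden-layer sizes and the exact placement of the BN and dropout layers are our assumptions for illustration; the text above prescribes only the overall outline.

```python
import torch
import torch.nn as nn

class ShallowDecoder(nn.Module):
    """Two hidden FC layers with BN and dropout, then a linear output layer.

    Mirrors the outline s -> psi(s) -> BN -> DL -> nu -> BN -> Omega -> x_hat.
    The sizes n_psi and n_nu (= k, the number of modes) are placeholders.
    """
    def __init__(self, n_sensors, n_field, n_psi=40, n_nu=25, drop=0.1):
        super().__init__()
        self.psi = nn.Sequential(
            nn.Linear(n_sensors, n_psi), nn.ReLU(),
            nn.BatchNorm1d(n_psi), nn.Dropout(drop),
        )
        self.nu = nn.Sequential(
            nn.Linear(n_psi, n_nu), nn.ReLU(),
            nn.BatchNorm1d(n_nu),
        )
        self.omega = nn.Linear(n_nu, n_field)  # its weight columns act as modes

    def forward(self, s):
        return self.omega(self.nu(self.psi(s)))
```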

(d) A note on overparameterized networks

The expressive power of NNs can be seen as a function of the depth (i.e. number of hidden layers) and the width (i.e. number of neurons per hidden layer) of the architecture [53]. Shallow networks typically tend to compensate for the reduced depth by increasing the width of the hidden layers. In turn, this can lead to shallow architectures that have more parameters than a comparable deep and narrow architecture for the same problem. However, such (potentially) overparameterized networks do not necessarily perform worse. On the contrary, recent theory suggests that it can be easier to train very overparameterized models with stochastic gradient descent (SGD) [54,55].

This may be surprising, since conventional machine learning wisdom states that overparameterized models tend to overfit and show poor generalization performance. However, recent results show that overparameterized models trained to minimum-norm solutions can indeed preserve the ability to generalize well [56–60].

(e) Optimization

Given a training set with n targets {x_i}_i and corresponding sensor measurements {s_i}_i, we minimize the misfit between the reconstructed quantity x̂ = F(s) and the observed quantity x, in terms of the L2-norm:

F ∈ arg min_F̃ Σ_{i=1}^{n} ‖x_i − F̃(s_i)‖_2^2 + λ‖W‖_2^2.

The second term on the right-hand side introduces L2 regularization on the weight matrices W, controlled via the parameter λ > 0. It is well known that the L2-norm is sensitive to outliers; the L1-norm can be used as a more robust loss function.

We use the ADAM optimization algorithm [61] to train the SD, with learning rate 10^-2 and weight decay 10^-4 (also known as L2 regularization). The learning rate, also known as the step size, controls how much we adjust the weights in each epoch. The weight decay parameter is important since it allows one to regularize the complexity of the network. In practice, we can improve the performance by changing the learning rate during training. We decay the learning rate by a factor of 0.9 after 100 epochs. Indeed, the reconstruction performance in our experiments is considerably improved by this dynamic scheme, compared with a fixed parameter setting. In our experiments, ADAM shows better performance than SGD with momentum [62] and averaged SGD [63]. The hyper-parameters can be fine-tuned in practice, but our choice of parameters works reasonably well for several different examples. Note that we use the method described by He et al. [64] to initialize the weights. This initialization scheme is favourable, in particular because the output layer is high-dimensional.
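A training loop matching these settings might be sketched as follows. The epoch count, the full-batch update and the reading of 'decay by 0.9 after 100 epochs' as a step every 100 epochs are our assumptions:

```python
import torch
import torch.nn as nn

def train_sd(model, s_train, x_train, epochs=1000):
    """Minimize the L2 misfit with weight decay, as in section 3(e)."""
    for mod in model.modules():          # He-style initialization, cf. [64]
        if isinstance(mod, nn.Linear):
            nn.init.kaiming_uniform_(mod.weight, nonlinearity='relu')
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.9)
    loss_fn = nn.MSELoss()
    for epoch in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(s_train), x_train)   # full-batch L2 misfit
        loss.backward()
        optimizer.step()
        scheduler.step()
    return model
```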

4. Empirical evaluation

We evaluate our methods on three classes of data. First, we consider a periodic flow behind a circular cylinder, as a canonical example of fluid flow. Then, we consider the weekly mean SST, as a second and more challenging example. Finally, the third and most challenging example we consider is a forced isotropic turbulence flow.


Figure 3. Two different training and test set configurations, showing (a) a within-sample prediction task and (b) an out-of-sample prediction task. Here, the grey columns indicate snapshots used for training, while the red columns indicate snapshots used for testing. In both panels, the horizontal axis is the snapshot index t and the vertical axis is the state dimension.

As discussed in §1, the SD requires that the training data represent the system, in the sense that they should comprise samples drawn from the same statistical distribution as the testing data. Indeed, this limitation is standard to data-driven methods, both for flow reconstruction and also more generally. Hence, we are mainly concerned with exploring reconstruction performance and generalizability for within-sample prediction rather than for out-of-sample prediction tasks. In our third example, however, we demonstrate the limitations of the SD, illustrating difficulties that arise when one tries to extrapolate, rather than interpolate, the flow field. Figure 3 illustrates the difference between the two types of tasks.

In the first two example classes of data, the sensor information is a subset of the high-dimensional flow field, i.e. each row of the measurement operator H ∈ R^{p×m} has a single non-zero entry, at the index of the corresponding sensor location. Letting J ∈ [1, m]^p ⊂ N^p be the set of indices indexing the spatial locations of the sensors, the measurement operator is such that

s = Hx = x_J,    (4.1)

that is, the observations are simply point-wise measurements of the field of interest. In the above equation, x_J is the restriction of x to its entries indexed by J. In this paper, no attempt is made to optimize the location of the sensors. In practical situations, they are often given or constrained by other considerations (wiring, intrusivity, manufacturing, etc.). We use random locations in our examples. The third example class of data demonstrates the SD using sub-gridscale measurements.
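In code, this point-wise sampling is plain fancy indexing. A self-contained sketch (the index set `J` and the random field are stand-ins for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
m, p = 78406, 5                             # field dimension, number of sensors
J = rng.choice(m, size=p, replace=False)    # random sensor locations

x = rng.standard_normal(m)                  # stand-in for a flow field snapshot
s = x[J]                                    # s = H x = x_J, equation (4.1)

# The equivalent explicit measurement matrix has one non-zero entry per row.
H = np.zeros((p, m))
H[np.arange(p), J] = 1.0
assert np.allclose(s, H @ x)
```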

The error is quantified in terms of the normalized root-mean-square residual error

NME = ‖x − x̂‖_2 / ‖x‖_2,    (4.2)

denoted in the following as 'NME'. However, this measure can be misleading if the empirical mean is dominating. Hence, we also consider a more sensitive measure, which quantifies the reconstruction accuracy of the deviations around the empirical mean. We define this measure as

NFE = ‖x′ − x̂′‖_2 / ‖x′‖_2,    (4.3)

where x′ and x̂′ are the fluctuating parts around the empirical mean. In our experiments, we average the errors over 30 runs for different sensor distributions.
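Both measures are one-liners in NumPy; a sketch:

```python
import numpy as np

def nme(x, x_hat):
    """Normalized residual error, equation (4.2)."""
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)

def nfe(x, x_hat, x_mean):
    """Error of the fluctuating parts around the empirical mean, equation (4.3).

    The numerator equals that of the NME; only the normalization changes.
    """
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x - x_mean)
```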


Table 1. Performance for the flow past a cylinder for a varying number of sensors. Results are averaged over 30 runs with different sensor distributions, with standard deviations in parentheses. The parameter k* indicates the number of modes that were used for flow reconstruction by the POD method, and α refers to the strength of ridge regularization applied to POD PLUS.

method                     sensors   NME (train)    NFE (train)    NME (test)     NFE (test)
POD                           5      0.465 (0.39)   0.675 (0.57)   0.488 (0.41)   0.698 (0.59)
POD (k* = 4)                  5      0.217 (0.02)   0.325 (0.01)   0.227 (0.03)   0.324 (0.04)
POD PLUS (α = 1 × 10⁻⁸)       5      0.198 (0.02)   0.288 (0.03)   0.203 (0.02)   0.291 (0.03)
SD                            5      0.003 (0.00)   0.004 (0.00)   0.006 (0.00)   0.008 (0.00)
POD                          10      0.346 (1.54)   0.502 (2.23)   0.379 (1.70)   0.542 (2.43)
POD (k* = 8)                 10      0.049 (0.00)   0.071 (0.01)   0.051 (0.01)   0.072 (0.01)
POD PLUS (α = 1 × 10⁻¹³)     10      0.035 (0.01)   0.050 (0.02)   0.035 (0.01)   0.050 (0.02)
SD                           10      0.002 (0.00)   0.003 (0.00)   0.005 (0.00)   0.007 (0.00)
POD                          15      0.441 (1.81)   0.639 (2.63)   0.574 (2.44)   0.821 (3.49)
POD (k* = 12)                15      0.015 (0.00)   0.023 (0.01)   0.016 (0.01)   0.023 (0.01)
POD PLUS (α = 1 × 10⁻¹²)     15      0.016 (0.01)   0.023 (0.01)   0.016 (0.01)   0.022 (0.01)
SD                           15      0.002 (0.00)   0.003 (0.00)   0.005 (0.00)   0.007 (0.00)

(a) Fluid flow behind the cylinder

The first example we consider is the fluid flow behind a circular cylinder at Reynolds number 100 (based on the cylinder diameter), a canonical example in fluid dynamics [65]. The flow is characterized by a periodically shedding wake structure and exhibits smooth, large-scale patterns. A direct numerical simulation of the two-dimensional Navier–Stokes equations is achieved via the immersed boundary projection method [66,67]. In particular, we use the fast multi-domain method [67], which simulates the flow on five nested grids of increasing size, with each grid consisting of 199 × 449 grid points, covering a domain of 4 × 9 cylinder diameters on the finest domain. We collect 151 snapshots, sampled uniformly in time and covering several periods of vortex shedding. For the following experiment, we use cropped snapshots of dimension 199 × 384 on the finest domain, as we omit the spatial domain upstream of the cylinder. Further, we split the dataset into a training set and a test set so that the training set comprises the first 100 snapshots, while the remaining 51 snapshots are used for validation. Note that different splittings (interpolation and extrapolation) yield nearly the same results, since the flow is periodic.

    (i) Varying numbers of random structured point-wise sensor measurements

We investigate the performance of the SD using varying numbers of sensors. A realistic setting is considered, in that the sensors can only be located on a solid surface. The retained configuration aims at reconstructing the entire vorticity flow field from information at the cylinder surface only. The results are averaged over different sensor distributions on the cylinder downstream-facing surface and are summarized in table 1. Further, to contextualize the precision of the algorithms, we also state the standard deviation in parentheses.

The SD shows an excellent flow reconstruction performance compared with traditional methods. Indeed, the results show that very few sensors are already sufficient to obtain an accurate approximation. Further, we can see that the SD is insensitive to the sensor location, i.e. the variability of the performance is low when different sensor distributions on the cylinder surface are used. In stark contrast, this simple set-up poses a challenge for the POD method without regularization, which is seen to be highly sensitive to the sensor configuration. This is expected, since poorly located sensors lead to a large probability that the vorticity field x_i lies in the nullspace of H, preventing its estimation, as discussed in §2. While regularization can improve the robustness slightly, the POD-based methods still require at least about 15 sensors to provide accurate estimates of the high-dimensional flow field. (Here, we list results for the POD method with hard-threshold regularization and the POD PLUS method with ridge regularization. The number of retained components (hard threshold) used for flow reconstruction is indicated by k*, and the strength of ridge regularization is denoted by the parameter α. See appendix A for more details.) In contrast, the SD exhibits a good performance with as few as five sensors. Note that the traditional methods could benefit from optimal sensor placement [4]; however, this is beyond the scope of this paper.

Figure 4. Visual results for the canonical flow for two different sensor distributions. In (a) the target snapshots and the specific sensor configurations (here using five sensors) are shown. Depending on the sensor distribution, the POD-based method is not able to accurately reconstruct the high-dimensional flow field, as shown in (b). The regularized POD PLUS method performs slightly better, as shown in (c). The SD yields an accurate flow reconstruction, as shown in (d).

Figure 4 provides visual results for two specific sensor configurations using five sensors. The second configuration is challenging for POD, which fails to provide an accurate reconstruction. POD PLUS provides a more accurate reconstruction of the flow field. The SD outperforms the traditional methods in both situations.


Table 2. Performance for estimating the flow behind a cylinder using nonlinear sensor measurements. Results are averaged over 30 runs with different sensor distributions, with standard deviations in parentheses. The standard POD-based method fails for this task. POD PLUS is able to reconstruct the flow field, yet the estimation quality is poor. In contrast, the SD method performs well.

method                     sensors   NME (train)    NFE (train)    NME (test)     NFE (test)
POD                          10      —              —              —              —
POD PLUS (α = 5 × 10⁻⁴)      10      0.676 (0.00)   0.981 (0.00)   0.682 (0.09)   0.974 (0.00)
SD                           10      0.002 (0.00)   0.003 (0.00)   0.006 (0.00)   0.009 (0.01)

    (ii) Nonlinear sensor measurements

So far, the sensor information consisted of point-wise measurements of the local flow field, so that the jth measurement is given by s_(j) = H_j x = δ_{τ_j}[x] = x_(j), j = 1, …, p, with δ_{τ_j} a Dirac distribution centred at the location of the jth sensor, and s_(j) and x_(j) the jth components of s and x, respectively. We now consider nonlinear measurements to demonstrate the flexibility of the SD. Here, we consider the simple setting of squared sensor measurements: s_(j) = (x ⊙ x)_(j), where ⊙ denotes the Hadamard (element-wise) product. Table 2 provides a summary of the results, using 10 sensors. The SD is agnostic to the functional form of the sensor measurements, and it achieves nearly the same performance as in the linear case above, i.e. the error on the test set increases by less than 1% compared with the linear case in table 1.
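In terms of the point-wise sampling sketch of equation (4.1), only the measurement line changes; for illustration (field and index set are stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 78406, 10
J = rng.choice(m, size=p, replace=False)    # hypothetical sensor locations
x = rng.standard_normal(m)                  # stand-in flow field snapshot

s = (x * x)[J]    # squared point-wise measurements, s_(j) = (x ⊙ x)_(j)
```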

    (iii) Noisy sensor measurements

To investigate further the robustness and flexibility of the SD, we consider flow reconstruction in the presence of additive white noise. While this is not a concern when dealing with flow simulations, it is a realistic setting when dealing with flows obtained in experimental studies. Table 3 lists the results for both a high- and a low-noise situation with linear measurements. By inspection, the SD outperforms the classical techniques. In the high-noise case, with a signal-to-noise ratio (SNR) of 10, the average relative reconstruction error on the test set is about 27% for the SD. For an SNR of 50, the relative error is as low as 17%. Note that we here use an additional DL (placed after the first FC layer) to improve the robustness of the SD. In contrast, standard POD fails in both situations. Again, the POD PLUS method shows improved results over the standard POD. However, the visual results in figure 5 show that the reconstruction quality of the SD is favourable. The SD shows a clear advantage and a denoising effect. Indeed, the reconstructed snapshots allow for a meaningful interpretation of the underlying structure.

    (iv) Summary of empirical results for the flow behind the cylinder

The empirical results show that the advantage of the SD over the traditional POD-based techniques is pronounced, even for a simple problem such as the flow behind the cylinder. It can be seen that the performance of the traditional techniques is patchy, i.e. the reconstruction quality is highly sensitive to the sensor location. While regularization can mitigate a poor sensor placement design, a relatively large number (greater than 15) of sensors is required in order to achieve an accurate reconstruction. More challenging situations, such as nonlinear measurements and sensor noise, pose a challenge for the traditional techniques, while the SD is able to reconstruct dominant flow features in such situations. The computational demands required to train the SD are minimal, e.g. the time for training on a modern GPU remains below 2 min for this example.


Figure 5. Visual results for the noisy flow behind the cylinder. Here, the SNR is 10. In (a) the target snapshot and the corresponding sensor configuration (using 10 sensors) are shown. Both POD and POD PLUS are unable to reconstruct the flow field, as shown in (b) and (c). The SD is able to reconstruct the coherent structure of the flow field, as shown in (d).

    Table 3. Performance for estimating the flow behind a cylinder in the presence of white noise, using 10 sensors. Results areaveraged over 30 runs with different sensor distributions, with standard deviations in parentheses. POD fails for this task,while POD PLUS shows a better performance. The SD is robust to noisy sensor measurements and outperforms the traditionaltechniques. The parameter k∗ indicates the number of modes that were used for flow reconstruction by the POD method, andthe parameterα refers to the strength of ridge regularization applied to the the POD PLUS method.

                                       training set                    test set
method                      SNR        NME           NFE               NME           NFE
POD                         10         9.171 (14.7)  12.69 (20.4)      8.746 (12.9)  11.93 (17.6)
POD (k∗ = 2)                10         0.461 (0.02)  0.638 (0.03)      0.468 (0.02)  0.639 (0.02)
POD PLUS (α = 5 × 10⁻⁵)     10         0.468 (0.02)  0.648 (0.02)      0.472 (0.02)  0.644 (0.2)
SD                          10         0.138 (0.02)  0.201 (0.02)      0.278 (0.04)  0.397 (0.05)
POD                         50         4.837 (3.08)  6.946 (4.42)      4.520 (2.75)  6.390 (3.89)
POD (k∗ = 2)                50         0.342 (0.01)  0.492 (0.01)      0.349 (0.01)  0.493 (0.01)
POD PLUS (α = 1 × 10⁻⁵)     50         0.370 (0.03)  0.539 (0.04)      0.371 (0.02)  0.524 (0.03)
SD                          50         0.134 (0.02)  0.198 (0.02)      0.173 (0.02)  0.247 (0.03)

(b) Sea surface temperature using random point-wise measurements

The second example we consider is the more challenging SST dataset. Complex ocean dynamics lead to rich flow phenomena, featuring interesting seasonal fluctuations. While the mean SST field is characterized by a periodic structure, the flow is non-stationary. The dataset consists of the weekly SSTs for the last 26 years, publicly available from the National Oceanic & Atmospheric Administration. The data comprise 1483 snapshots in time with a spatial resolution of 180 × 360. For the following experiments, we only consider 44 219 measurements, obtained by excluding the measurements corresponding to the land masses. Further, we create a training set by selecting 1100 snapshots at random, while the remaining snapshots are used for validation.


Table 4. Performance for estimating the SST dataset for varying numbers of sensors. Results are averaged over 30 runs with different sensor distributions, with standard deviations in parentheses. The SD outperforms the traditional techniques and is largely insensitive to the sensor locations. The parameter k∗ indicates the number of modes used for flow reconstruction by the POD method, and α refers to the strength of the ridge regularization applied to POD PLUS.

                                        training set                    test set
method                      sensors     NME           NFE               NME           NFE
POD                         32          0.637 (0.59)  5.915 (5.56)      0.649 (0.62)  6.04 (5.77)
POD (k∗ = 5)                32          0.036 (0.00)  0.342 (0.01)      0.037 (0.00)  0.344 (0.01)
POD PLUS (α = 1 × 10⁻⁵)     32          0.036 (0.00)  0.341 (0.01)      0.037 (0.00)  0.343 (0.01)
SD                          32          0.009 (0.00)  0.088 (0.00)      0.014 (0.00)  0.128 (0.00)
POD                         64          0.986 (1.34)  9.183 (12.5)      1.007 (1.36)  9.344 (12.7)
POD (k∗ = 14)               64          0.032 (0.00)  0.298 (0.01)      0.032 (0.00)  0.301 (0.01)
POD PLUS (α = 5 × 10⁻⁵)     64          0.032 (0.00)  0.301 (0.00)      0.032 (0.00)  0.301 (0.00)
SD                          64          0.009 (0.00)  0.085 (0.00)      0.012 (0.00)  0.118 (0.00)

Figure 6. Results for the SST dataset. In (a), the high-dimensional target and the sensor configuration (using 64 sensors) are shown; and in (b), the results of the SD are shown. Note that we show here the mean-centred snapshot. The SD shows an excellent reconstruction quality for the fluctuations around the mean, with an error as low as 12%. (Online version in colour.)


We consider the performance of the SD using varying numbers of random sensors scattered across the spatial domain. The results are summarized in table 4. We observe a large discrepancy between the NME and the NFE errors. This is because the long-term annual mean field accounts for the majority of the spatial structure of the field; hence, the NME error is uninformative with respect to the performance of the reconstruction methods. In terms of the NFE error, the POD-based reconstruction technique fails to reconstruct the high-dimensional flow field from limited sensor measurements. In contrast, the SD demonstrates excellent reconstruction performance using both 32 and 64 measurements. Figure 6 shows results to support these quantitative findings.
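To clarify why the two error measures diverge, the following sketch computes them under our reading of the definitions (NME normalizes by the full field, NFE by the fluctuations about the temporal mean; the exact definitions are given earlier in the paper, so treat this as an assumption):

    import numpy as np

    def nme(X_hat, X):
        # relative error of the full field; a large temporal mean keeps this small
        return np.linalg.norm(X_hat - X) / np.linalg.norm(X)

    def nfe(X_hat, X):
        # relative error normalized by the fluctuations about the temporal mean
        X_mean = X.mean(axis=1, keepdims=True)   # columns of X are snapshots
        return np.linalg.norm(X_hat - X) / np.linalg.norm(X - X_mean)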


Figure 7. Results for the turbulent isotropic flow using 121 sub-grid cell measurements. The interpolation error of the SD is about 9.3%. (a) Snapshot, (b) low resolution and (c) SD. (Online version in colour.)

Table 5. Flow reconstruction performance for estimating the isotropic flow. Results are averaged over 30 runs with different sensor distributions, with standard deviations in parentheses.

                           training set                    test set
method       grids         NME           NFE               NME           NFE
SD           36            0.029 (0.00)  0.041 (0.00)      0.071 (0.00)  0.101 (0.01)
SD           64            0.027 (0.00)  0.039 (0.00)      0.067 (0.00)  0.096 (0.00)
SD           121           0.026 (0.00)  0.038 (0.00)      0.066 (0.00)  0.093 (0.00)

(c) Turbulent flow using sub-gridscale measurements

The final example we consider is the velocity field of a turbulent isotropic flow. Unlike the previous examples, the isotropic turbulent flow is non-periodic in time and highly non-stationary; thus, this dataset poses a challenging task. Here, we consider data from a forced isotropic turbulence flow generated with a direct numerical simulation using $1024^3$ points in a triply periodic $[0, 2\pi]^3$ domain. For the following experiments, we use 800 snapshots for training and 200 snapshots for validation. The data span about one large-eddy turnover time. The data are provided as part of the Johns Hopkins Turbulence Database [68].

If the sensor measurements s are acquired on a coarse but regular grid, then the reconstruction task may be considered as a super-resolution problem [3,69,70]. There are a number of direct applications of super-resolution in fluid mechanics centred around sub-gridscale modelling. Because many fluid flows are inherently multi-scale, it may be prohibitively expensive to collect data that capture all spatial scales, especially for iterative optimization and real-time control [1]. Inferring small-scale flow structures below the available spatial resolution is an important task in large-eddy simulation, climate modelling and particle image velocimetry, to name a few applications. Deep learning has recently been employed for super-resolution in fluid mechanics applications with promising results [11]. Note that our setting differs from the super-resolution problem: here, we first obtain a low-resolution image by applying a mean filter to the high-dimensional snapshot, and we then use a single sensor measurement per grid cell to form the inputs (illustrated in figure 7b). In contrast, super-resolution uses the low-resolution image itself as the input.
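A minimal sketch of this input construction, under the assumption that the mean filter amounts to block-averaging the snapshot over non-overlapping grid cells, is:

    import numpy as np

    def coarse_inputs(snapshot, cell):
        """Average the 2-D snapshot over non-overlapping cell x cell blocks,
        yielding one measurement per grid cell (cf. figure 7b)."""
        h, w = snapshot.shape
        h_c, w_c = h // cell, w // cell
        blocks = snapshot[:h_c * cell, :w_c * cell].reshape(h_c, cell, w_c, cell)
        return blocks.mean(axis=(1, 3))

    field = np.random.rand(128, 128)        # placeholder velocity slice
    s = coarse_inputs(field, cell=11)       # 11 x 11 = 121 measurements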

First, we consider the within-sample prediction task. In this case, we obtain excellent results for the estimated high-dimensional flow fields, despite the challenging problem. Table 5 quantifies the performance for varying numbers of sub-gridscale measurements. In addition, figure 7 provides visual evidence of the good performance on this problem.


Figure 8. Visual results illustrating the limitation of the SD for extrapolation tasks. Flow fields sampled from, or close to, the statistical distribution describing the training examples can be reconstructed with high accuracy, as shown in (a) and (b). Extrapolation fails for fields which belong to a different statistical distribution, as shown in (e) and (f). (a) Test snapshot t = 1, (b) reconstruction, (c) test snapshot t = 20, (d) reconstruction, (e) test snapshot t = 50 and (f) reconstruction. (Online version in colour.)

Next, we illustrate a limitation of the SD. Indeed, it is important to stress that the SD cannot be used for out-of-sample prediction tasks if the fluid flow is highly non-stationary. To illustrate this issue, figure 8 shows three flow fields at different temporal locations. First, figure 8a shows a test example which is close in time to the training set; in this case, the SD is able to reconstruct the flow field with high accuracy. The reconstruction quality drops for snapshots which are further away in time, as shown in figure 8c. Finally, figure 8e shows that the reconstruction fails if the test example is far away from the training set in time, i.e. the flow field is not drawn from the same statistical distribution as the training examples.

5. Discussion

The emergence of sensor networks for global monitoring (e.g. ocean and atmospheric monitoring) requires new mathematical techniques that are capable of maximally exploiting sensors for state estimation and forecasting. Emerging algorithms from the machine learning community can be integrated with many traditional scientific computing approaches to enhance sensor network capabilities. For many global monitoring applications, the placement of sensors can be prohibitively expensive, thus requiring learning techniques, such as the one proposed here, which can exploit a reduction in the number of sensors while maintaining the required performance characteristics.

To partially address this challenge, we proposed an SD with two hidden layers for the problem of flow reconstruction. The mathematical formulation presented is significantly different from what is commonly used in flow reconstruction problems, e.g. gappy interpolation with dominant POD modes. Indeed, our experiments demonstrate the improved robustness and accuracy of fluid flow field reconstruction using our SD.

Future work aims to leverage the underlying laws of physics in flow problems to further improve efficiency. In the context of flow reconstruction or, more generally, observation of a high-dimensional physical system, insights from the physics at play can be exploited [71]. In particular, the dynamics of many systems remain low-dimensional, and the trajectory of their state vector lies close to a manifold whose dimension is significantly lower than the ambient dimension. Moreover, the features learned by the SD network can also be integrated in reduced-order models (ROMs) for forecasting predictions [72]. In many high-dimensional systems where ROMs are used, the ability to generate low-fidelity models that can be rapidly simulated has revolutionized our ability to model such complex systems, especially in applications involving complex flow fields. The ability to rapidly generate low-rank feature spaces alternative to POD generates new possibilities for ROMs using limited sampling and limited data. This aspect of the SD will be explored further in future work.

Data accessibility. This article has no additional data.

Authors' contributions. N.B.E. carried out the experiments, participated in the design of the study and drafted the manuscript; L.M. participated in the design of the study and drafted the manuscript; Z.Y. participated in running the experiments; J.N.K. coordinated the study and helped draft the manuscript; S.L.B. and M.W.M. critically revised the manuscript. All authors gave final approval for publication and agree to be held accountable for the work performed herein.

Competing interests. We declare we have no competing interests.

Funding. L.M. gratefully acknowledges the support of the French Agence Nationale pour la Recherche (ANR) and Direction Générale de l'Armement (DGA) via the FlowCon project (grant no. ANR-17-ASTR-0022). S.L.B. acknowledges support from the Army Research Office (grant no. ARO W911NF-17-1-0422). J.N.K. acknowledges support from the Air Force Office of Scientific Research (grant no. FA9550-19-1-0011). L.M. and J.N.K. also acknowledge support from the Air Force Office of Scientific Research (grant no. FA9550-17-1-0329).

Acknowledgements. M.W.M. would like to acknowledge ARO, DARPA, NSF and ONR for providing partial support for this work. We would also like to thank Kevin Carlberg for valuable discussions about flow reconstruction techniques.

Appendix A. Hyper-parameter search for the POD-based methods

In the following, we provide the results of our hyper-parameter search for determining the optimal tuning parameters for flow reconstruction. We proceed by evaluating the reconstruction error of the POD and POD PLUS methods over a plausible range of values. Here, we consider hard-threshold regularization for the POD method and ridge regularization for the POD PLUS method. We run 30 trials of each experiment, using a unique sensor location configuration at each trial.
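For reference, a sketch of the two baselines as we understand them is given below: both fit the coefficients of the leading k POD modes to the sensor measurements by least squares, with POD PLUS adding a ridge penalty α (variable names and details are our own):

    import numpy as np

    def pod_reconstruct(X_train, H, s, k, alpha=0.0):
        """Reconstruct a flow field from sensor values s via the leading k POD
        modes; alpha = 0 gives hard-thresholded POD, alpha > 0 gives POD PLUS."""
        U, _, _ = np.linalg.svd(X_train, full_matrices=False)
        Phi = U[:, :k]                       # leading POD modes (m x k)
        A = H @ Phi                          # sensor measurements of the modes
        a = np.linalg.solve(A.T @ A + alpha * np.eye(k), A.T @ s)
        return Phi @ a                       # estimated high-dimensional field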

Figure 9 shows the results for the fluid flow past the cylinder. First, we show the results for the POD PLUS method in (a). Regularizing the solution improves the reconstruction accuracy, and the effect of regularization on the reconstruction error becomes more pronounced as the number of sensors increases. (Note that, while the reconstruction error decreases with an increasing number of sensors, finding the optimal tuning parameter becomes more difficult.) Next, we show the results for POD with hard-threshold regularization in (b). The performance is on a par with ridge regularization (plotted as a black dashed line), with hard-threshold regularization showing a lower variance than ridge regularization. In contrast, the SD outperforms both the POD and POD PLUS methods, represented by a red dashed line. However, the performance gap between the POD-based methods and the SD closes for an increased number of sensors. This is not surprising, since the flow past the cylinder represents a relatively simple problem where the POD method is known to provide good reconstruction results, given a sufficiently large number of sensors.


Figure 9. Results of the hyper-parameter search for the flow past the cylinder. The results for the POD PLUS method, using ridge regularization, are shown in (a), and the results for the POD method, using a hard threshold, are shown in (b). The SD outperforms the POD-based methods in all situations, while the performance gap closes for an increased number of sensors. (Online version in colour.)

Figure 10. Results of the hyper-parameter search for the noisy flow past the cylinder. Here we consider the SNRs 10 and 50 and a setting with 10 sensors. (a) POD PLUS with ridge regularization and (b) POD with hard-threshold regularization. (Online version in colour.)

Figure 11. Results of the hyper-parameter search for the SST data. Here we consider a setting with 32 and 64 sensors. The SD outperforms the POD-based methods in all situations. (a) POD PLUS with ridge regularization and (b) POD with hard-threshold regularization. (Online version in colour.)

Figures 10 and 11 show the results for the noisy flow past the cylinder and for the SST data. Again, it can be seen that ridge regularization and hard-threshold regularization perform on a par, while the SD outperforms the POD-based methods.


Figure 12. Singular spectrum analysis of the reconstructed data and the ground truth. The left column shows the training data and the right column the test data. The POD-based method uses the optimal k∗ truncation to reconstruct the data. Here we consider in-sample test data. (a) Fluid flow behind the cylinder, using 15 sensor locations for reconstruction. (b) SST data, using 64 sensor locations for reconstruction. (c) Turbulent flow, using 128 sub-gridscale measurements for reconstruction. (Online version in colour.)

Appendix B. Singular spectrum analysis of reconstructed data

Here we provide additional results that show the singular value spectra of the reconstructed training and test data. As a reference, we also show the spectrum of the ground truth data. Figure 12 shows the results for (a) the fluid flow behind the cylinder, (b) the sea surface data and (c) the turbulent flow. The singular value spectrum of the reconstructed data helps us to compare the performance of the POD-based method and our SD.

For all problems that we consider, it can be seen that the SD captures more fine-scale information than the truncated POD-based method. Note that we consider the case where the training and test data are sampled from the same distribution.
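The spectra themselves are straightforward to compute; a short sketch (our own) for comparing the ground truth against a matrix of reconstructed snapshots is:

    import numpy as np

    def singular_spectrum(X):
        """Singular values of a snapshot matrix (columns are snapshots)."""
        return np.linalg.svd(X, compute_uv=False)

    # e.g. plot singular_spectrum(X_true) against singular_spectrum(X_hat),
    # where X_hat stacks the reconstructed snapshots, to see how much
    # fine-scale (low-energy) structure each method retains.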

Appendix C. Set-up for our empirical evaluation

Here, we provide details about the concrete network architectures of the SD used for the different examples. The networks are implemented in Python using PyTorch, and research code for the flow behind the cylinder is available via https://github.com/erichson/ShallowDecoder. Tables 6–8 show the details. For each example, we use a similar architecture design; the difference is that we use a slightly wider design (more neurons per layer) for the SST dataset and the isotropic flow. This is because we are using a larger number of sensors for these two problems, and we thus need to increase the capacity of the network. In each situation, the learning rate is set to 1 × 10⁻² with a scheduled decay rate of 0.3. Further, we use a small amount of weight decay, λ = 1 × 10⁻⁷, to regularize the network.


Table 6. Architecture of the SD for the flow behind the cylinder. The batch size is set to 32. Here, we set the dropout rate to 0.1 for the noisy situation. We use a small amount of weight decay, λ = 1 × 10⁻⁷.

layer   weight size        input shape   output shape   activation   batch norm.   dropout
FC      sensors × 35       sensors       35             ReLU         true          —
FC      35 × 40            35            40             ReLU         true          —
FC      40 × 76 416        40            76 416         linear       —             —

    Table 7. Architecture of the SD for the SST dataset. Here, the batch size is set to 200.

layer   weight size        input shape   output shape   activation   batch norm.   dropout
FC      sensors × 350      sensors       350            ReLU         true          0.1
FC      350 × 400          350           400            ReLU         true          —
FC      400 × 44 219       400           44 219         linear       —             —

    Table 8. Architecture of the SD for isotropic flow. Here, the batch size is set to 200.

layer   weight size        input shape   output shape   activation   batch norm.   dropout
FC      sensors × 350      sensors       350            ReLU         true          0.1
FC      350 × 400          350           400            ReLU         true          —
FC      400 × 122 500      400           122 500        linear       —             —
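Putting table 6 into code, a minimal PyTorch sketch of the SD for the cylinder flow might look as follows (class and variable names are our own; the reference implementation is available at the repository linked above):

    import torch
    import torch.nn as nn

    class ShallowDecoder(nn.Module):
        """Two hidden FC layers (35 and 40 neurons) mapping p sensor values to
        the 76 416-dimensional flow field, following table 6."""
        def __init__(self, n_sensors, n_out=76_416, dropout=0.0):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_sensors, 35), nn.ReLU(), nn.BatchNorm1d(35),
                nn.Dropout(dropout),          # rate 0.1 in the noisy setting
                nn.Linear(35, 40), nn.ReLU(), nn.BatchNorm1d(40),
                nn.Linear(40, n_out),         # linear output layer
            )

        def forward(self, s):
            return self.net(s)

    model = ShallowDecoder(n_sensors=10)
    x_hat = model(torch.randn(32, 10))        # batch of 32 sensor vectors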


References

1. Brunton SL, Noack BR. 2015 Closed-loop turbulence control: progress and challenges. Appl. Mech. Rev. 67, 050801-1–050801-48. (doi:10.1115/1.4031175)
2. Rowley CW, Dawson ST. 2017 Model reduction for flow analysis and control. Annu. Rev. Fluid Mech. 49, 387–417. (doi:10.1146/annurev-fluid-010816-060042)
3. Callaham