
SPECIAL SECTION: COMPUTATIONAL SCIENCE

CURRENT SCIENCE, VOL. 78, NO. 7, 10 APRIL 2000

TUTORIALS

An introduction to the proper orthogonal decomposition

Anindya Chatterjee*
Department of Engineering Science and Mechanics, Penn State University, University Park, PA 16802, USA
*e-mail: [email protected]

A tutorial is presented on the Proper Orthogonal Decomposition (POD), which finds applications in computationally processing large amounts of high-dimensional data with the aim of obtaining low-dimensional descriptions that capture much of the phenomena of interest. The discrete version of the POD, which is the singular value decomposition (SVD) of matrices, is described in some detail. The continuous version of the POD is outlined. Low-rank approximations to data using the SVD are discussed. The SVD and the eigenvalue decomposition are compared. Two geometric interpretations of the SVD/POD are given. Computational strategies (using standard software) are mentioned. Two numerical examples are provided: one shows low-rank approximations of a surface, and the other demonstrates simple a posteriori analysis of data from a simulated vibroimpact system. Some relevant computer code is supplied.

1. Introduction

COMPUTERS have increased our capacity to not only simulate complicated systems, but also to collect and analyse large amounts of data. Using personal computers, it is now unremarkable to record data at, say, 20 kHz for a few hours. One might then process hundreds of millions of data points to obtain a few quantities of final interest. For example, expensive machinery might be instrumented and monitored over days, with the sole objective of efficiently scheduling maintenance.

This article provides an introduction to the Proper Orthogonal Decomposition (POD), which is a powerful and elegant method of data analysis aimed at obtaining low-dimensional approximate descriptions of high-dimensional processes. The POD was developed by several people (among the first was Kosambi1), and is also known as Principal Component Analysis, the Karhunen–Loève decomposition, and the singular value decomposition. The POD has been used to obtain approximate, low-dimensional descriptions of turbulent fluid flows2, structural vibrations3,4 and insect gait5, and has been used for damage detection6, to name a few applications in dynamic systems. It has also been extensively used in image processing, signal analysis and data compression. For references to the many sources of the POD, for applications of the POD in a variety of fields, as well as for a nice treatment that complements this tutorial, the reader is encouraged to read Holmes, Lumley and Berkooz2 (chapter 3).

Data analysis using the POD is often conducted to extract 'mode shapes' or basis functions from experimental data or detailed simulations of high-dimensional systems, for subsequent use in Galerkin projections that yield low-dimensional dynamical models (see ref. 2). This article concentrates on the data analysis aspect; subsequent reduced-order modelling is not discussed.

2. Motivation

Suppose we wish to approximate a function z(x, t) over some domain of interest as a finite sum in the variables-separated form

\[
z(x, t) \approx \sum_{k=1}^{M} a_k(t)\, \phi_k(x), \tag{1}
\]

with the reasonable expectation that the approximation becomes exact in the limit as M approaches infinity, except possibly on a set of measure zero (readers unfamiliar with measure theory may ignore it if they deal with finite-dimensional calculations, and consult, e.g. Rudin7 otherwise).

While in eq. (1) there is no fundamental difference between t and x, we usually think of x as a spatial coordinate (possibly vector-valued) and of t as a temporal coordinate.

The representation of eq. (1) is not unique. For example, if the domain of x is a bounded interval X on the real line, then the functions φ_k(x) can be chosen as a Fourier series, or Legendre polynomials, or Chebyshev polynomials, and so on. For each such choice of a sequence φ_k(x) that forms a basis for some suitable class of functions z(x, t) (see note 1), the sequence of time-functions a_k(t) is different. That is, for sines and cosines we get one sequence of functions a_k(t), for Legendre polynomials we get another, and so on. The POD is concerned with one particular choice of the functions φ_k(x).


If we have chosen orthonormal basis functions, i.e.

\[
\int_X \phi_{k_1}(x)\, \phi_{k_2}(x)\, \mathrm{d}x =
\begin{cases}
1 & \text{if } k_1 = k_2, \\
0 & \text{otherwise,}
\end{cases}
\]

then

\[
a_k(t) = \int_X z(x, t)\, \phi_k(x)\, \mathrm{d}x, \tag{2}
\]

i.e. for orthonormal basis functions, the determination of the coefficient function a_k(t) depends only on φ_k(x) and not on the other φ's.

What criteria should we use for selecting the functions φ_k? Orthonormality would be useful. Moreover, while an approximation to any desired accuracy in eq. (1) can always be obtained if M can be chosen large enough, we may like to choose the φ_k(x) in such a way that the approximation for each M is as good as possible in a least squares sense. That is, we would try to find, once and for all, a sequence of orthonormal functions φ_k(x) such that the first two of these functions give the best possible two-term approximation, the first seven give the best possible seven-term approximation, and so on. These special, ordered, orthonormal functions are called the proper orthogonal modes for the function z(x, t). With these functions, the expression in eq. (1) is called the POD of z(x, t).

3. Theory: Finite-dimensional case

Consider a system where we take measurements of m state variables (these could be from m strain gauges on a structure, or m velocity probes in a fluid, or a mixture of two kinds of probes in a system with flow-induced vibrations, etc.). Assume that at N instants of time, we take N sets of m simultaneous measurements at these m locations. We arrange our data in an N × m matrix A, such that element A_ij is the measurement from the jth probe taken at the ith time instant.

The m state variables are not assumed to be measured by transducers that are arranged in some straight line in physical space. We merely assume that the transducers have been numbered for identification, and that their outputs have been placed side by side in the matrix A. In the actual physical system these measurements might represent one spatial dimension (e.g. accelerometers on a beam), or more than one spatial dimension (e.g. pressure probes in a three-dimensional fluid flow experiment). Each physical transducer may itself measure more than one scalar quantity (e.g. triaxial accelerometers); in such cases, the different scalar time series from the same physical transducer are arranged in different columns of A. The final result of the data collection is assumed here to be the N × m matrix A.

It is common to subtract, from each column of A, the mean value of that column. Whether or not this is done does not affect the basic calculation, though it affects the interpretation of the results (see section 3.5 below).

3.1 The singular value decomposition

We now compute the singular value decomposition (SVD) of the matrix A, which is of the form (for a discussion of the SVD, see ref. 8)

\[
A = U \Sigma V^{\mathrm{T}}, \tag{3}
\]

where U is an N × N orthogonal matrix, V is an m × m orthogonal matrix, the superscript T indicates matrix transpose, and Σ is an N × m matrix with all elements zero except along the diagonal. The diagonal elements Σ_ii consist of r = min(N, m) nonnegative numbers σ_i, which are arranged in decreasing order, i.e. σ_1 ≥ σ_2 ≥ . . . ≥ σ_r ≥ 0. The σ's are called the singular values of A (and also of A^T) and are unique. The rank of A equals the number of nonzero singular values it has. In the presence of noise, the number of singular values larger than some suitably small fraction of the largest singular value might be taken as the 'numerical rank'. Since the singular values are arranged in a specific order, the index k of the kth singular value will be called the singular value number (see Figures 1 e and 2 c).
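As an aside not in the original article, a few lines of Matlab (the software used throughout this paper) illustrate the shapes of the three factors and the ordering of the singular values; the matrix A below is a hypothetical stand-in for measured data.

    % Hypothetical data: N = 50 time instants, m = 10 probes.
    N = 50; m = 10;
    A = randn(N, m);          % stand-in for an N x m data matrix

    [U, S, V] = svd(A);       % A = U*S*V'
    % size(U) is N x N, size(S) is N x m, size(V) is m x m
    sigma = diag(S);          % the r = min(N, m) singular values
    all(diff(sigma) <= 0)     % returns 1: decreasing order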

3.2 Correspondence with eqs (1) and (2)

In eq. (3), let UΣ = Q. Then the matrix Q is N × m, and A = QV^T. Letting q_k be the kth column of Q and v_k be the kth column of V, we write out the matrix product as

\[
A = Q V^{\mathrm{T}} = \sum_{k=1}^{m} q_k v_k^{\mathrm{T}}. \tag{4}
\]

Equation (4) is the discrete form of eq. (1). The function z(x, t) is represented here by the matrix A. The function a_k(t) is represented by the column matrix q_k. The function φ_k(x) is represented by the row matrix v_k^T. The approximation of eq. (1) is now exact because the dimension is finite. Due to the orthonormality of the columns of V, eq. (2) corresponds to multiplying eq. (4) by one of the v's on the right.
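A quick numerical check of eq. (4), again my own illustration rather than the article's code: summing the outer products q_k v_k^T reconstructs A to roundoff.

    A = randn(6, 4);                    % small hypothetical data matrix
    [U, S, V] = svd(A);
    Q = U*S;                            % Q = U*Sigma is N x m
    Asum = zeros(size(A));
    for k = 1:size(A, 2)
        Asum = Asum + Q(:, k)*V(:, k)'; % q_k * v_k', eq. (4)
    end
    norm(A - Asum, 'fro')               % zero to roundoff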

3.3 Lower-rank approximations to A

For any k < r, the matrix Σ_k obtained by setting σ_{k+1} = σ_{k+2} = . . . = σ_r = 0 in Σ can be used to calculate an optimal rank k approximation (see note 2) to A, given by

\[
A_k = U \Sigma_k V^{\mathrm{T}}. \tag{5}
\]

In computations, one would actually replace U and V with the matrices of their first k columns, and replace Σ_k by its leading k × k principal minor, the submatrix consisting of Σ's first k rows and first k columns (see the computer code in Appendix A.1).

The optimality of the approximation in eq. (5) lies in the fact that no other rank k matrix can be closer to A in the Frobenius norm (the square root of the sum of squares of all the elements), which is a discrete version of the L2 norm; or in the 2-norm (the 2-norm of a matrix is its largest singular value). Thus, the first k columns of the matrix V (for any k) give an optimal orthonormal basis for approximating the data. Note that V is determined once and for all: the rank k of the approximating matrix can be chosen afterwards, and arbitrarily, with guaranteed optimality for each k. The columns of V are the proper orthogonal modes.
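A minimal Matlab sketch of this truncation (my illustration, with hypothetical data); by the optimality result just quoted (the Eckart–Young theorem), the Frobenius error equals the root-sum-square of the discarded singular values.

    A = randn(50, 10);                        % hypothetical data matrix
    [U, S, V] = svd(A);
    k = 2;
    Ak = U(:, 1:k)*S(1:k, 1:k)*V(:, 1:k)';    % optimal rank-k approximation, eq. (5)
    s = diag(S);
    norm(A - Ak, 'fro') - sqrt(sum(s(k+1:end).^2))   % zero to roundoff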

3.4 SVD vs eigenvalue decomposition

Consider the differences between the SVD and the eigenvalue decomposition. The SVD can be computed for non-square matrices, while the eigenvalue decomposition is only defined for square matrices. The SVD remains within real arithmetic whenever A is real, while eigenvalues and eigenvectors of unsymmetric real matrices can be complex. The left and right singular vectors (columns of U and of V, respectively) are each orthogonal, while eigenvectors of unsymmetric matrices need not be orthogonal even when a full set exists. Finally, while an eigenvector ψ (say) and its image Aψ are in the same direction, a right singular vector v_k (the kth column of V) and its image Av_k need not be in the same direction, or even in spaces of the same dimension.

However, the SVD does have strong connections with the eigenvalue decomposition. On premultiplying eq. (3) with its transpose and noting that V^{-1} = V^T, we see that V is the matrix of eigenvectors of the symmetric m × m matrix A^T A, while the squares of the singular values are the r = min(N, m) largest eigenvalues (see note 3) of A^T A. Similarly, on premultiplying the transposed eq. (3) with itself and noting that U^{-1} = U^T, we see that U is the matrix of eigenvectors of the symmetric N × N matrix AA^T, and the squares of the singular values are the r = min(N, m) largest eigenvalues of AA^T.

If A is symmetric and positive definite, then its eigenvalues are also its singular values, and U = V. If A is symmetric with some eigenvalues negative (they will be real, because A is symmetric), then the singular values are the magnitudes of the eigenvalues, and U and V are the same up to multiplication by minus one for the columns corresponding to negative eigenvalues.
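This connection is easy to verify numerically; a small sketch of my own follows (the eigenvectors may come back in a different order and with different signs, so only the eigenvalues are compared):

    A = randn(7, 4);
    [U, S, V] = svd(A);
    [W, D] = eig(A'*A);                     % columns of W match columns of V up to sign/order
    sort(diag(D), 'descend') - diag(S).^2   % zero to roundoff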

3.5 Geometric interpretations

The SVD of a matrix has a nice geometric interpretation. An N × m matrix A is a linear operator that maps vectors from an m-dimensional space, say S1, to an N-dimensional space, say S2. Imagine the unit sphere in S1, the set of vectors of unit magnitude (square root of the sum of squares of the elements). This unit sphere gets mapped to an ellipsoid in S2. The singular values σ_1, σ_2, . . . are the lengths of the principal radii of that ellipsoid. The directions of these principal radii are given by the columns of U. The pre-images of these principal radii are the columns of V.

A second geometric interpretation may be more illuminating for POD applications. We now view the N × m matrix A as a list of the coordinates of N points in an m-dimensional space. For any k ≤ m, we seek a k-dimensional subspace for which the mean square distance of the points from the subspace is minimized. A basis for this subspace is given by the first k columns of V.

Recall that it is common in POD applications to subtract from each column of A the mean of that column. This mean-shift ensures that the N-point 'cloud' is centered around the origin. Figure 3 shows how the one-dimensional optimal subspace basis vector, indicated in each case by a grey arrow, depends on where the point cloud is centered (by definition, a subspace must pass through the origin).

[Figure 3. Effect of shifting the mean of the point cloud to the origin.]

3.6 Computations

The matrix V can be found by computing the SVD directly, using commercial software like Matlab. Alternatively, the calculation can be carried out indirectly using eigenvalue/eigenvector routines applied to A^T A, as mentioned in section 3.4. If m >> N, it is more efficient to first compute the matrix U as the matrix of eigenvectors of AA^T. This method is called the 'method of snapshots' in the POD literature. For example, in trying to find low-rank representations of a digital image that evolves over time, one might have a 1000 × 1000 pixel image (m = 10^6), but only on the order of N ≈ 10^3 images; in such cases, the method of snapshots might be used. Once U is known, premultiplying eq. (3) by U^T gives

\[
U^{\mathrm{T}} A = \Sigma V^{\mathrm{T}}. \tag{6}
\]

The product in eq. (6) is obviously still N × m. The last m – N columns of Σ are zero; the bottom m – N rows of V^T, or the last m – N columns of V, are multiplied by zeros and are indeterminate and irrelevant. If there are k nonzero singular values, then the first k rows of ΣV^T are nonzero and orthogonal. Their norms are the singular values. Normalizing them to unit magnitude gives the corresponding proper orthogonal modes v_i.
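A sketch of the method of snapshots, not from the article itself; random data stands in for N snapshots of m spatial points, with m >> N as assumed in this section.

    N = 20; m = 1000;
    A = randn(N, m);                   % hypothetical snapshot matrix, one snapshot per row

    [U, D] = eig(A*A');                % small N x N eigenproblem instead of an m x m one
    [lam, idx] = sort(diag(D), 'descend');
    U = U(:, idx);                     % reorder eigenvectors to match decreasing sigma's

    SVt = U'*A;                        % equals Sigma*V', eq. (6)
    sigma = sqrt(lam);                 % singular values
    V = (SVt ./ sigma)';               % normalize the rows; columns of V are the modes
    norm(A - U*diag(sigma)*V', 'fro')  % zero to roundoff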

4. Theory: Infinite-dimensional versions

For most users of the POD, the simpler theory for the discrete case suffices. Experimental data are always discrete, and in any case integral equations (which arise in the infinite-dimensional case) are usually solved numerically by discretization. However, those requiring the infinite-dimensional versions may wish to consult, e.g. Holmes, Lumley and Berkooz2, as well as a text on integral equations, such as Porter and Stirling9. For completeness, the main points are outlined here.

Infinite-dimensional PODs are solved as eigenvalue problems. The issue to resolve is what A^T A should mean for infinite-dimensional problems. To this end, note that in the finite-dimensional version discussed in the preceding subsections, element (i, j) of the matrix B = A^T A is

\[
B_{ij} = \sum_{k=1}^{N} A_{ki} A_{kj}, \tag{7}
\]

with the sum being carried out over the 'time' variable, or the row-index of A.

In the finite-dimensional-in-space but continuous-time version of the POD, the vector inner products involved in computing A^T A become integrals. We simply take

\[
B_{ij} = \frac{1}{T} \int_{t_0}^{t_0 + T} z(x_i, t)\, z(x_j, t)\, \mathrm{d}t, \tag{8}
\]

where for any finite T the correspondence between the integral (eq. (8)) and the sum (eq. (7)) is clear. Since B is still just a matrix, it still has only a finite number of eigenvalues.

In the discrete case, with finite data, the factor of 1/T can be ignored, since it affects only the absolute values of the σ's and leaves their relative magnitudes unchanged; the matrices U and V are unaffected as well.

Before going on to the fully infinite-dimensional case, we briefly consider the significance of the averaging time duration T in eq. (8). In applications of the continuous-time POD to steady state problems, one usually assumes that in eq. (8) a well-defined limit is attained, independent of t_0, as T approaches infinity. In practical terms this means that if the POD is being used to obtain a low-dimensional description of the long-term or steady state behaviour, then the data should be collected over a time period much longer than the time scale of the inherent dynamics of the system. However, applications of the POD need not be restricted to steady state problems. In studying the impulse response, say, of a complicated structure, the steady state solution may be the zero solution; a finite-time POD may nevertheless yield useful insights into the transient behaviour of the structure. As was the case with subtracting vs not subtracting the means from the columns of A, whether or not the data collection time T was long enough to accurately capture the steady state behaviour does not affect the basic calculation, but it affects the interpretation of the results.

We now consider the fully infinite-dimensional version of the POD. Now the rows of A are replaced by functions of space, and there is an infinite sequence of eigenvalues (with associated eigenfunctions). B is not a matrix anymore, but a function of two variables (say x_1 and x_2):

\[
B(x_1, x_2) = \frac{1}{T} \int_{t_0}^{t_0 + T} z(x_1, t)\, z(x_2, t)\, \mathrm{d}t, \tag{9}
\]

where x_1 and x_2 are continuous variables, both defined on some domain X. How should we interpret the eigenvalues of B as given by eq. (9)? The eigenvalue problem in the discrete m × m case, Bψ = λψ, can be written out in component form as

\[
\sum_{j=1}^{m} B_{ij} \psi_j = \lambda \psi_i. \tag{10}
\]

For the B of eq. (9), the sum of eq. (10) becomes an integral, ψ becomes a function of x, and we have the integral equation

\[
\int_X B(x_1, x_2)\, \psi(x_2)\, \mathrm{d}x_2 = \lambda\, \psi(x_1).
\]

It is clear that the above integral equation has the same form whether the space variable x is scalar valued or vector valued.

5. Numerical examples

5.1 Approximation of a surface

Let z be given by

\[
z(x, t) = \mathrm{e}^{-|(x - 0.5)(t - 1)|} + \sin(xt), \quad 0 \le x \le 1, \; 0 \le t \le 2. \tag{11}
\]

Imagine that we 'measure' this function at 25 equally spaced x points and 50 equally spaced instants of t. The surface z(x, t) is shown in Figure 1 a.

[Figure 1. Approximation of surface example. a, the surface z(x, t) of eq. (11); b–d, rank 1, 2 and 3 approximations; e, singular values of Z; f, modal coordinates.]

Arranging the data in a matrix Z, we compute the SVD of Z, and then compute (see section 3.3) rank 1, rank 2 and rank 3 approximations to Z, as shown in Figure 1 b–d. The computer code used for this calculation is provided in the appendix. (The means were not subtracted from the columns for this example.)

The rank 3 approximation (Figure 1 d) looks indistinguishable from the actual surface (Figure 1 a). This is explained by Figure 1 e, which shows the singular values of Z. Note how the singular values decrease rapidly in magnitude, with the fourth one significantly smaller than the third. (The numerical values are 47.5653, 2.0633, 2.0256, 0.0413, 0.0106, . . . .)

Note that in this example without noise, the computed singular values beyond number 14 flatten out at the numerical roundoff floor, around 10^{-15}. The actual singular values beyond number 14 should be smaller, and an identical computation with more digits of precision would show the computed singular values flattening out at a smaller magnitude. Conversely, perturbing the data matrix by zero-mean random numbers of typical magnitude (say) 10^{-8},

causes the graph of singular values to develop an obvious elbow at about that value. For experimental data with noise, the SVD of the data matrix can sometimes provide an empirical estimate of where the noise floor is.

So far in this example, we have merely computed lower-rank approximations to the data, and the use of the SVD in the calculation may be considered incidental. Now, suppose we wish to interpret the results in terms of mode shapes, i.e. in the context of the POD. The first 3 columns of V provide the 3 dominant x-direction mode shapes, and on projecting the data onto these mode shapes we can obtain the time histories of the corresponding modal 'coordinates'.

The calculation of the modal coordinates is straightforward. Using eq. (4), the kth modal coordinate q_k is simply σ_k u_k, where u_k is the kth column of U (assuming U is available from the SVD). Alternatively, if only the proper orthogonal modes V are available, then the projection calculation is simply q_k = Av_k, where v_k is the kth column of V. The modal coordinates for the surface given by eq. (11) are plotted in Figure 1 f. The first coordinate is obviously dominant (the first singular value is dominant), while the second and third have comparable magnitude (singular values 2 and 3 are approximately equal).

5.2 Proper orthogonal modes in a simplified vibroimpact problem

Let us consider the one-dimensional, discrete system shown in Figure 4. Ten identical masses m are connected to each other (and to a wall at one end) by identical linear springs of stiffness k and identical dashpots of coefficient c. The forcing on each mass is identical, with

\[
F_1(t) = F_2(t) = \ldots = F_{10}(t) = A \sin \omega t.
\]

[Figure 4. Simple model of a vibroimpact system.]

The fifth mass has a hard stop which intermittently contacts a nonlinear spring; the spring force is Kx^3 when the displacement x of mass 5 is negative, and zero otherwise.

The equations of motion for this system are easy to write, and are not reproduced here. The system was numerically simulated using ode23, a Matlab routine implementing a low-order Runge–Kutta method with adaptive step size control. The routine's default error tolerances were used (10^{-3} for relative error and 10^{-6} for absolute error). The solution, though adaptively refined using internal error estimates, was finally provided by the routine at equally spaced points in time (this is a convenient feature of this routine). The simulation was from t = 0 to t = 500, over 6000 equal intervals. For initial conditions, all ten displacements and velocities were taken to be zero. In the simulation, the on/off switchings of the cubic spring were not monitored, since the nonlinearity is three times differentiable, the routine uses a low-order method, there is adaptive error control, and overall speed of computation was not a concern in this simulation. The parameters used in the simulation were

\[
m = 1, \; k = 1, \; c = 0.3, \; A = 0.8, \; \omega = 0.2, \; \text{and } K = 5.
\]

Figure 2 shows the results of the simulation. Figure 2 a shows the displacement vs time of mass 4, which is seen settling to a periodic motion. In Figure 2 b, the displacements vs time of masses 2 and 5 are shown over a shorter time period. The 'impacts' of mass 5 are clearly seen. Figure 2 c depicts the singular values of the matrix of positions (velocities are not used). Note that the mean displacements of the masses were nonzero because of the one-sided spring at mass 5, and in the computations the mean displacements were subtracted from the columns of the data matrix (see section 3.5). Finally, the first three proper orthogonal modes are shown in Figure 2 d.

[Figure 2. Vibroimpact example results. a, displacement vs time of mass 4; b, displacements vs time of masses 2 and 5; c, singular values of the position data; d, the first three proper orthogonal modes.]

The relative magnitudes of the maximum displacements of masses 2 and 5 in Figure 2 b are consistent with the relative magnitudes of the corresponding elements of the first, dominant, proper orthogonal mode shown in Figure 2 d. Note that in Figure 2 d the three mode shapes have comparable magnitudes. This is as it should be, because the mode shapes themselves are all normalized to unit magnitude; it is the time-varying coefficient of the third mode, say, which is much smaller than that of the first mode. The computer code used for these calculations (to process the previously computed displacement vs time data) is given in the appendix.

From Figure 2 c and d, we observe that the first two singular values are together much larger than the rest; as such, the response is dominated by the first two modes. However, the effect of the nonsmooth impacting behaviour shows up more clearly in the third mode (Figure 2 d), although it has a relatively small overall amplitude. This is consistent with Figure 2 b, where it is seen that the displacements at a 'typical' location (mass 2) are significantly smoother than those at the impact location (mass 5). The displacement of mass 5 itself is slow for relatively long portions of time, and has some localized high-frequency, small-amplitude behaviour near the impacts. Roughly speaking, the softness of the impact contact (cubic spring) as well as the damping in the structure lead to smooth-looking mode shapes capturing most of the action most of the time. In other words, the 'fast' dynamics introduced by the impacts is localized in space (meaning it is significant only at and near the impact location), and also localized in time (meaning it is significant for only a short period following each impact); and in an overall average sense, the amplitude of the high-speed, impact-induced oscillations is small. For this reason, the modes that dominate the POD do not show the impact behaviour clearly.

The foregoing example provides a glimpse of how the POD may be used to extract qualitative and quantitative information about the spatial structure of a system's behaviour, and how a posteriori data analysis might lead to an improved understanding of the system.

6. Some words of caution

6.1 Sensitivity to coordinate changes

The POD is sensitive to coordinate changes: it processes numbers, whose physical meaning it does not know. The SVD of a matrix A is not the same as the SVD of another matrix AB, where B is an invertible m × m matrix (corresponding to taking linear changes of variables). In particular, inappropriate scaling of the variables being measured (easily possible in systems with mixed measurements such as accelerations, displacements and strains) can lead to misleading, if not meaningless, results.

In the experimental study of chaotic systems one sometimes uses delay coordinate embedding (see, e.g. Ott10), which implicitly involves a strongly nonlinear change of variables. Readers may like Ravindra's brief note11 discussing one inappropriate use of the POD in this connection.

6.2 Subspaces vs surfaces

The analysis is based on linear combinations of basis functions. The POD cannot distinguish between a featureless cloud of points on some plane and a circle on the same plane. While the circle obviously has a one-dimensional description, on blindly using the POD one might conclude that the system has two dominant 'modes'.

6.3 Rank vs information content


As the SVD can be used to compute low-rank approximations to data matrices, it is tempting to think that rank is related to information content. To see that this is not necessarily true, compare an n × n diagonal matrix (rank n) with another n × n matrix which has the same nonzero entries but in a single column (rank one). Though the ranks differ widely, the information contents of these two matrices may reasonably be called identical.

Incidentally, this diagonal matrix vs single-nonzero-column matrix example has a nice correspondence with two types of physical phenomena: the diagonal matrix corresponds to travelling disturbances (e.g. pulses), while the single-nonzero-column matrix corresponds to disturbances that stay localized (e.g. standing waves). Readers are invited to mimic the surface-approximation example of section 5.1 with the function z(x, t) = e^{-a(x - t)^2}, which represents a travelling hump, with (say) a = 5, -2 ≤ x ≤ 2, -2 ≤ t ≤ 2, and with (say) a grid of 80 points along each axis; a sketch of this experiment is given below. In this example (whose details are not presented here for reasons of space), it is found that the largest several singular values have comparable magnitudes, and low-rank approximations to the surface are poor. For increasing a, the approximation at any fixed less-than-full rank gets poorer. This example shows that some physical systems may be poorly suited to naïve analysis using the POD.
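The travelling-hump experiment suggested above takes only a few lines of Matlab (my sketch, using the stated parameters):

    a = 5;
    x = linspace(-2, 2, 80);
    t = linspace(-2, 2, 80);
    [X, T] = meshgrid(x, t);        % rows of Z indexed by t, columns by x
    Z = exp(-a*(X - T).^2);         % travelling hump z(x,t) = exp(-a*(x-t)^2)
    s = svd(Z);
    semilogy(s, 'o')                % many comparable singular values: no good low-rank fit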

6.4 Modal ‘energies’

The eigenvalues of A^T A (or the squares of the singular values of A) are sometimes referred to as 'energies' corresponding to the proper orthogonal modes. In signal processing, this energy is not a physical energy. In incompressible fluid mechanics with velocity measurements, this energy is related to the fluid's kinetic energy. However, in structural dynamics problems with, say, displacement and/or velocity measurements, there is generally no direct correspondence between this energy and either the system's kinetic energy, or its potential energy, or any combination thereof. Thinking of the eigenvalues of A^T A as 'energies' in a general mechanical context is incorrect in principle and may yield misleading results.

For example, consider a two-mass system. Let one mass be 10^{-4} kg and let it vibrate sinusoidally at a frequency ω with an amplitude of 1 m. Let the second mass be 10^4 kg, and let it vibrate sinusoidally at a frequency 2ω with an amplitude of 10^{-2} m. Then the first proper orthogonal mode corresponds to motion of the first mass only, while the second proper orthogonal mode corresponds to motion of the second mass only. The modal 'energy' in the first proper orthogonal mode is 10^4 times larger than that in the second mode. However, the actual average kinetic energy of the first mass (first mode) is 10^4 times smaller than that of the second mass (second mode). (Readers who like this example may also enjoy the discussion in Feeny and Kappagantu4.)

6.5 Proper orthogonal modes and reduced-order modelling

Proper orthogonal modes are frequently used in Galerkin projections to obtain low-dimensional models of high-dimensional systems. In many cases, reasonable to excellent models are obtained. However, the optimality of the POD lies in a posteriori data reconstruction, and there are no guarantees (as far as I know) of optimality in modelling. Examples can be constructed, say along the lines of the system considered in section 6.4 above, where the POD provides misleading results that lead to poor models. As another example, a physical system where a localized disturbance travels back and forth is poorly suited to analysis using the POD; while such a system might in fact have a useful low-dimensional description, the POD may fail to find it because that description does not match the form of eq. (1).

The previous warnings notwithstanding, through a combination of engineering judgement and luck, the POD continues to be fruitfully applied in a variety of engineering and scientific fields. It is, in my opinion, a useful tool, at least for people who regularly deal with moderate to high-dimensional data.

Appendix – Code for numerical examples

All computations presented in this paper were carried out using the commercial software Matlab. Some of the codes used are provided below.

A.1 From section 5.1
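(The original listing did not survive the conversion of this document; the following is a minimal sketch of my own that reproduces the calculation described in section 5.1: the surface of eq. (11), its low-rank approximations, singular values and modal coordinates.)

    % Surface of eq. (11), 'measured' at 25 x-points and 50 t-points.
    x = linspace(0, 1, 25);
    t = linspace(0, 2, 50);
    [X, T] = meshgrid(x, t);                  % rows of Z indexed by t, columns by x
    Z = exp(-abs((X - 0.5).*(T - 1))) + sin(X.*T);

    [U, S, V] = svd(Z);
    figure; surf(x, t, Z)                     % the surface (Figure 1 a)
    for k = 1:3                               % rank 1, 2, 3 approximations (Figure 1 b-d)
        Zk = U(:, 1:k)*S(1:k, 1:k)*V(:, 1:k)';
        figure; surf(x, t, Zk)
    end
    figure; semilogy(diag(S), 'o')            % singular values (Figure 1 e)
    Q = U(:, 1:3)*S(1:3, 1:3);                % modal coordinates q_k = sigma_k*u_k
    figure; plot(t, Q)                        % (Figure 1 f)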


A.2 From section 5.2
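(This listing was also lost in conversion; a minimal sketch of the post-processing described in section 5.2 follows, assuming the ode23 simulation has already produced a hypothetical matrix D whose 6000 rows are time samples and whose 10 columns are the mass displacements. The simulation code itself is not reconstructed here.)

    % POD of the displacement data from the vibroimpact simulation.
    D0 = D - mean(D);                % subtract column means (see section 3.5)
    [U, S, V] = svd(D0, 'econ');
    figure; semilogy(diag(S), 'o')   % singular values (Figure 2 c)
    figure; plot(V(:, 1:3), 'o-')    % first three proper orthogonal modes (Figure 2 d)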


Notes

1. We assume these functions are bounded and integrable. In experiments, measurements are bounded and discrete; integration is equivalent to summation; and the subtleties of integration can safely be ignored. Here, I stay away from integration theory. The interested reader may consult, e.g. Rudin7.

2. Strictly speaking, one should say 'rank at most k'.

3. If m > N, then r = N; there are m eigenvalues but only N singular values; the largest N eigenvalues equal the squares of the N singular values; and the smallest m – N eigenvalues are all exactly zero. If m ≤ N, then r = m, and the m eigenvalues equal the squares of the m singular values. (Readers may wish to work out the details of these calculations using a numerical example of their choice.)

References

1. Kosambi, D. D., J. Indian Math. Soc., 1943, 7, 76–88.
2. Holmes, P., Lumley, J. L. and Berkooz, G., Turbulence, Coherent Structures, Dynamical Systems and Symmetry, Cambridge Monographs on Mechanics, Cambridge University Press, 1996.
3. Cusumano, J. P., Sharkady, M. T. and Kimble, B. W., Philos. Trans. R. Soc. London, Ser. A, 1994, 347, 421–438.
4. Feeny, B. F. and Kappagantu, R., J. Sound Vibr., 1998, 211, 607–616.
5. Koditschek, D., Schwind, W., Garcia, M. and Full, R., 1999 (manuscript in preparation).
6. Ruotolo, R. and Surace, C., J. Sound Vibr., 1999, 226, 425–439.
7. Rudin, W., Principles of Mathematical Analysis, International Series in Pure and Applied Mathematics, McGraw-Hill, 3rd edn, 1976.
8. Golub, G. H. and Van Loan, C. F., Matrix Computations, Johns Hopkins University Press, Baltimore, 2nd edn, 1990.
9. Porter, D. and Stirling, D. S. G., Integral Equations: A Practical Treatment, From Spectral Theory to Applications, Cambridge Texts in Applied Mathematics, Cambridge University Press, 1990.
10. Ott, E., Chaos in Dynamical Systems, Cambridge University Press, 1993.
11. Ravindra, B., J. Sound Vibr., 1999, 219, 189–192.