j1.1 ozone profile retrieval from gome data using a …

J1.1 .

OZONE PROFILE RETRIEVAL FROM GOME DATAUSING A NEURAL NETWORK INVERSE MODEL

Martin D. Müller∗, Anton K. KaifelCenter for Solar Energy and Hydrogen Research (ZSW), Stuttgart, Germany

ABSTRACTIn order to retrieve the atmospheric ozone distribution from the UV-visible satellite spectrometer GOME(Global Ozone Monitoring Experiment), we have modelled inverse radiative transfer directly, using a multi-layer perceptron (MLP) type neural network. This network was trained on a data set of measured GOMEradiances as input, and collocated ozone profile measurements from ozonesondes, SAGE II, HALOE andPOAM III as target values. A partial training method has been devised for dealing with incomplete tar-get data, because neither occultation instruments nor ozone sondes cover the full retrieval height range(1–60 km). Around 70000 collocations from 1996 to 2001 were used for training, and another 12000 forcross-validation in a test data set. Network input consists of a combination of spectral, geolocation andclimatological information (latitude and time), the latter making the use of external a priori ozone profilesunnecessary. We designate our method as Neural Network Ozone Retrieval System (NNORSY). In thestratosphere, NNORSY globally reduces standard deviation (StD) with respect to a well-tried ozone climatol-ogy by around 40%. Tropospheric ozone can also be retrieved in many cases, reducing the StD by 10-30%globally. The neural network was found capable of correcting for instrument degradation, pixel cloudinessand scan angle effects on its own, based on the input data provided. Remaining inhomogeneities in the ge-ographical distribution of training data, combined with differing ozone field variability causes retrieval qualityto vary with latitude and season. Since retrieval only requires one forward propagation through the network,NNORSY is about 103–105 times faster than classical, local retrieval techniques like, for instance, OptimalEstimation. Therefore the method is well suited for real-time application and fast data reprocessing. Anoperational near-real-time prototype of the system is already running successfully at three GOME receivingstations. In order to better characterize single output ozone profiles, a number of local error estimationmethods has been investigated. Vertical resolution of the profiles was assessed empirically in compar-isons with the high-resolution collocations, and seems to be in the order of 4–6 km. Further developmentsof NNORSY could involve improvement of training data composition, input parameter optimization, moresophisticated network error functions and training methods, as well as adaptation to other sensors.

1. INTRODUCTION

Although it contributes only about one millionth tothe total mass of the atmosphere, the trace gasozone is one of its most active and important con-stituents. Not only does it prevent harmful UV ra-diation from reaching the ground, but it controls thestratospheric temperature distribution, and sharesresponsibility for summer smog, the oxidation ca-pacity of the atmosphere and global warming. De-termining the three-dimensional, global distributionof ozone from the orbit has been a challenge towhich a number of different satellite instrumentshave been assigned in the past. One of these isthe Global Ozone Monitoring Experiment (GOME)(Burrows et al., 1999) upon the European polar or-biting satellite ERS-2. It measures the backscat-tered solar radiation from 240 nm to 790 nm at a

∗Corresponding author address: Martin D. Müller, ZSW,Hessbrühlstr. 21c, 70565 Stuttgart, Germany; e-mail:[email protected]

moderate spectral resolution (0.2–0.4 nm). GOMEis a scaled-down version of the SCIAMACHY instru-ment (SCanning Imaging Absorption spectroMeterfor Atmospheric CHartographY ) (Burrows et al.,1995; Bovesmann et al., 1999), which was recentlyput into operation on the European ENVISAT satel-lite.

To exploit the theoretical ozone information con-tent of the backscattered UV-vis radiation, a non-linear, ill-posed inverse retrieval problem has to besolved for the sun-normalized GOME radiances.To achieve this, various independent retrieval algo-rithms have been developed, most of which basedon the principle of Optimal Estimation (OE) as de-scribed by Rodgers (1976, 1990). Simply put, thismethod tries to find an optimal compromise be-tween information gained from a spectral measure-ment and an a priori ozone profile. The compromiseis based on the reliability of either source, i. e. theirerror characteristics.

Athough this retrieval scheme can be considered

1

geophysical, since it relies on the relatively well un-derstood forward calculation of the radiative transferequation (RTE) used for extracting spectral informa-tion, it also possesses a significant statistical com-ponent in the form of the optimization framework.In addition, solving the RTE requires certain as-sumptions about unmeasurable or unavailable pa-rameters of the state of the atmosphere and the in-strument. Also, the a priori ozone profile togetherwith its variance is usually taken from a climatol-ogy, which also consists of a statistical and empiri-cal analysis of a number of measurements.

This paper proposes to let a neural networkscheme take care of the heuristics and statisticsof the retrieval process, after being provided witha suitable structure and training data set derivedfrom geophysical knowledge and a few assump-tions. Such has been done successfully in otherfields of atmospheric science (Krasnopolsky, 1997;Chevallier et al., 1998; Jiménez, 2000; Aires et al.,2001) with the network tool of choice being a nonlin-ear feed-forward neural network, also called a mul-tilayer perceptron (MLP) (Rumelhart et al., 1986).Based upon experience in MLP ozone retrieval fromNOAA-TOVS data (Müller and Kaifel, 1999; Kaifeland Müller, 2001), the method has now been ap-plied to GOME data and will be designated in whatfollows as Neural Network Ozone Retrieval System(NNORSY). The motivation behind its developmentis therefore, that it will complement existing retrievalalgorithms by exhibiting different problems and er-ror characteristics, and – even more importantly –that it is faster by several orders of magnitude.

Section 2 of this paper will give a brief overviewof the neural networks’ relationship to the well-established OE theory, and describe the specificdata and training methods needed for retrieval ofozone profiles from GOME. Section 3 presents theretrieval results in the form of error statistics onthe training and test data sets, as well as subsetsthereof. A brief case study with validation data fromHohenpeissenberg ozone sondes is also included.Section 4 will then elaborate on the advanced top-ics of estimating local retrieval errors and verticalresolution, while Section 5 will conclude the paper.

2. DATA AND METHODS

2.1. MLPs in the Retrieval Context

The idea of retrieving height-resolved ozone infor-mation with an orbital instrument by measuring the

backscattered UV (BUV) radiation dates back to the1950’s (Singer and Wentworth, 1957) and will bebriefly described here. While theoretically possi-ble for other trace gases as well (Rozanov et al.,1993), ozone is particularly suited for this task,because its absorption coefficient rises steeply byabout four orders of magnitude in the UV. This alonefor solar radiation would already lead to decreas-ing penetration depth in the atmosphere, but theeffect is greatly enhanced by Rayleigh-scatteringand increasing air pressure. As a result, incom-ing radiation observes an almost transparent atmo-sphere down to a certain depth, at which extinctionrises rapidly within a few kilometers. The height ofthis layer depends on wavelength; it almost actslike a fuzzy mirror, with the reflected photons car-rying ozone information mostly from the particu-lar height region. By scanning the UV, a height-resolved ozone profile can thus be constructed.

From radiative transfer theory, it can be shown(e.g. Rodgers, 1990) that the top-of-atmosphere ra-diance y(λ) measured by a satellite is essentially aweighted integral over the atmospheric state x(z)

y(λ) =∫ ∞

z=0K(z,λ)x(z)dz, (1)

where z is the height coordinate, λ the wave-length and K(z,λ) the weighting function or kernelof the observation process, which decribes the alti-tude and fuzziness of the abovementioned ’mirror’.For discrete measurements, Eq. 1 is discretized insuch a way that x = (x1,x2, . . . ,xn)T is a vector inn-dimensional state space, and y = (y1,y2, . . . ,ym)T

belongs to the m-dimensional measurement space.In retrieval theory, it is furthermore assumed forphysical reasons that the relationship between xand y can be modeled as

y = F(x,b)+ ε, (2)

where F is a (radiative transfer) forward modeldepending on the atmospheric state x and a num-ber of additional parameters which are for simplic-ity combined into one vector b. The measurementnoise vector ε is assumed to have Gaussian statis-tics. In general, the model F will not be linear, butsince it represents a continuous function, it can belinearized in the vicinity of a certain state x0, suchthat

y−F(x0) =∂F∂x

∣∣x0(x−x0)+O(x2)≈ K(x−x0), (3)

2

where K = ∂F∂x is called the kernel matrix. How-

ever, the underlying Eq. 1 belongs to the class ofFredholm integrals of the first kind, a class contain-ing many ill-posed problems, of which the inverseretrieval problem is no exception. For a compre-hensive discussion on this topic, the reader is re-ferred to (Rodgers, 2000). It shall just be statedhere that the rank of K is usually smaller than thestate space dimension n, therefore the problem isunderconstrained, meaning there exist a multitudeof possible solutions x for any given observationy. It is therefore reasonable to look at the prob-lem from a Bayesian point of view: The most prob-able solution x is the one that maximizes the like-lihood L = p(x|y), with p(x|y) denoting the condi-tional probability density of x given the observationy. This is equivalent to minimizing an error function

E = − lnL =− ln p(x|y) =− ln

(p(y|x)p(x)

p(y)

)= (y−Kx)TS−1

e (y−Kx)++(x−xa)TS−1

a (x−xa)+const., (4)

where Bayes’ Theorem was used in the first lineand the assumption of Gaussian errors in the sec-ond. p(y) is assumed to be constant. The covari-ance matrix Se contains forward model and mea-surement errors, while Sa tries to model the natu-ral variability of the state, centered around some apriori xa, which in practice mostly represents a cli-matological mean, i. e. xa = xa(θ,d) with θ denotinggeographical latitude and d the time of year. Eq. 4constitutes the core of the OE framework (Rodgers,1976, 2000). As can be seen, it applies for a singleobservation y, and requires the knowledge of a for-ward model, its error covariance, the measurementerror covariance, an a priori state and an a prioricovariance.

Since we assumed Gaussian distribution for allerrors involved, it follows that p(x|y) is also a Gaus-sian. This function can be sampled using a setof observations {yp}, p∈ {1, . . . ,P}, for which thecorresponding states xp are known. We further as-sume the existence of an inverse function R mod-elled by

x = R(y,c,w)+ εR, (5)

where c is a vector of additional input parame-ters, and w contains the inverse model parameters.Omitting the constant terms, the error function canthen be written as

E =− lnP

∏p=1

p(xp|yp) =

=P

∑p=1

(R(yp,cp,w)−xp)T S−1(R(yp,cp,w)−xp)2 (6)

In our case, R is realized by means of a simulatedMLP-type neural network. This network consistsof a one-dimensional input layer containing enoughneurons to receive the input y and c. Each inputneuron is connected to all nodes of a second, hid-den layer of neurons via synapses carrying weightsw. When presented with an input data vector, theinput signals propagate along the synapses whilstbeing multiplied by the weights. The hidden neu-rons essentially sum up all incoming signals, anduse a fixed nonlinear function – in our case tanh– to in turn define their outputs, which propagatethrough another layer of weighted synapses to theoutput neurons. These neurons again sum up theirinputs, and use another tanh transfer function and asuitable renormalization to define the network out-put x. Thus, the network is essentially a mappingfrom measurement space to state space parame-terized by means of the weights w.

In principle, more than one hidden layer of neu-rons could be used, but is has been proven thatone layer is enough to make the network model auniversal approximator, i. e. it can theoretically ap-proximate any given mapping with arbitrary accu-racy (Hornik et al., 1989). However, this proof isnot a constructive one, therefore the hidden layersize has to be determined empirically (Tamura andTateishi, 1997), and the optimal set of weights w issearched for with a learning algorithm, the choiceof which will be described in Section 2.3.. In con-trast to conventional approximation schemes withfixed basis functions (e. g. polynoms, trigonometicfunctions, etc.), where the approximation error in-creases exponentially with the dimension of themapping, in approximation tasks carried out bymeans of neural networks the error was shownto be independent of the dimension (Baron, 1993,1994).

At this point, we should note the fundamental dif-ferences between OE and the neural network ap-proach: While the first term in Eq. 4 represents adistance in measurement space, Eq. 6 correspondsentirely to state space. Also, the neural networkmethod doesn’t demand explicit knowledge of a for-ward model or an a priori, but instead it needs atraining data set of paired observations and mea-surements {(yp,cp),xp}, from which it estimates an

3

optimal set of parameters w. Once this is done, theresulting mapping R is optimal in a global sense,and can be applied to all observations, whereas OEdetermines a new, locally optimal solution for eachindividual observation. The consequences of thesedifferences will be discussed further below.

In practice, Eq. 6 is often simplified for computa-tional reasons by assuming S= σ2In for all trainingpairs.∗ It then doesn’t influence the minimization ofE anymore, therefore it can be safely omitted, whichleads to the well-established quadratic error func-tion of the form

E =12

P

∑p=1

(R(yp,cp,w)−xp)2 =:P

∑p=1

Ep, (7)

where the factor 12 is just a convention, and Ep is

the error attributed to a single output pattern. Thisrather coarse assumption is still adequate for ozoneretrieval purposes, as will be demonstrated below.However, the description of retrieval error through asingle standard deviation σ is unsatisfactory in mostcases, thus supplementary methods have to be em-ployed to better characterize the errors (Sec. 4).

Since the ill-posedness of the inverse retrievalproblem leads to very unstable solutions witha straightforward, classical approach (Hasekampet al., 1999; Rodgers, 2000), the second term inEq. 4 serves as a regularizer. Another way to looka this is by means of the a priori, the OE algorithmis drawn toward a physically reasonable solution ofEq. 1, while purely mathematical ones (e. g. involv-ing negative ozone densities), and climatologicallyimprobable ones are being made unattractive. Onthe other hand, neural networks with quadratic errorfunctions always approximate the conditional meanof the training data with respect to the input Bishop(1995a). In the case of ambiguities, this may lead tobad results, since for instance the mean of two validbranches of a solution is not necessarily itself a so-lution – it may well lie on neither branch. Howeverit can be argued that since OE yields unambiguousresults, the parameters b used in the forward model,in addition to θ and d from the a priori serve to re-move the ambiguities. Here, these parameters arecombined into the vector c, and are thus includedin the neural network input to render the networkmapping unambiguous.

∗Suitable normalization of y within the preprocessing step en-sures that the output variances are all in the same order of mag-nitude.

2.2. Training Dataset Assembly

The NNORSY method relies on collocations withmeasured ozone profiles from different sources toform the training and test databases. NNORSY wastrained on collocations dated from Jan. 1996 to July2001. The maximum distance for collocations wasset to 250 km, with no more than 12 h betweenozone and GOME measurements. All data thus col-lected were interpolated to a common height gridstating average ozone number densities at 1, 2, . . .,60 km geopotential height (GPH).

Ozonesonde data were obtained from theWorld Ozone and Ultraviolet Radiation Data Cen-ter (WOUDC) (Wardle et al., 1998) and fromthe Southern Hemisphere Additional Ozoneson-des (SHADOZ) campaign (Thompson et al., 2001).These data generally have a high quality and ver-tical resolution, but their geographical distributionis uneven, with most stations situated at northernmidlatitudes and very few measurements over theoceans. Since sonde data starts becoming unre-liable around 25–30 km (SPARC, 1998), all sondeprofiles were cut off at a random height in this range.Cutting off all sondes at the same height resultedin artifacs in the retrieved profiles and was there-fore avoided. To achieve coverage of the greaterheights, the sonde measurements were supple-mented by satellite data from solar occultation limbsounders. These were the infrared HALOE instru-ment (Russell et al., 1993), data version 19 (Luet al., 1997), the ultraviolet SAGE II sensor (Cun-nold et al., 1989), v6.10 (Cunnold et al., 2000), andthe POAM III sensor, v3 (Lucke et al., 1999; Lumpeet al., 2002). HALOE and SAGE II sample the entireglobe quite evenly between 70◦S and 70◦N, whilethe POAM measurement geometry results in sun-rise events occurring between 54◦N and 71◦N, andsunset between 63◦S and 88◦S. Note that some-times multiple GOME pixels were collocated with asingle ozone profile. This is equivalent to trainingwith noisy input data, and serves to further regular-ize the retrieval (Bishop, 1995b).

As has been found elsewhere (Lu et al., 1997;Steele and Turco, 1997; Deniel et al., 1997; SPARC,1998), the aforementioned ozone data sources donot always agree. The neural network will con-struct a compromise between the different sources,therefore biases stemming from different measure-ment principles and retrieval algorithms for the limbsounders are likely to cancel out. Yet it is clear thatstrong biases and variances in the training data setwill adversely affect NNORSY retrieval accuracy,therefore some homogenization based on the sta-

4

tistical properties of the training data set was per-formed. Since both the distribution of collocationsand of ozone in general are largely governed by lat-itude, the statistics of latitude bands were used forthis purpose.

First, the number of collocations over latitude andtime had to be checked, for overrepresentation ofa certain latitude region might lead to biases in theneural network output, by favoring a certain type ofprofile (Bishop, 1995a). On the other hand, it isreasonable to include more profiles from latitudeswhere natural ozone variability is higher – e. g. thefringe of the polar vortex and the northern midlat-itudes – such that a greater number of differentatmospheric situations is sampled. The availabil-ity of data over time should be mostly level, other-wise ozone trends and sensor degradation cannotbe accurately modelled. Removal of profiles is doneby selectively reducing the allowed maximum collo-cation distance for ground stations, times, and/orlatitude bands. The 2-dimensional homogeniza-tion over latitude and time sometimes requires sub-optimal compromises to be made. Fig. 1 shows his-tograms of observation density for NNORSY train-ing data before and after the operation.

It is clearly seen in Fig. 1, that the highest densityof collocations is located around 50◦– 60◦N. POAMdata in the NH have therefore been used only pole-ward of 65◦N, to extend data coverage towards thepole, because SAGE and HALOE measurementsbecome sparse in this region. Being of very goodquality (Lucke et al., 1999), the withheld POAM datafrom 54◦– 65◦N can later be used for validation pur-poses.

In a second step, the remaining collocations arechecked for consistency and submitted to a heuris-tic screening procedure to remove outliers. Specialcare has to be taken regarding occultation sounderdata below the lowermost stratosphere, becausetropospheric aerosols and clouds tend to interferewith their retrieval algorithms and are not alwayscorrectly accounted for in the error specificationsaccompanying the ozone products (Bhatt et al.,1999; Steele and Turco, 1997; SPARC, 1998).

The GOME data used for profile retrieval wereprocessed with the standard GOME Data Proces-sor (GDP), Version 2.1 (Slijkhuis and Loyola, 1999;Bargen et al., 1999), including all standard cor-rections (straylight, polarization, etc.) except fromdegradation, which the program applies equally tosolar and earth spectra. Thus it cancels out whencalculating sun-normalized radiances. Before nor-malization, all spectra have to be interpolated to a

common wavelength grid, since the neural networkdoes not get any wavelength information and there-fore assumes each spectral input neuron has a fixedlocation in wavelength space.

The spectral channels of the GOME instrumentand their characteristics are given in Table 1.GOME measures three forward pixels across itstrack, followed by a single backscan pixel, whichis not used here. Apart from that, all valid GOMEpixels regardless of cloudiness and ground condi-tion were retained. The GOME forward pixel size of160 x 40 km2 being rather large already, retrievalswere performed using the full horizontal resolution.This required assigning the same GOME channel1A spectra to clusters of six forward pixels, sincethe channel is integrated for a longer timespan to in-crease the S/N ratio. The channel boundary switchin mid-1998 was found to have no impact on theretrieval with NNORSY, any effects of this are obvi-ously compensated for by the network.

2.3. Network Training Considerations

The neural network used for the retrievals has 122input, 45 hidden and 60 output neurons. Table 2shows the configuration of the neural network inputlayer. GOME sun-normalized radiance values aremainly taken from the ozone Hartley- and Huggins-bands, where the absorption coefficient’s temper-ature dependence provides height-resolved ozoneinformation (Chance et al., 1997). The spectralresolution has been decreased to 0.35–1.5 nm bycoadding of 4–12 wavelengths, in order to reducerandom noise and the number of free parameters(weights) in the network. Since it was seen else-where that the ozone Chappuis band can under cer-tain conditions improve retrieval results in the tropo-sphere (de Beek, 1998), some radiances from thatband were also included. Additional spectral val-ues provide information on cloud cover and groundalbedo. It should be noted that due to the flexibilityof neural networks, the method is not very sensitivetowards the selection and resolution of wavelengthwindows, as long as they contain enough physicalinformation altogether. This was found in a num-ber of sensitivity studies (Müller, 2002) which willnot be discussed here. Suffice to say that omittingall spectral information, i. e. training climatologicalnetworks, increases retrieval errors by over 40% inthe stratosphere and above, and by 5–40% in thetroposphere (Fig. 2). Therefore, it is clear that theneural network model can indeed be used to ex-tract considerable height-resolved information fromthe GOME spectra provided.

5

-90 -60 -30 0 30 60 900

1k

2k

3k

4k

5k

6k

7k

8k

9k

20k

30k

40k

nu

mbe

r of

col

loca

tions

degrees latitude

before after

1996 1997 1998 1999 2000 20010

1k

2k

3k

4k

5k

6k

7k

8k

9k

10k

11k

12k

13k17k

18k

num

ber

of c

ollo

catio

ns

year

before after

Figure 1: Training data distribution by latitude and time, before and after the data set homogenizationprocedure was applied.

Channel Wavelenghts λ [nm] # of photodiodes δλ [nm] tI [s] SNR1A 238–283 (307) 400 (625) 0.2 12 10–4341B (307) 283–314 295 (70) 0.2 1.5 434–8412 311–404 841 0.2 1.5 849–51003 394–611 1024 0.4 1.5 3500–42004 578–790 1024 0.4 1.5 3239–4214

Table 1: GOME spectral channels and characteristics. δλ is the spectral resolution, tI the integration time,SNR the signal-to-noise ratio, from (Eichmann, 2001). Numbers in brackets were valid before a channelshift on 7th June 1998.

As mentioned in Section 2.1., apart from thespectral measurements a number of geophysicalparameters are provided to the network. These in-clude solar and satellite zenith angles and the scanangle, as well as three separate flags to accountfor the pixel type (east, nadir, west). Fig. 2 in-fers that latitude and season allow the implicit con-struction of some sort of climatology. The contin-uous in-orbit time provides a means to correct fortime-dependent sensor degradation effects to a cer-tain extend. The UKMO temperature profile wasprovided because of its strong correlation to atmo-spheric ozone.

Training algorithm selection for MLPs has be-come a fairly wide field, although many new algo-rithms are still based on the principle of backpropa-gation as described by Rumelhart et al. (1986). Therelative inefficiency of the original algorithm makesit unsuitable for dealing with the vast amount of datapresent in satellite meteorology, therefore severalalternative learning algorithms have been surveyedin the frame of this and previous projects (Kaifel

and Müller, 2001). For the given case, the Re-silient Propagation algorithm (RPROP) developedby Riedmiller and Braun (1993) has been found ad-vantageous. In RPROP, the magnitude of weightchanges does not depend on the derivative of theerror function E, as in backpropagation. Instead,each weight wi j is associated with its own weightstep ∆i j . These weight steps are modified accord-ing to a one-step history of sign changes in the errorfunction derivative. Roughly speaking, the RPROPalgorithm accelerates weight adjustment as longas the network is moving towards a minimum, butstops abruptly when stepping over a minimum, andsubsequently tries to approach it by interval divi-sion. This behaviour was observed to lead to fastand very stable error decreases during training onthe data sets provided here (Müller et al., 2001,2002).

As we mentioned before, a specific problem oftraining with measured ozone profiles instead ofsimulations is the incompleteness of the targetdata. Ozonesondes were used only in a height

6

Input parameter # of neurons Purpose270–325 nm 74 O3 Hartley/Huggins band380–385 nm 13 atmospheric window598–603 nm 6 O3 Chappuis band758–772 nm 9 O2 band: cloud detectionSatellite & solar zenith angles 4 slant column correctionline-of-sight flags 3 slant column correctionLatitude & season 2 ozone climatology backgroundIn-orbit time 1 instrument degradation correct.UKMO T-profile 10 atmospheric state info

Table 2: Neural network input parameters for ozone profile retrievals. The wavelength ranges refer to sun-normalized and logarithmized radiances measured by the GOME instrument.

Figure 2: Comparison of observed test data set er-rors for climatological networks, i. e. ones withoutGOME spectral input data. The reference run usedthe configuration from Tab. 2, but without T-profile,to clarify the impact of spectral information. F&Krefers to using the ozone climatology from (Fortuinand Kelder, 1998) instead of NNORSY retrievals.

range reaching up to 30 km, while data from limb-sounding satellite instruments become unreliablebelow 10–20 km (SPARC, 1998). These differentmeasurements do not normally coincide in spaceand time, and hence cannot just be joined together.Therefore, for each target profile t p = (t p

1, tp2, . . . , t

pn)

T ,data were either flagged as missing (mp

k = 0) or asavailable (mp

k = 1). With op = R(yp,cp,w) being thenetwork output, the pattern output error calculation(cf. Eq. 7) was modified according to

Ep =12

(n

∑k=1

mpk

)−1 n

∑k=1

(t pk −op

k)2 ·mp

k. (8)

This way, only available data contribute to theweight adjustment. Note that this modification doesnot noticably change the well-behavedness of theRPROP algorithm, as has been shown in (Mülleret al., 2002).

3. RETRIEVAL ERROR STATISTICS

3.1. Global Error Statistics

The standard method for evaluating the quality ofneural network output is to assess the error statis-tics on a test collocation data set independent ofthe training data. This is already being done duringtraining, which is considered completed once thetest data set error decreases slower than a giventhreshold rate. In this sense, the test data set isnot completely independent of the training results inthat it determines the best configuration to use andwhen to stop training (Plutovski, 1996). However,it turns out that for the kind of problem faced here,statistics do not chance significantly when using afull-sized third, truly independent evaluation dataset for validation purposes only. Since good col-locations are a valuable asset, the validation datawas composed solely of collocations with ozonson-des from Hohenpeissenberg (47.8◦N, 11.0◦E) andSyowa (69.0◦S, 39.6◦E). However, care has beentaken to prevent any ozone profiles from appearingin both training and test data, which can happenwith multiple collocations and might disrupt statis-tical independence. The test data set constructedconsists of 12281 collocations with a distributionsimilar to the training data (70048 colloc.).

7

Figure 3 shows the global error statistics for bothtraining and test data. With dp = op− t p, and d beingthe average of d of the entire data set, note thatrelative standard deviation is defined as

s=∑P

p

√1

P−1(dpk−k)2

1P ∑P

p t pk

, (9)

not as√

1T−1 ∑(dp− d)2/tp. Since we do not use av-

eraging kernels or other smoothing methods, sondeprofiles used in the collocations were only inte-grated to 1 km layers. Retrieval vertical resolutionis probably considerably lower (cf. Sec. 4.2.). Thiswould give rise to extreme relative errors due to un-resolved fine structure whenever the latter calcula-tion method is applied.

σ σ

σ

µ

Figure 3: Global relative statistics of NNORSYozone profile retrievals compared to training andtest data set collocations. σ denotes standard de-viation, whereby σC is calculated by using profilesfrom the Fortuin and Kelder (1998) climatology in-stead of retrievals.

As can be seen in Fig. 3, there is almost nonex-istent bias between the collocations and NNORSYoutput. This is a property of the quadratic neu-ral network error function (Bishop, 1995a), andshows the importance of constructing a represen-tative training data set. Overall, NNORSY-GOMEreduces the standard deviation of the monthly meanFortuin and Kelder climatology (Fortuin and Kelder,1998, referred to as F&K in the following) with re-spect to the collocated sonde and limb-sounderdata by around 20–45%, depending on height.

Test data set errors are only marginally higherthan the ones from the training data, which is asign of the good generalization ability of the net-work. When further training is carried out, we ob-serve slight improvements above the ozone peak,but the network starts overfitting the sonde data,which leads to higher test data errors in the tropo-sphere. We are still investigating the optimal stop-ping point for training, and whether the results canbe further improved by trimming the number of val-ues per height level and/or by combining the outputof several different or differently trained networks.

Some other features which can be recognized inthe figure are the sharp peak in the relative erroraround 9 km GPH. Since the retrievals still reducestandard deviation wrt. the climatology, it can beconcluded that this peak is mostly due to the hightemporal and spatial variability of ozone in this layer,which also cannot be captured in monthly means.The errors in the lower troposphere are also sur-prisingly low, considering the fact that gaining ozoneinformation from GOME data in this height rangeis severely limited by clouds and the low S/N ra-tio (Hoogen et al., 1999). However, the collocatedtropospheric ozone profiles are only representativefor the station locations, which means there is notmuch information on the errors e. g over the oceanictroposphere contained in the training and test data.There are hints that the system systematically un-derestimates tropospheric ozone on a global scale,but this is still under investigation.

3.2. Sensitivity Analyses

Sensitivity studies can performed by dividing thetest data set into subsets according to input pa-rameter values. Of these, the latitudinal depen-dency of the errors will be discussed below. Nosignificant dependency on pixel line-of-sight (LOS)was detected, therefore we conclude that the net-work is well capable of correcting for scan angle ef-fects. With time, the errors do not follow a significanttrend, although in the stratosphere errors for 2001tend to be slightly higher than in the previous years.This can be explained in part by the underrepresen-tation of training data from 2001, and will be furtherlooked at in a case study (Sec. 3.3.).

Figure 4 displays the relative statistics for the testdata set divided into latitude strips. The variationsobserved result from a combination of the latitudinalchanges of ozone natural variability with the latitudi-nally uneven distribution of training data (especiallyozonesonde data). For instance, in the southern-most strip 1245 of the 1704 collocations stem from

8

60oN-90oN (2249)

-20 0 20 40 60 80 100relative val. [%]

0

10

20

30

40

50

60

GP

H /

[km

]30oN-60oN (3487)

-20 0 20 40 60 80 100relative val. [%]

0

10

20

30

40

50

60

GP

H /

[km

]

0oN-30oN (1384)

-20 0 20 40 60 80 100relative val. [%]

0

10

20

30

40

50

60

GP

H /

[km

]

30oS-0oN (1548)

-20 0 20 40 60 80 100relative val. [%]

0

10

20

30

40

50

60

GP

H /

[km

]

60oS-30oS (1914)

-20 0 20 40 60 80 100relative val. [%]

0

10

20

30

40

50

60

GP

H /

[km

]

90oS-60oS (1704)

-20 0 20 40 60 80 100relative val. [%]

0

10

20

30

40

50

60

GP

H /

[km

]

µ

σσ σ

Figure 4: Same as Fig. 3, but only test data set, divided into different latitude strips.

the limb sounders, which obviously contribute con-siderable knowledge in the stratosphere, but sondedata are rare, therefore the standard deviation in thetroposphere is fairly high. It is clear that the trainingalgorithm yields lower errors in regions were dataare abundant, because it receives more error feed-back from these areas. The generally large solarzenith angles (SZAs) in the Antarctic region alsocontribute to the retrieval problems.

In the north polar region there is ampleozonesonde data (49%), but no satellite data northof 71◦N. Thus, the network has enough datato learn a suitable SZA correction for the lowerheights, but the relatively low errors above 20 kmare somewhat misleading, since they can only becomputed from satellite profiles southward of 71◦N.Therefore errors closer to the pole might be some-what higher, but cannot be estimated from the testdata.

In the tropics, the climatology obviously repro-duces the ozone concentration above 20 km ratherwell, with NNORSY yielding a less significant re-duction in standard deviation. This is a result of thelow ozone variability in this part of the atmosphere,where ozone distribution is mostly governed by pho-tolytic ozone production, not by short-term transportand chemistry processes. We also note a low biasof up to 20% in the region around 12 km, the rea-son of which is not completely understood. A com-bination of different effects are suspected to con-tribute, like frequent cloud occurrence, fewer num-ber of collocations, and model errors resulting fromthe distincively non-Gaussian distribution of targetvalues at these heights. However, due to the lowozone molecule number density in the tropical tro-posphere, in absolute terms this bias amounts toonly ∼ 0.1· 1018 m−3, and is thus easily offset glob-ally, with the average extratropical ozone number

9

density at 12 km ranging around 2· 1018 m−3.

3.3. Ozonesonde Case Study

Fig. 5 depicts the timeline of both NNORSY and Ho-henpeissenberg ozone sondes. For this compari-son, layers of 10 km GPH thickness have been cal-culated to reduce errors resulting from the differentvertical resolution of the ozone profiles.

The timeline plots reveal a good agreement ofNNORSY with the sondes in all height ranges, butespecially between 10 and 20 km. Note that forthese plots, all GOME collocations within 250 kmof a given ozonesonde were averaged into a sin-gle profile. As expected, largest deviations arise inSpring, where there temporal and spatial variabil-ity of ozone in the northern midlatitudes is high.There is however no significant bias or drift of theNNORSY partial columns with respect to the son-des, even after beginning of the year 2001, whenthe ERS-2 satellite was switched to gyroless modeand the operational GOME Level 1 data started de-veloping a number of problems (Aben et al., 2000).It seems the neural network is sophisticated enoughto implement a correction for radiance data degra-dation effects based on the input data. However, itmust be noted that on a test dataset average, theerrors increase slightly after January 2000, so thecorrection is by no means perfect. The F&K clima-tology, also shown in the figure, cannot reproducethe short term changes in the ozone columns, andseems to be somewhat biased to higher values inthe 20 to 30 km height range, which might hint atnoticable ozone losses in this height range (Reidet al., 2000), since the climatology is compiled fromdata in the time frame 1980–1991.

4. PROFILE CHARACTERIZATION

In Section 2.1., it was shown that the neural net-work method is based on a quadratic error function,giving rise to a single standard deviation parame-ter, which is unsatisfactory in most cases. Instead,the test data set statistics presented in the previ-ous sections may in principle be used as a crudeaccuracy estimate for the ozone profiles. They en-compass GOME spectral measurement noise, col-location errors, smoothing errors, and biases be-tween the different collocated profiles. The uncer-tainties of collocated target profiles exhibit their in-fluence in two different ways: In the training data,they serve as a noise source and help regularizethe retrieval system during the training phase, while

in the test data they add directly to the observederror statistics. Not included in these statistics areerrors due to the spatial and temporal distribution oftraining data, i. e. errors connected to the represen-tativeness of the training collocations. These comeinto play especially in places where collocations aresparse, at the north pole for instance, and over thesouthern hemisphere oceanic troposphere.

4.1. Local Error Estimation

On the other hand, a precision estimate for individ-ual retrieved profiles would be desirable for a num-ber of applications. In a classical system, these es-timates can be achieved by calculating the Jacobianof the inversion,

Dy =∂x∂y

, (10)

and propagating the input data errors accordingto

SD = DyTSyDy. (11)

Here, Sy is the measurement error covariancematrix, and SD the retrieval error covariance matrix.In our case, the diagonal elements of SD yield vari-ance estimates for each profile height level. Usu-ally, the input noise is assumed to be uncorrelated,hence Sy is diagonal and the profile errors effec-tively result from a quadratic propagation of the in-put errors using Gauss’ error propagation rule.

Although the NNORSY Jacobian can be easilycalculated (Bishop, 1995a; Aires et al., 2001), theabove procedure seems to grossly overestimate theobserved errors, leading to values of several thou-sand percent in some instances. Looking at the Ja-cobian column vectors – the so-called contributionfunctions (Rodgers, 2000) – which each describethe reaction of the output profile to a change of oneinput parameter, reveals very similar patterns be-tween neighboring spectral values, sometimes withopposing sign. It would thus seem that the net-work determines the output largely from groups ofinput parameters, thereby cancelling their errors ina nonlinear, sophisticated way which cannot be re-produced by the linear approximations in Eq. 11.These findings are generally in line with what Aireset al. (2001) report on the nonlinear properties ofretrieval networks.

Alternative local error estimation methods forneural networks have been extensively investigatedin current literature (e. g. Papadopoulos et al., 2001,

10

Figure 5: Timeline of partial ozone columns from NNORSY (�) and collocated Hohenpeissenberg (47.8◦N,11.0◦E) ozone sondes (+). Multiple collocations within 160 km were averaged into single points. The F&Kclimatology is plotted for comparison as a dashed curve.

and references therein), but most applications dealwith fairly small data sets, for which an enhancednetwork can be constructed to approximate pa-rameters of the output probability density (Bishop,1995a). These methods commonly involve massiveincreases in the number of output neurons, which inour case would unbalance network size and trainingdata set size.

Thus, two other alternatives have been investi-gated. One of them is a brute force approach, inwhich the retrieval is repeated M times while theinput data is disturbed using realistic assumptionsabout measurement noise. The standard deviationσ can then be calculated as

σk =

√1

M−1

M

∑m=1

(dm,k− dk)2, where (12)

dm = om−ot (13)

is the difference between disturbed profile om andthe undisturbed profile ot . Of course, this slowsdown the retrieval calculation by the factor M, butthis is still acceptable – one retrieval with errors can

be calculated in less than one second on a 400 MHzUltraSPARC2 processor. This method results insome kind of precision estimate. In the experimentsconducted thus far, error covariances in the inputdata are not taken into account, and the assump-tions about spectral measurement noise are greatlysimplified, adding to the uncertainty of the method.As a worst case scenario, we used 10% noise forGOME Channel 1a, 5% for Ch. 1b, 3% for Ch.2 and 2% for Ch. 3 and 4 [R. van Oss, personalcommunication], 1◦ uncertainty for the angular inputdata and 3 K for the UKMO temperature profiles.

The second method investigated has been in-spired by Satchwell (1994, verbal presentation,cited in Bishop, 1995a). We trained a second neu-ral network with exactly the same structure and in-put data as the retrieval network, but replaced thetraining collocations with the absolute difference be-tween the retrievals and the collocations. The sec-ond network learns the dependence of error mag-nitude on the input data, giving an error estimatesimilar to the one provided in the previous two Sec-

11

tions, but this time for individual profiles. Due to theneural network property of reproducing the condi-tional average of the output, these error estimateswhen applied to the entire training data set sum upto give essentially the same picture as in Fig. 3.However, Dybowski (1997) notes that this methodmay result in a slight overestimation of high errorsand underestimation of low errors, due to the opti-mization behaviour of the training algorithm. Thisremains to be checked for our data; visual compar-ison of the trained errors with the noise errors didnot show any obvious anomalies, though.

Figure 6: Different NNORSY relative error esti-mates for a sample ozonesonde collocation fromstation Legionov, Poland (52.7◦N; 19.3◦O), 10thFeb. 1999. The noise error is derived from disturb-ing the input data, the trained error results from asecond neural network. Classical error propagationusing the neural Jacobian leads to questionable re-sults. See text for further details on the methodsemployed.

For the midlatitude high ozone situation depictedin Fig. 6, the brute force approach results in a rel-ative standard deviation of around 5-10%, ratherconstant over height with the exception of a peakof 20% at 9 km. These statistics are based onM =5000 noisy retrievals. The error-trained net-work yields a very similar error profile shape, butreaches higher values, as was expected since it en-compasses more error sources. Similar behaviour

of these two methods has been observed in differ-ent situations (Müller, 2002), with the highest rela-tive errors occuring in situations and height rangeswhere low ozone concentrations coincide with highozone variability and/or low training data density,i. e. inside the Antarctic ozone hole, around the mid-latitude tropopause and in the tropical upper tropo-sphere. As noted above, the Jacobian-calculatederrors are generally larger by one or two ordersof magnitude, and do not seem to represent thedeviation of the retrievals from collocated profilescorrectly. The best way to assess the quality ofthe error estimates gained thus far would probablybe to investigate their impact when assimilating theNNORSY ozone profiles into a chemical transportmodel with independent validation.

Another interesting aspect of the noise errormethod is that it allows the output error correlationto be assessed simply by statistics on the noisyretrievals. In Fig. 7, we compare this correlationto the correlation of the Jacobian column vectors.Although not directly comparable in a mathemati-cal sense, both should yield a measure of how theoutput profile levels are linked to each other. Theresulting correlation matrices turn out to be fairlysimilar. Both show a largely symmetric and well-defined decrease of correlation around the diago-nal, and distinct off-diagonal anticorrelations of thelevels below 20–22 km to greater heights, whichprobably correspond to vertical ozone peak shifts:For instance, if the NNORSY retrieval wrongly shiftsthe ozone profile to a lower height, the steep flanksaround the ozone peak will cause values below thepeak to have a high tendency, while at the sametime values above the peak will tend to be too low,hence the observed anticorrelation. It also followsthat the similarities observed in the neural Jacobiancolumn profiles do carry some information on thesystem’s behaviour, which expresses in their heightcorrelation, but cannot be exploited for error magni-tude estimates, as shown above.

4.2. Vertical Resolution

The estimation of vertical resolution via averagingkernels (AKs) (Rodgers, 1990), as commonly usedfor classical retrieval schemes like OE, does notseem to be a ideal solution for the NNORSY sys-tem. The AK matrix A is defined here as

A = DyK , (14)

where Dy is the neural network Jacobian fromEq. 10, and K has to be calculated according to

12

Figure 7: Combined error correlation matrix, aver-aged over the test data set. The upper left half ofthis matrix is derived from Jacobian column corre-lation, the lower right from noise error correlation.

Eq. 3 using a separate forward model. Each rowof A then contains an AK for the respective heightlevel.

While Jiménez (2000) found neural AKs for asimulated microwave limb-sounding retrieval to becomparable to OE AKs, if somewhat noisier, he alsopoints out that the problem considered was fairly lin-ear. For neural forward models, localized height in-formation has been observed to spread out throughthe network during the different normalization pro-cedures and nonlinear transfer functions it is sub-mitted to (Aires et al., 1999; Chevallier and Mah-fouf, 2000). This leads to rather ill-conditioned Ja-cobians, as has been seen for NNORSY as well inthe previous section. Using Jacobian-derived AKsfor resolution estimation might also be inappropri-ate in our case for another reason, which is thewealth of statistical information present in the re-trievals; in fact, some high-resolution features seenin the NNORSY profiles cannot possibly stem fromspectral information alone. Therefore viewing thesystem through the classical AK lens might give atwisted impression of the observed geophysical ver-tical resolution altogether.

In view of these issues, we opted for a largelyheuristical approach to vertical resolution estima-tion. From visual impression, it is obvious thatthe NNORSY profiles are oversampled, i. e. effec-tive resolution is worse than the 1 km height sam-pling used for the output neurons. Ozonesondes

and the occultation instruments used for colloca-tion thus have a better resolution than NNORSY inmost cases. A considerable part of the differencesbetween retrievals and collocations in Fig. 3 cantherefore be attributed to smoothing (cf. Rodgers,2000). Convolution of the collocated profiles with asmoothing function should be able to reduce this er-ror contribution. However, in the limiting case of aninfinitely bad resolution, the smoothed profile wouldbe constant over height, which obviously gave riseto large smoothing errors again, this time from thecollocated profiles being too badly resolved. Some-where between the extremes of full resolution andcompletely flattened profiles, there should be a min-imum error, the corresponding FWHM of whichcould be viewed as the effective resolution of theretrieval at this height level.

We use Gaussians of variable full width half max-imum (FWHM) δ as smoothing functions, whichis equivalent to constructing AK matrices Aδ fromGaussian-shaped AKs. Then, for δ varied in stepsof 0.1 km, we calculate the RMS error of retrievalsR(·) against smoothed test data set collocations,

RMSEδ =

√√√√ 1T

T

∑p=1

(R(yp,cp,w)−Aδxp)2. (15)

In contrast to the standard deviation, the RMSEwas chosen to take into account biases as well,which may occur whenever steep gradients in theprofile are smoothed out. To reduce fringe effectsfrom the convolution with incomplete profiles, whichwere observed to produce strong fluctuations, val-ues at height levels with sonde and limb data miss-ing were filled with corresponding retrieved values.Obviously, as soon as too many missing valueshave to be replaced by retrievals in a given height,the results will get distorted. It is clear that thesmoothing error of a retrieval with respect to asmoothed version of itself can only rise, thereforethe minimum error FWHM tends towards zero whenthe number of missing values increases. Note thatthis argument would not hold if we had used a cli-matology to replace missing values.

Plotting the RMSE against δ and height resultsin the contour seen in Fig. 8a. This plot was calcu-lated from test collocations, but for training data (notshown) there is very little difference, which hints ata good generalization performance of the system —considerably larger test data set errors would havemeant the network only “memorized” the trainingdata.

13

δ δ

Figure 8: (a) Test data set absolute RMSE variation (in arbitrary units) when convoluting target profileswith a Gaussian of given FWHM. (b) FWHM at which minimum RMSE occurs, calculated for different data(sub-)sets.

A distinct saddle can be observed in the contour,the corresponding FWHM ranging between 3–5 kmat 11 to 32 km gph, and between 1–3 km belowand above, with the exception of the lower tropo-sphere, where the minimum error occurs at 3–8 kmFWHM. For once, it is clear that the method onlyworks where the ozone profile exhibits some struc-ture. Therefore interpretation of the minimum curverequires some care. To assist in the interpretation,Fig. 8b differentiates the results by collocation type.As can be seen, there is good agreement betweenthe occultation and sonde curves between 8 and22 km gph, with the curves diverging above andbelow. Due to the abovementioned treatment ofmissing data, we may safely assume the maximumFWHM from both curves to closer resemble the truevertical resolution.

Above 30 km, the collocated profiles are gener-ally very smooth in the first place, therefore we can-not draw certain conclusions from the results here.On the other hand, vertical resolution is not as crit-ical when there is no fine structure to detect. Theother extreme can be observed below 12 km gph.Here, the convolution with a Gaussian of even rel-atively small FWHM leads to a shifting of ozonefrom the high ozone lower stratosphere into thetroposphere, where it degrades retrieval accuracy.Thus, the minimum FWHM is unrealistically small.In fact, OE-derived AKs at these height levels tendto be quite asymmetric (Hoogen et al., 1999), such

that the approximation with a Gaussian may not re-flect the physical conditions correctly. Using onlysonde data from the tropics (not shown), where thetropopause is located at 15 km altitude and above,the RMSE minimum below 12 km altitude occurs at5–8 km FWHM.

Figure 9: Comparison of the different vertical reso-lution estimates employed.

It might be worthwhile to compare the empiricalresolution estimates obtained here with the height

14

level correlation lengths which can be derived fromFig. 7. Since the gradient between the contours forcorrelations of c =0.6 and 0.8 is particularly steepand the contours are quite well-defined, we choosethe width of the diagonal ridge for c≥0.7 as a mea-sure of correlation length. Calculating this lengthseparately for Jacobian and noise error correlation,and plotting it together with the maximum of thecurves in Fig. 8b results in the graphs shown inFig. 9. As can be observed, these resolution es-timates agree reasonably well below 32 km, but di-verge strongly above that level. The 10–11 km verti-cal resolution assessed through correlation lengthsis probably quite realistic for larger heights, becauseit corresponds with the very smooth shape of theoccultation profiles. Note however that this is a limi-tation of the training data, not necessarily of the re-trieval system: From radiative transfer theory, thereshould be enough information in the GOME spec-tra to achieve better vertical resolution in this heightrange (Burrows et al., 1999; Hoogen et al., 1999).Inclusion of high-resolution lidar data into the train-ing dataset might prove helpful to exploit this infor-mation in future versions of NNORSY. In the tro-posphere, looking at individual correlation matrices(not shown) reveals a lot of off-diagonal correlationsand usually a fairly irregular diagonal ridge, there-fore correlation lengths in this height range may notbe dependable.

Figure 10: NNORSY ozone profile retrieval with acollocated ozone sonde measurement, smoothedby Gaussians of varying FWHM.

It may be surprising to see that these vertical res-olution estimates are well below the theoretical limitderived from radiative transfer theory, which lies

around 6–8 km (Hoogen et al., 1999). We attributethis contradiction to the direct use of statistical andtemperature information, and the absence of linearapproximations in the system. To emphasize ourpoint, Fig. 10 shows a NNORSY retrieval of a fairlyextreme ozone hole case, where it can be seenthat the collocated ozone sonde does indeed cor-respond best to the retrieval when smoothed withan FWHM of 2–4 km.

To summarize the findings presented, it seemsthat NNORSY ozone profiles have an effective ver-tical resolution (FWHM) of around 4 km in the 12–31 km GPH range, where the results are most re-liable. Above 31 km, resolution degrades to about10–11 km at 50 km GPH. This is a consequence ofthe smooth training profiles used, and does not nec-essarily reflect full exploitation of GOME spectral in-formation. In the troposphere, resolution is difficultto assess with both methods employed, but is prob-ably slightly less than in the stratosphere and dis-torted by long range correlations to greater heights.

5. CONCLUSIONS AND OUTLOOK

It has been demonstrated that the regularizationimposed by using a neural network approach issufficient to solve the inverse retrieval problem forGOME ozone profiles with stability and results com-parable to classical methods. NNORSY is takingpart in the international effort of comparing differentretrieval systems in the frame of the ESA/ESRINGOME Ozone Profiling Working Group, where weexpect to gain more insight in the chances andproblems of our method.

However, it is already clear that one big advan-tage of NNORSY is speed, making it well suited forreal-time applications and global data reprocessing,even at full GOME horizontal resolution. An experi-mental near real-time system is already in operationat the German Remote Sensing Datacenter (DFD)of the DLR, the data can be accessed via the inter-net†. Error characterization still needs to be some-what improved on a global and case-by-case basisfor the operational product.

While the method is implicitly correcting for cloudalbedos and degradation in GOME channels, it de-pends crucially on high quality training data. There-fore it cannot be used for species which are onlysparsely measured. However, in case a forwardmodel can accurately simulate measurements for

†http://auc.dfd.dlr.de/GOME_NRT/profile.html

15

http://auc.dfd.dlr.de/GOME_NRT/profile.html

a given sensor, using a partially or entirely simu-lated training data set is a viable option. In fact, thisapproach has recently led to the development of anew type of ground-based UV spectral radiometer‡

based entirely on MPLs trained with simulated UVground spectra (Schwander et al., 2001). As a glob-ally optimizing retrieval system, neural networks re-act to input data in a completely different way thanlocal approaches like Optimal Estimation. If a bias-free training data set can be constructed, their re-trievals results will also be globally bias-free, whichmight prove useful in applications like data assimi-lation.

For NNORSY-GOME ozone profiles, further im-provements in the fields of spectral calibration andstate space sampling distribution could lead to a yetbetter data quality in the future, especially concern-ing extreme cases of ozone profiles. This mightinclude exploiting additional profile data sources– lidars, advanced sounders – and improving theoutlier detection algorithms for both ozone trainingdata and GOME spectra. Biases and errors oc-curing around the problematic tropopause regioncould probably be considerably reduced by defin-ing a variable output height grid relative to thetropopause height, for this would introduce greaterconsistency e. g. between tropical and extratropicalozone profiles. There is also improvement potentialin the refinement of the network architecture andtraining method. For instance, using an error func-tion specifically adapted to the retrieval problem ortraining different (specialist) networks and combin-ing their results might yield more accurate resultsand error estimations.

An adaptation of NNORSY towards state ofthe art sensors (SCIAMACHY, OMI, GOME-2) isplanned, and can be carried out with consider-ably reduced effort, because instrument calibrationspecifics are learned automatically by the network,and much of the work in setting up software andtraining data has already been done. Since boththe number of available training data and the com-putational effort for classical retrieval and/or assim-ilation of data from upcoming satellite instrumentsincrease steadily, we expect to see growing use ofneural network type methods in satellite meteorol-ogy, especially in the operational regime.

References

Aben, I., M. Eisinger, E. Hegels, R. Snel, andC. Tanzi, 2000: GOME Data Quality Improve-‡http://www.sprafimo.de

ment (GDAQI), Final Report, Tech. Rep. TN-GDAQI-003SR/2000, Space Research Organiza-tion Netherlands (SRON), Utrecht, Netherlands.

Aires, F., C. Prigent, W. B. Rossow, and M. Roth-stein, 2001: A new neural network approach in-cluding first guess for retrieval of atmospheric wa-ter vapor, cloud liquid water path, surface tem-perature, and emissivities over land from satel-lite microwave observations, J. Geophys. Res.,106(D14), 14887–14908.

Aires, F., M. Schmitt, A. Chedin, and N. Scott, 1999:The ”weight smoothing” regularization of MLP forJacobian stabilization, IEEE-NN, 10(6), 1502.

Bargen, A. V., W. Thomas, and D. Loyola, 1999:GOME Data Processor Update Report for GDP0-to-1 Version 2.0 and GDP 1-to-2 Version 2.7,Tech. Rep. ER-TN-DLR-GO-0043, DeutschesZentrum für Luft- und Raumfahrt, DeutschesFernerkundungs-Datenzentrum, Oberpfaffen-hofen, Germany.

Baron, A., 1993: Universal approximation boundsfor superposition of sigmoidal functions, IEEETrans. Information Theory, 39, 930–945.

Baron, A., 1994: Approximation and estimationbound for artificial neural networks, MachineLearning, 14, 115–133.

Bhatt, P. P., E. E. Remsberg, L. L. Gordley, J. M.McInerney, V. G. Brackett, and J. M. Russell III,1999: An evaluation of the quality of Halogen Oc-cultation Experiment ozone profiles in the lowerstratosphere, J. Geophys. Res., 104(D8), 9261–9275.

Bishop, C. M., 1995a: Neural Networks for PatternRecognition, Clarendon Press, Oxford.

Bishop, C. M., 1995b: Training with noise is equiva-lent to Tikhonov regularization, Neural Computa-tion, 7(1), 108–116.

Bovesmann, H., J. P. Burrows, M. Buchwitz, J. Fr-erick, S. Noel, V. Rozanov, K. V. Chance, andA. H. P. Goede, 1999: SCIAMACHY - Mission ob-jectives and measurement modes, J. Atmos. Sci-ences, 56, 125–150.

Burrows, J., E. Hölzle, A. Goede, H. Visser, andW. Fricke, 1995: SCIAMACHY - Scanning Imag-ing Absorption Spectrometer for AtmosphericChartography, Acta Astronautica, 35, 445–451.

16

http://www.sprafimo.de

Burrows, J., M. Weber, M. Buchwitz, V. Rozanov,A. Ladstädter-Weissenmayer, A. Richter,R. de Beek, R. Hoogen, K. Bramstedt, K.-U.Eichmann, M. Eisinger, and D. Perner, 1999: TheGlobal Ozone Monitoring Experiment (GOME):Mission concept and first scientific results, J.Atmos. Sciences, 56, 151–175.

Chance, K. V., J. P. Burrows, D. Perner, andW. Schneider, 1997: Satellite measurementsof atmospheric ozone profiles, including tropo-spheric ozone, from UV/visible measurements inthe nadir geometry: A potential method to re-trieve tropospheric ozone, J. Quant. Spectrosc.Rad. Transfer, 57(4), 467–476.

Chevallier, F., F. Chéruy, N. A. Scott, and A. Chédin,1998: A neural network approach for a fast andaccurate computation of a longwave radiativebudget, J. Appl. Meteor., 37, 1385–1397.

Chevallier, F. and J.-F. Mahfouf, 2000: Evaluationof the Jacobians of infrared radiation models forvariational data assimilation, J. Appl. Meteor.,submitted May 2000.

Cunnold, D. M., W. P. Chu, R. A. Barnes, M. P. Mc-Cormick, and R. E. Veiga, 1989: Validation ofSAGE II ozone measurements, J. Geophys. Res.,94(D6), 8447–8460.

Cunnold, D. M., H. J. Wang, L. W. Thomason, J. M.Zawodny, J. A. Logan, and I. A. Megretskaia,2000: SAGE (version 5.96) ozone trends in thelower stratosphere, J. Geophys. Res., 105(D4),4445–4457.

de Beek, R., 1998: Bestimmung von Ozonver-tikalprofilen aus Messungen des Satelliteninstru-mentes GOME im ultravioletten und sichtbarenSpektralbereich, Ph.D. thesis, University of Bre-men, Institute of Environmental Physics (IUP),Berichte aus der Umweltphysik D46 (in German).

Deniel, C., F. Dalaudier, E. Chassefiere, R. M.Bevilacqua, E. P. Shettle, J. D. Lumpe, K. W. Hop-pel, J. S. Hornstein, D. W. Rusch, and C. E. Ran-dall, 1997: A comparative study of POAM II andECC sonde ozone measurements obtained overnorthern europe, J. Geophys. Res., 102(D19),23629–23642.

Dybowski, R., 1997: Assigning Confidence Inter-vals to Neural Network Predictions, Tech. rep.,Division of Infection, UMDS (St Thomas’ Hospi-tal), London.

Eichmann, K.-U., 2001: Die Ozonverteilungender Nordhemisphäre 1997–2000 Gemessen mitGOME: Einfluss Von Dynamik und Chemie in derStratosphäre, Ph.D. thesis, University of Bremen,Institute of Environmental Physics (IUP), (in Ger-man).

Fortuin, J. P. F. and H. Kelder, 1998: An ozoneclimatology based on ozonesonde and satel-lite measurements, J. Geophys. Res., 103(D24),31079–31734.

Hasekamp, O., J. Landgraf, F. Helmich, R. vanOss, and R. van der A, 1999: Two inversionmethods for ozone profile retrieval from GOME,in: ESAMS’99 - European Symposium on Atmo-spheric Measurements from Space, Noordwijk,Netherlands, 18–22 January 1999, pp. 549–553,ESA, WPP-161, Noordwijk.

Hoogen, R., V. V. Rozanov, and J. P. Burrows, 1999:Ozone profiles from GOME satellite data: Algo-rithm description and first validation, J. Geophys.Res., 104(D7), 8263–8280.

Hornik, K., M. Sinchcombe, and H. White, 1989:Multilayer feedforward networks are universal ap-proximators, Neural Networks, 2, 359–366.

Jiménez, C., 2000: Inversion of Microwave LimbSounding Observations of the Atmosphere bya Neural Network Technique, Tech. Rep. 364L,Dept. Radio and Space Science, Chalmers Uni-versity of Technology, Göteborg, Sweden.

Kaifel, A. K. and M. D. Müller, 2001: Results ofTOVS ozone retrieval with neural networks, in:J. Le Marshall and J. D. Jasper (Eds.), Techn.Proc. 11th Int. ATOVS Study Conference, Bu-dapest, Hungary, 20–26 September 2000, pp.153–165, International ATOVS Working Group,Bureau of Meteorlogy Research Centre, Mel-bourne, Australia.

Krasnopolsky, V. M., 1997: Neural Networks forStandard and Variational Satellite Retrievals,Tech. rep., Contribution 148, NOAA Environmen-tal Modeling Center, Ocean Modeling Branch,Washington (DC).

Lu, J., V. A. Mohnen, G. K. Yue, R. J. Atkinson,and W. A. Matthews, 1997: Intercomparison ofstratospheric ozone profiles obtained by SAGE II,HALOE, and ozonesondes during 1994-1995, J.Geophys. Res., 102(D13), 16137–16144.

17

Lucke, R. L., D. R. Korwan, R. M. Bevilacqua, J. S.Hornstein, E. P. Shettle, D. T. Chen, M. Daehler,J. D. Lumpe, M. D. Fromm, D. Debrestian, B. Neff,M. Squire, G. König-Langlo, and J. Davies, 1999:The Polar Ozone and Aerosol Measurement(POAM) III instrument and early validation re-sults, J. Geophys. Res., 104, 18.785–18.799.

Lumpe, J. D., R. M. Bevilacqua, K. W. Hoppel, andC. E. Randall, 2002: POAM III retrieval algorithmand error analysis, J. Geophys. Res., in press.

Müller, M. D., 2002: Bestimmung von atmo-sphärischem Gesamtozon und Ozonprofilen ausGOME Daten mit Hilfe des Neural NetworkOzone Retrieval System (NNORSY), Ph.D. the-sis, University of Bremen, Institute of Environ-mental Physics (IUP), (in German).

Müller, M. D. and A. K. Kaifel, 1999: Efficient pro-cessing of multi-year global TOVS data usingITPP, 3I and neural networks, in: J. LeMarshalland J. D. Jasper (Eds.), Techn. Proc. 10th Int.ATOVS Study Conference, Boulder, Colorado, 27Jan – 2 Feb 1999, pp. 397–407, Int. ATOVS StudyWorking Group (ITWG), Bureau of MeteorologyResearch Centre, Melbourne, Australia.

Müller, M. D., A. K. Kaifel, M. Weber, and J. P.Burrows, 2002: Partial training of neural net-works with incomplete target data, applied to at-mospheric science, Neural Networks, submitted.

Müller, M. D., A. K. Kaifel, M. Weber, and S. Tell-mann, 2001: Real-time total ozone and ozoneprofiles retrieved from GOME data using neuralnetworks, in: Proc. 2001 EUMETSAT Meteoro-logical Satellite Data User’s Conference, Antalya,1-5 October 2001, EUMETSAT, Darmstadt, Ger-many.

Papadopoulos, G., P. J. Edwards, and A. F. Murray,2001: Confidence estimation methods for neu-ral networks: A practical comparison, IEEE-NN,12(6), 1278–1287.

Plutovski, M. E. P., 1996: Survey: Cross-Validationin Theory and in Practice, Tech. rep., Depart-ment of Computational Science Research, DavidSarnoff Research Center, Princeton, New Jersey,USA.

Reid, S. J., A. F.Tuck, and G. Kiladis, 2000: On thechanging abundance of ozone minima at north-ern midlatitudes, J. Geophys. Res., 105, 12169–12180.

Riedmiller, M. and H. Braun, 1993: A direct adaptivemethod for faster backpropagation learning: TheRprop algorithm, in: Proc. of the ICNN, pp. 586–591, IEEE, San Francisco.

Rodgers, C. D., 1976: Retrieval of atmospheric tem-perature and composition from remote measure-ments of thermal radiation, Rev. Geophys. SpacePhys., 14(4), 609–624.

Rodgers, C. D., 1990: Characterisation and erroranalysis of profiles retrieved from remote sound-ing measurements, J. Geophys. Res., 95(D5),5587–5595.

Rodgers, C. D., 2000: Inverse Methods for Atmo-spheric Sounding: Theory and Practice, WorldScientific Publishing, Singapore.

Rozanov, V., Y. Timofeyev, M. Biryulina, J. Burrows,R. Spurr, and D. Diebel, 1993: Accuracy of at-mospheric constituent retrieval from multichan-nel remote sensing instruments, in: S. Keevallikand O. Kärner (Eds.), IRS ’92: Current Problemsin Atmospheric Radiation, pp. 394–397, DeepakPublishing, Ganotibm VAm USA.

Rumelhart, D. E., G. Hinton, and R. Williams, 1986:Learning internal representations by error back-propagation, in: D. Rumelhart and J. McClel-lant (Eds.), Parallel Distributed Processing, Vol.1, vol. 1, pp. 318–362, MIT Press.

Russell, L. J., L. L. Gordley, J. H. Park, S. R.Drayson, W. D. Hesketh, R. J. Cicerone, A. F.Tuck, J. E. Frederick, J. E. Harries, and J. P.Crutzen, 1993: The halogen occultation experi-ment, J. Geophys. Res., 98, 10777–10797.

Schwander, H., A. Kaifel, A. Ruggaber, andP. Koepke, 2001: Spectral radiative-transfer mod-eling with minimized computation time by use ofa neural-network technique, Appl. Opt., 40(3),331–335.

Singer, S. F. and R. C. Wentworth, 1957: A methodfor the determination of the vertical ozone distri-bution from a satellite, J. Geophys. Res., 62, 299–308.

Slijkhuis, S. and D. Loyola, 1999: GOME DataProcessor Extraction Software User’s Manual,Tech. Rep. ER-SUM-DLR-GO-0045, DeutschesZentrum für Luft- und Raumfahrt, DeutschesFernerkundungs-Datenzentrum, Oberpfaffen-hofen, Germany.

18

SPARC, 1998: SPARC/IO3C/GAW Assessment ofTrends in the Vertical Distribution of Ozone,WCRP-SPARC Report WMO TD No. 935, WorldMeteorological Organization.

Steele, H. M. and R. P. Turco, 1997: Separationof aerosol and gas components in the HalogenOccultation Experiment and the StratosphericAerosol and Gas Experiment (SAGE) II extinctionmeasurements: Implications for SAGE II ozoneconcentrations and trends, J. Geophys. Res.,102(D16), 19665–19681.

Tamura, S. and M. Tateishi, 1997: Capabilities ofa four-layered feedforward neural network: Fourlayers versus three, IEEE Transactions on NeuralNetworks, 8(2), 251–255.

Thompson, A. M., J. C. Witte, R. D. McPeters, S. J.Oltmans, F. J. Schmidlin, J. A. Logan, M. Fuji-wara, V. W. J. H. Kirchhoff, F. Posny, G. J. R.Coetzee, B. Hoegger, S. Kawakami, T. Ogawa,B. J. Johnson, H. Voemel, and G. Labow, 2001:The 1998-2000 SHADOZ (Southern HemisphereAdditional Ozonesondes) tropical ozone climatol-ogy: Comparisons with TOMS and ground-basedmeasurements, submitted to J. Geophys. Res.

Wardle, D. I., E. W. Hare, D. V. Barton, and C. T.McElroy, 1998: The World Ozone and UltravioletRadiation Data Centre - Content and submission,in: R. Bojkov and G. Visconti (Eds.), Proceed-ings of the XVIII Quadrennial Ozone Symposium,1996, L’Aquila, Italy.

19

j1.1 ozone profile retrieval from gome data using a …

Documents