Benchmark database based on surrogate climate records, Victor Venema
Post on 19-Dec-2015
![Page 1: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/1.jpg)
Benchmark database
based on surrogate climate records
Victor Venema
Meteorological Institute Bonn
![Page 2: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/2.jpg)
Goals of COST-HOME working group 1
Literature survey
Benchmark dataset
– Known inhomogeneities
– Test the homogenisation algorithms (HA)
![Page 3: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/3.jpg)
Benchmark dataset
1) Real (inhomogeneous) climate records
– Most realistic case
– Investigate if various HA find the same breaks
– Good meta-data
2) Synthetic data
– For example, Gaussian white noise
– Insert known inhomogeneities
– Test performance
3) Surrogate data
– Empirical distribution and correlations
– Insert known inhomogeneities
– Compare to synthetic data: test of assumptions
![Page 4: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/4.jpg)
Creation of the benchmark – Outline of the talk
1) Start with homogeneous data
2) Multiple surrogate and synthetic realisations
3) Mask surrogate records
4) Add global trend
5) Insert inhomogeneities in station time series
6) Published on the web
7) Homogenize by COST participants and third parties
8) Analyse the results and publish
![Page 5: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/5.jpg)
1) Start with homogeneous data
Monthly mean temperature and precipitation (France)
Later also daily data
Later maybe other variables
Homogeneous
No missing data
Detrended
20 to 30 years is enough for good statistics
Longer surrogates are based on multiple copies
– Larger scale correlations are small
– Distribution well defined with 30 a of data
Generated networks are 50, 100 and 200 a long
![Page 6: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/6.jpg)
2) Multiple surrogate realisations
Multiple surrogate realisations
– Temporal correlations
– Station cross-correlations
– Empirical distribution function
Annual cycle removed before, added at the end
Number of stations between 5 and 20
Cross correlation varies as much as possible
Show plot: temporal structure of surrogates
Show plot: cross correlations
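Removing the annual cycle before generating surrogates and adding it back at the end could look roughly like this. It is a minimal sketch, assuming the cycle is estimated as the mean of each calendar month; the slides do not state how the cycle was actually estimated, and the function names and sample values are made up for illustration.

```python
from statistics import mean

def remove_annual_cycle(monthly):
    """Split a monthly series (starting in January) into cycle and anomalies.

    Assumption: the annual cycle is the mean of each calendar month.
    """
    cycle = [mean(monthly[m::12]) for m in range(12)]
    anomalies = [x - cycle[i % 12] for i, x in enumerate(monthly)]
    return cycle, anomalies

def add_annual_cycle(anomalies, cycle):
    """Add the 12-value annual cycle back to an anomaly series."""
    return [a + cycle[i % 12] for i, a in enumerate(anomalies)]

# Two years of made-up monthly temperatures (illustrative values only).
series = [2.0, 3.0, 7.0, 10.0, 14.0, 17.0, 19.0, 18.0, 15.0, 11.0, 6.0, 3.0,
          1.0, 4.0, 6.0, 11.0, 15.0, 18.0, 20.0, 19.0, 14.0, 10.0, 7.0, 2.0]
cycle, anomalies = remove_annual_cycle(series)
restored = add_annual_cycle(anomalies, cycle)
```

The surrogate generation would then work on `anomalies`, and `add_annual_cycle` would be applied to each surrogate realisation.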
![Page 7: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/7.jpg)
One station – with annual cycle
[Figure: Measurement and Surrogate panels for one station with annual cycle, 1900 to 2000, and a zoom on 1900 to 1910]
![Page 8: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/8.jpg)
One station – anomalies
[Figure: Measurement and Surrogate anomaly panels for one station, 1900 to 2000, and a zoom on 1900 to 1910]
![Page 9: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/9.jpg)
Multiple stations – 10 year zoom
[Figure: Measurement and Surrogate panels for multiple stations, 1900 to 1910]
![Page 10: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/10.jpg)
Multiple stations – 10 year zoom
[Figure: four panels, 1900 to 1910: Measurement - low cross correlation, Surrogate, Measurement - high cross correlation, Surrogate]
![Page 11: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/11.jpg)
IAAFT algorithm smoothes jumps
[Figure: two panels, "Bounded Cascade time series" and "Surrogate of Bounded Cascade"; horizontal axis: Time or space; vertical axis: LWP or LWC]
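For orientation, the basic IAAFT (iterative amplitude adjusted Fourier transform) iteration can be sketched as below. This is an illustrative univariate version, not the code used for the benchmark: it handles one series only, so it does not reproduce the station cross-correlations, and the iteration count, seed and test series are assumptions. Because only the Fourier amplitudes and the value distribution are imposed while the phases start from a random shuffle, a localised jump in the input is generally not reproduced at the same place, which is the smoothing the slide title refers to.

```python
import numpy as np

def iaaft(x, n_iter=100, seed=0):
    """Return one IAAFT surrogate of the 1-D series x.

    The surrogate has exactly the values of x (same distribution) and
    approximately the same Fourier amplitudes (same autocorrelation).
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    sorted_x = np.sort(x)                  # target distribution
    target_amp = np.abs(np.fft.rfft(x))    # target Fourier amplitudes
    s = rng.permutation(x)                 # start from a random shuffle
    for _ in range(n_iter):
        # Step 1: impose the target amplitudes, keep the current phases.
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_amp * np.exp(1j * phases), n=len(x))
        # Step 2: impose the target distribution by rank ordering.
        ranks = np.argsort(np.argsort(s))
        s = sorted_x[ranks]
    return s

rng = np.random.default_rng(1)
measurement = np.cumsum(rng.normal(size=256))  # made-up correlated series
surrogate = iaaft(measurement)
```

Ending each iteration with the rank-ordering step means the surrogate is an exact permutation of the input values, at the cost of a small mismatch in the spectrum.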
![Page 12: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/12.jpg)
3) Mask surrogate records
Beginning of the records is jagged (rough)
– Linear increase in the number of stations
– Last station starts after 25 % of the full time
At the end of the record all stations are measuring
Influence of the jagged edge on detection and correction
But the trend is also increasing in time (i.e. different)! Is this a problem?
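A jagged start of this kind could be produced along the following lines. This is only a sketch: it assumes the station start dates are spread evenly, that the first station starts at the first time step, and that missing values are marked with `None`; none of these details are given on the slide, and the function names are hypothetical.

```python
def station_start_indices(n_stations, n_steps, ramp_fraction=0.25):
    """First valid time index of each station.

    Assumption: starts are spread evenly (linear increase in the number of
    stations), the first station starts at index 0 and the last one starts
    at ramp_fraction of the full record length.
    """
    if n_stations == 1:
        return [0]
    last_start = ramp_fraction * n_steps
    return [round(i * last_start / (n_stations - 1)) for i in range(n_stations)]

def mask_network(network, missing=None):
    """Replace the values before each station's start by the missing marker."""
    n_steps = len(network[0])
    starts = station_start_indices(len(network), n_steps)
    return [[missing] * s + list(series[s:]) for series, s in zip(network, starts)]

# Five stations with 100 time steps of dummy data.
network = [[float(t) for t in range(100)] for _ in range(5)]
starts = station_start_indices(5, 100)
masked = mask_network(network)
```

With these settings every station is measuring from time step 25 onwards, i.e. for the last 75 % of the record.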
![Page 13: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/13.jpg)
3) Mask surrogate records
[Figure: two panels, horizontal axis 1900 to 2000, vertical axis 2 to 20]
![Page 14: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/14.jpg)
4) Add global trend
NASA GISS
GISS Surface Temperature Analysis (GISTEMP) by J. Hansen
Global mean surface temperature
Last year of any surrogate network is 1999
[Figure: "Trend", horizontal axis 1890 to 1990, vertical axis -1.5 to 1]
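Adding the global trend to a surrogate that ends in 1999 might be sketched as follows. The anomaly values below are placeholders, not real GISTEMP numbers, and the sketch assumes that every month of a year receives that year's annual anomaly; the slide does not say whether the trend was interpolated to monthly resolution.

```python
def add_global_trend(monthly, trend_by_year, last_year=1999):
    """Add a yearly global-mean anomaly to a monthly series ending in last_year.

    Assumption: each month of a year receives that year's annual anomaly.
    """
    n_years = len(monthly) // 12
    first_year = last_year - n_years + 1
    return [x + trend_by_year[first_year + i // 12] for i, x in enumerate(monthly)]

# Placeholder anomalies, NOT real GISTEMP values.
trend = {1997: 0.1, 1998: 0.2, 1999: 0.3}
surrogate = [0.0] * 36  # three years of monthly anomalies
with_trend = add_global_trend(surrogate, trend)
```

Aligning on the last year means that a 50 a network and a 200 a network both end in 1999 and simply reach back to different start years.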
![Page 15: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/15.jpg)
5) Insert inhomogeneities in stations
Random breaks (implemented)
– Frequency of breaks: 1/20 a, 1/40 a
– Size constants for temperature: 0.25, 0.5, 1.0 °C
– Size factors for rain: 0.8, 0.9, 1.1, 1.2
Simultaneous breaks
– Frequency of breaks: 1/50 a
– In 10 to 50 % of the network
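Inserting random breaks of a constant size into a temperature series could be sketched as below. The sketch assumes that each year has an independent chance of a break, that the sign of each jump is random, and that the offset is applied to the data before the break so that the most recent segment stays unchanged; the slides only give the frequencies and sizes, so these choices and the function name are illustrative.

```python
import random

def insert_random_breaks(yearly, freq_per_year=1 / 20, size=0.5, seed=0):
    """Insert step changes at random years and return (series, break_years).

    Assumptions: independent break chance per year, random sign, and the
    offset is applied to all values before the break.
    """
    rng = random.Random(seed)
    n = len(yearly)
    breaks = [t for t in range(1, n) if rng.random() < freq_per_year]
    out = list(yearly)
    for t in breaks:
        jump = size * rng.choice([-1, 1])
        for i in range(t):
            out[i] += jump
    return out, breaks

homogeneous = [10.0] * 100
inhomogeneous, break_years = insert_random_breaks(homogeneous)
```

For precipitation the same idea would apply with a multiplication by one of the size factors instead of an added constant.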
![Page 16: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/16.jpg)
5) Insert inhomogeneities in stations
Outliers
– Frequency: 1 to 3 %
– Size: 99 and 99.9 percentiles
Local trends (only temperature)
– Linear increase or decrease in one station
– Duration: 30, 60 a
– Maximum size: 0.2 to 1.5 °C
– Frequency: once in 10 % of the stations
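A local trend in a single station could be inserted roughly like this. The sketch assumes that the station keeps the full offset after the trend has ended; the slide does not say whether the series returns to its original level, and the function name and parameters are made up for illustration.

```python
def insert_local_trend(yearly, start, duration=30, max_size=1.0):
    """Add a linear drift that grows from 0 to max_size over `duration` years.

    Assumption: after the trend ends the series keeps the full offset.
    A negative max_size gives a linear decrease.
    """
    out = list(yearly)
    for i in range(start, len(out)):
        progress = min(i - start, duration) / duration
        out[i] += max_size * progress
    return out

station = [0.0] * 100
drifting = insert_local_trend(station, start=20, duration=30, max_size=1.5)
```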
![Page 17: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/17.jpg)
6) Published on the web
Inhomogeneous data will be published on the COST-HOME homepage
Everyone is welcome to download and homogenize the data
![Page 18: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/18.jpg)
7) Homogenize by participants
Return homogenised data
– Should be in COST-HOME file format (next slide)
Return break detections
– BREAK
– OUTLI
– BEGTR
– ENDTR
Multiple breaks at one date possible
![Page 19: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/19.jpg)
7) Homogenize by participants
COST-HOME file format: http://www.meteo.uni-bonn.de/venema/themes/homogenisation/costhome_fileformat.pdf
For benchmark & COST homogenisation software
One data and one quality-flag file per station
Filename: variable, resolution, quality, station
ASCII network-file with station names
ASCII break-file with dates and station names
![Page 20: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/20.jpg)
COST-HOME file format – monthly data
![Page 21: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/21.jpg)
COST-HOME file format – network file
![Page 22: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/22.jpg)
8) Analyse the results
Detailed analysis will be performed in the working groups
– Detection
– Correction
– Daily data homogenisation
Synthetic and surrogate data
– RMS error
– No. of breaks detected (as a function of size)
– Application: reduction in the scatter in the trends
Performance difference between synthetic (Gaussian, white noise) and surrogate data
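Two of the measures named above, the RMS error and the number of breaks detected, could be computed along these lines. This is a sketch under assumptions: missing values are taken to be `None`, and a detection counts as a hit when it lies within a tolerance of a true break, with each true break matched at most once. The slides do not define the matching rule, so the tolerance and function names are hypothetical.

```python
from math import sqrt

def rms_error(homogenised, truth):
    """Root-mean-square difference, skipping missing values (None)."""
    pairs = [(h, t) for h, t in zip(homogenised, truth)
             if h is not None and t is not None]
    return sqrt(sum((h - t) ** 2 for h, t in pairs) / len(pairs))

def detection_counts(detected, true_breaks, tolerance=2):
    """Return (hits, false_alarms), matching each true break at most once.

    Assumption: a detection within `tolerance` time steps of a true break
    counts as a hit.
    """
    unmatched = list(true_breaks)
    hits = 0
    for d in detected:
        match = next((t for t in unmatched if abs(t - d) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            hits += 1
    return hits, len(detected) - hits

rmse = rms_error([1.0, 2.0, None, 4.0], [1.0, 4.0, 3.0, 2.0])
hits, false_alarms = detection_counts([10, 31, 70], [11, 30, 50])
```

Running the same functions on the synthetic and on the surrogate networks would give the performance difference mentioned on the slide.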
![Page 23: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/23.jpg)
Work in progress
Monthly precipitation
Implement some inhomogeneity types
Daily data: other inhomogeneities
Synthetic data (Gaussian white noise)
More input data!
Agree on the details of the benchmark – Next meeting?
Set deadline for the availability of the benchmark
Deadline for the return of the homogenised data
![Page 24: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/24.jpg)
Questions
Ideas for a better benchmark
– For example, for other inhomogeneities, constants
Types of inhomogeneities for daily data
Automatic processing
– In the order of 100 networks
![Page 25: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/25.jpg)
![Page 26: Benchmark database based on surrogate climate records Victor Venema](https://reader030.vdocuments.mx/reader030/viewer/2022032703/56649d365503460f94a0e332/html5/thumbnails/26.jpg)
7) Homogenize by participants
COST-HOME file format: http://www.meteo.uni-bonn.de/venema/themes/homogenisation/costhome_fileformat.pdf
For benchmark & COST homogenisation software
Regular ASCII matrix (columns)
One data and one quality-flag file per station
Yearly, daily, subdaily data: columns for time, one for data
Monthly data: year column, 12 columns for data
Filename: variable, resolution, quality, station
ASCII network-file with station names
ASCII break-file with dates and station names
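Reading a monthly data file laid out as described (a year column followed by 12 data columns) could be sketched as follows. The sketch assumes whitespace-separated columns and uses made-up sample values; the actual delimiter, missing-value code and file naming are defined in the COST-HOME format document and are not reproduced here.

```python
def parse_monthly(text):
    """Parse a monthly ASCII matrix: one row per year, year + 12 values.

    Assumption: columns are whitespace-separated; no missing-value code
    is handled.
    """
    data = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 13:
            raise ValueError(f"expected 13 columns, got {len(fields)}")
        data[int(fields[0])] = [float(v) for v in fields[1:]]
    return data

# Made-up sample rows (illustrative values only).
sample = """\
1998 1.2 2.3 5.1 8.4 12.9 16.0 18.2 17.9 14.1 9.8 5.0 2.2
1999 0.9 2.8 5.6 9.0 13.3 15.7 18.9 18.1 13.8 9.5 4.7 1.8
"""
records = parse_monthly(sample)
```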