big data at ecmwf

Slide 1 © ECMWF Big Data at ECMWF Providing access to multi-petabyte datasets Past, present and future Baudouin Raoult Principal Software Strategist ECMWF

Post on 14-Feb-2017

Page 1: Big Data at ECMWF

Slide 1 © ECMWF

Big Data at ECMWF Providing access to multi-petabyte datasets

Past, present and future

Baudouin Raoult Principal Software Strategist

ECMWF

Slide 2 © ECMWF

ECMWF

An independent intergovernmental organisation established in 1975, with 20 Member States and 14 Co-operating States

Slide 5 © ECMWF

Major assimilated datasets

Surface stations

Radiosonde balloons

Polar, infrared

Polar, microwave

Geostationary, IR

Aircraft

Slide 6 © ECMWF

The forecast process

Slide 7 © ECMWF

ERA-20C completed: Climate monitoring of the 20th Century

● Using >5% of ECMWF’s computing power

● Assimilating billions of observations

● Producing 2,400 global forecasts per day

● Generating 1 PB of reanalysis data in 200 days

● Currently serving reanalysis products to 20,000 users

Slide 8 © ECMWF

Surface fluxes: greenhouse gases, fires, emissions

Global atmospheric composition

http://atmosphere.copernicus.eu Online catalogue, quick-looks and data

Radiation and ozone layer

European Air Quality

Slide 9 © ECMWF

ECMWF products

● ECMWF currently receives 300 million observations from 130 sources daily.

● ECMWF operational models produce 13 million fields daily, for a total of around 8 TB.

● 77 million products are disseminated every day, for a total of 6 TB.
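
As a rough sense of scale, the daily figures above can be turned into average object sizes. This is a back-of-the-envelope sketch only: it assumes decimal terabytes (10^12 bytes), and the variable names are illustrative.

```python
# Average object sizes implied by the daily figures on this slide.
# Assumption: 1 TB = 10**12 bytes (decimal units).
TB = 10**12

fields_per_day = 13_000_000       # fields produced by operational models
model_output_bytes = 8 * TB       # total model output per day

products_per_day = 77_000_000     # products disseminated
dissemination_bytes = 6 * TB      # total dissemination volume per day

avg_field_kb = model_output_bytes / fields_per_day / 1000
avg_product_kb = dissemination_bytes / products_per_day / 1000

print(f"average field size:   {avg_field_kb:.0f} KB")
print(f"average product size: {avg_product_kb:.0f} KB")
```

Under these assumptions a model field averages on the order of 600 KB, and a disseminated product is well under 100 KB, so the challenge is as much the number of objects as the total volume.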

Slide 10 © ECMWF

What is Big Data?

● Wikipedia: “Big Data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.”

● Gartner: “Big Data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” (The three “V”s of Big Data.)

● I would like to add two “S”s: Scalability and Sustainability.

Slide 11 © ECMWF

V is for Volume (or coping with exponential growth)

Deletion of 1 PB

Slide 12 © ECMWF

V is for Velocity (or coping with exponential growth, part 2)

● ECMWF’s archive grows exponentially:

– At ECMWF, the annual growth rate r is around 0.5, i.e. a 50% increase per year

– The daily amount of data added to the archive grows exponentially at the same rate!

● In 1995, the size of the archive was increasing at a rate of 14 TB/year.

● In 2014, the archive is growing at more than 60 TB/day (with peaks at 100 TB/day).
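
The 1995 and 2014 figures can be used to check the quoted growth rate. The sketch below assumes the ingest rate follows rate(t) = rate(0) × (1 + r)^t with t in years; the slide's own formula did not survive in this transcript, so that form is an assumption, as are the variable names.

```python
import math

# Figures quoted on this slide.
TB_PER_YEAR_1995 = 14.0
TB_PER_DAY_2014 = 60.0
YEARS = 2014 - 1995

# Put both rates in the same unit (TB per year).
tb_per_year_2014 = TB_PER_DAY_2014 * 365

# Assumed model: rate(t) = rate(0) * (1 + r) ** t
annual_factor = (tb_per_year_2014 / TB_PER_YEAR_1995) ** (1 / YEARS)
implied_r = annual_factor - 1

# Time for the ingest rate to double at that pace.
doubling_years = math.log(2) / math.log(annual_factor)

print(f"implied r:     {implied_r:.2f}")
print(f"doubling time: {doubling_years:.1f} years")
```

The implied rate comes out a little under 0.5, which is consistent with the "around 50% per year" stated above.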

Slide 13 © ECMWF

V is for Variety (or coping with complexity)

[Figure: timeline from 1985 to 2010, plotted on a logarithmic axis running from 10M to 10T, annotated with the labels below.]

Computers: X-MP/4; Y-MP/8; C90/12; C90/16; VPP700-48; VPP700-112; VPP5000; IBM-P4; IBM-P5; IBM-P5+; IBM-P6

Model resolutions: T106L16; T106L19; T213L31; T319L31; T319L50; T319L60; T511L60; T799L91; T1279L91

Other labels: 3DVar; 4DVar; 12 Hour 4DVar; DCDA; EPS 15 days; Vareps/Monthly; EDA; 50 Members EPS; FC Pressure levels; FC Model levels; Chernobyl; SSTs; TOGA FC; Errors in FG; Waves; EPS; Clusters; Waves FG; Probabilities; Ensemble means & stdev; Other centers; Sensitivity; NCEP EPS; OI; Errors in AN and FG; 4D-Var; Tubes; Wave EPS; Errors in FG, surface; Wave proba.; SCDA Analysis; PT and PV levels; SCDA Forecast; Wave 4V; SCDA Waves; Multi-Analysis; 4D-Var increments; EFIs; DCDA Wave; SCDA 4D-Var; EPS PT levels; Overlap, CalVal; Wave EFIs; 4D-Var Model errors; Ensemble data assimilation; Weekend EPS; Weekly Monthly; Extra fields, new gaussian grid; 00Z EPS; 00Z 10 day FC; 00Z Run; End sensitivity

Slide 14 © ECMWF

ECMWF’s Meteorological Archival and Retrieval System (MARS)

● 28 years in existence

● A managed archive

● MARS is not a file system

– Users are not aware of the location of the data

– Retrievals are expressed in meteorological terms

● An archive, not a database

– Metadata online

– Data offline (automated tape library)
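
To illustrate what "expressed in meteorological terms" can look like, the sketch below renders a request in the keyword=value style of the MARS request language. The helper `format_mars_request` is hypothetical, and every value in `request` (class, stream, date, target file name and so on) is an illustrative assumption rather than something taken from the slides.

```python
def format_mars_request(verb, keywords):
    """Render keywords as MARS-style request text (hypothetical helper).

    Lists are joined with "/", the list separator of the request language.
    """
    body = []
    for name, value in keywords.items():
        if isinstance(value, (list, tuple)):
            value = "/".join(str(item) for item in value)
        body.append(f"    {name}={value}")
    return f"{verb},\n" + ",\n".join(body)


# Illustrative values only: a temperature forecast on two pressure levels.
request = {
    "class": "od",
    "stream": "oper",
    "type": "fc",
    "levtype": "pl",
    "levelist": [500, 850],
    "param": "t",
    "date": "2014-01-01",
    "time": "12",
    "step": [0, 24, 48],
    "target": '"output.grib"',
}

print(format_mars_request("retrieve", request))
```

The request names what is wanted (date, parameter, level, forecast step) and never where it is stored, which is what lets the archive keep metadata online and the data itself on tape.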

Slide 15 © ECMWF

ECMWF’s Meteorological Archival and Retrieval System

● 10^14 directly addressable objects

– Unique hypercube-based indexing

● Data is kept forever:

– For many studies, a dataset becomes useful once enough data has been accumulated

– Deleting old data in an exponentially growing archive is meaningless

● 200 million objects/65 TB added daily

● 7000 registered users

● 650 active users, 100 TB retrieved per day, in 1.5 million requests
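
The claim that deleting old data is meaningless follows from simple arithmetic. The sketch below assumes the archive grows by a constant factor (1 + r) each year with nothing deleted, and uses the r of around 0.5 quoted earlier; the function name is illustrative.

```python
def share_older_than(years, r=0.5):
    """Fraction of today's archive that already existed `years` ago.

    Assumes size(t) = size(0) * (1 + r) ** t and that nothing was deleted.
    """
    return (1 + r) ** -years


for years in (1, 5, 10):
    share = share_older_than(years)
    print(f"data older than {years:2d} years: {share:.1%} of the archive")
```

With these assumptions, everything older than five years is only about an eighth of the archive, and everything older than ten years is under 2%, so deleting it would buy back only a few months of growth.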

Slide 16 © ECMWF

Scalability and sustainability

● Distributed service-oriented architecture, to scale out

● Queues and priorities to ensure quality of service and scaling with the demand

● Indirection is the key to scalability:

– Allow services to be modified/redeployed…

– Allow data to be moved to different media/storage…

– …without any impact on users

● It has allowed us to migrate several times during the past decades:

– MVS to AIX, AIX to Linux, single server to clusters

– CFS to TSM to HPSS

– PL/I and Fortran to C/C++
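
The "queues and priorities" idea above can be sketched with a priority queue. This is a generic illustration and not a description of how ECMWF's services are implemented: the `RequestQueue` class, the priority values and the request names are all hypothetical.

```python
import heapq
import itertools


class RequestQueue:
    """Serve lower priority numbers first, first-come first-served within a priority."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, request, priority):
        heapq.heappush(self._heap, (priority, next(self._arrival), request))

    def next_request(self):
        _priority, _order, request = heapq.heappop(self._heap)
        return request

    def __len__(self):
        return len(self._heap)


queue = RequestQueue()
queue.submit("research retrieval", priority=5)
queue.submit("operational retrieval", priority=0)
queue.submit("another research retrieval", priority=5)

served = [queue.next_request() for _ in range(len(queue))]
print(served)
```

Time-critical work jumps ahead of the backlog, while requests of equal priority keep their arrival order, so a burst of demand degrades low-priority service first.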

Slide 17 © ECMWF

Continuous evolution

● Change is driven by new needs and expectations

– Users are not domain experts anymore…

– …users expect information at “Google speed”

● ECMWF is continuously evolving its data delivery methods and services to cater for the new user requirements

● Interoperability is key

– Follow standards and governance to enable interoperability

– OGC, INSPIRE, ISO 19xxx series, NetCDF-CF, WMO Information System, GEOSS

● Provide high-level services on the data

– Data portals for data discovery

– Web services, REST APIs for data retrieval and manipulation

– Close-to-the-data processes

Slide 18 © ECMWF

Data exploring tools

Slide 19 © ECMWF

Interoperability

Slide 20 © ECMWF

The next steps…

● The era of pushing the data to the user is coming to an end

● The volumes involved are too large

● We live in a post-PC era…

● Users want to access data from smart devices…

● …anywhere, anytime, any device…

● … and share their results.

● We need to bring the user processing to the data:

– Cloud Computing is now mature enough to implement operational services

– We need to build a “platform as a service” (PaaS), on which to build “software as a service” (SaaS) solutions for environmental data and products

Slide 21 © ECMWF

Thank you