
A Prototype Infrastructure for Sentinel Earth Observation Data

Relative to Portugal
Supporting OGC Standards for Raster Data Management

Diogo Filipe Pimenta Andrade, [email protected]

Instituto Superior Técnico, Lisboa, Portugal

May 2017

Abstract

This paper describes an architecture for the IPSentinel infrastructure for managing Earth observation data, together with its implementation. The infrastructure will catalogue, disseminate and process Sentinel Earth observation products for the Portuguese territory. The prototype implementation uses the DHuS software from the European Space Agency together with the RasDaMan array database management system. RasDaMan implements standards from the Open Geospatial Consortium, such as the Web Coverage Service and the Web Coverage Processing Service, which provide access to and processing of the rasters encoding Earth observation data, through the Internet. The reported experiments show that the prototype system meets the functional requirements. This paper also provides measurements of the used computational resources, in terms of storage space and response times.

Keywords: Remote Sensing Products, Earth Observation Data, Raster Data, OGC Standards, Geospatial Data Infrastructures, Big Data Management

1. Introduction

Thanks to the Copernicus Earth Observation Programme, a wide range of Earth Observation (EO) data is being captured every day for several applications, such as the monitoring of landslides, or maritime monitoring and control. However, given that the volume of data acquired daily reaches the Terabytes, the European Space Agency (ESA) Data Hub Service (DHuS) does not ensure long-term data preservation or fast access to a set of EO data. These are, in fact, two limitations of the current ESA infrastructure that Wagner [17] considers can be resolved by changing how the EO community is organized.

Portugal is interested in having its own collaborative infrastructure to store all Sentinel data relative to the Portuguese geographic area. In this way, the Portuguese users interested in data related to Portugal would not only have the data closer to them, but a whole new community would also be created. Currently, Instituto Português do Mar e da Atmosfera (IPMA) and Direção Geral do Território (DGT), the institutions responsible for the Portuguese collaborative infrastructure, are already involved in setting up the Sentinel ground segment in Portugal.

This paper addresses the creation of a prototype for the Portuguese infrastructure, named IPSentinel, to catalogue, disseminate, and process Sentinel EO data for the national community.

For this prototype, it was decided early on to adopt the DHuS software provided by ESA as the base of IPSentinel, to catalogue and to disseminate the EO data transferred from the ESA Scientific Data Hub, also known as ESA SciHub. The use of the Open Geospatial Consortium (OGC) services related to raster data access and processing was also explored. In particular, the Web Coverage Service (WCS), Web Coverage Processing Service (WCPS), and Web Map Service (WMS) protocols, implemented by Petascope on top of the RasDaMan array database, were considered. The integration of these services into DHuS allows IPSentinel to provide the ability to process EO data.

2. Concepts and Related Work

This section presents some fundamental concepts required for understanding the elaborated work, as well as the essential related work for the development of the IPSentinel prototype infrastructure.

2.1. The Sentinel Programme

The Sentinel Programme is the EO programme inserted in the larger Copernicus Programme that the European Union is financing. The programme is coordinated by ESA, and involves launching a family of Sentinel satellites that are equipped with sensors to remotely capture distinct data types, covering a broad range of applications [1]. Each Sentinel mission has a different purpose, and each of these missions is based on a constellation of two satellites collecting data in parallel, to increase coverage and operational data availability.

2.2. Computational Representation of EO Rasters

Raster data, also known as bitmaps, are images that contain a description of each pixel, as opposed to vector graphics, which use points, lines, curves and polygons to encode the information. Raster data can be stored, compressed or uncompressed, in image files with varying formats, such as PNG or TIFF. In the context of EO applications and geographical information systems in general, the rasters are also georeferenced, in the sense that each pixel is known to be associated to a particular geographical region. Some of the most popular raster formats for encoding EO data are GeoTIFF, JPEG [7], netCDF [13], HDF4 and HDF5.

2.3. Array Database Managing Services

An array Database Management System (DBMS) stores and manages arrays, also called raster data, time-series of rasters, or multidimensional discrete data (MDD) [2]. In Geographic Information Systems (GIS) and EO applications, the nature of raster image data is often multidimensional: it can include 3-D image time series (x/y/t), 3-D exploration data (x/y/z), and 4-D climate models (x/y/z/t) [9]. Array databases are based on an array algebra model, and are designed to provide flexible and scalable storage, data retrieval, and data manipulation over large volumes, through a declarative query language similar to SQL. Practical implementations of array algebra include AML [10], RAM [16] and RasDaMan [3].

Typically, these systems decompose multidimensional arrays into sub-arrays that form the unit of access, and that are efficiently indexed and stored in a database.

In a typical array algebra there are two distinct operation categories, namely the m-interval (for multi-dimensional interval) operations and the array operations. The m-interval operations are functions that act on the domain of an array, such as slice and trim. The array operations are constructor functions of arrays that constitute the core of the algebra, such as MARRAY (create array), COND (condense array) and SORT.
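As an illustration of these two operation categories, the sketch below mimics trim, slice, and a COND-style condense on a plain Python nested list. The function names are informal stand-ins chosen for this example, not the API of a real array DBMS such as RasDaMan:

```python
from functools import reduce

# Illustrative sketch of array-algebra operations on a 2-D array
# represented as a list of rows. The function names (trim, slice_row,
# condense) informally mirror the algebra and are not a real DBMS API.

def trim(array, row_range, col_range):
    """m-interval operation: restrict both axes to sub-intervals."""
    r0, r1 = row_range
    c0, c1 = col_range
    return [row[c0:c1 + 1] for row in array[r0:r1 + 1]]

def slice_row(array, index):
    """m-interval operation: fix one axis, reducing dimensionality."""
    return array[index]

def condense(array, op):
    """COND-style operation: aggregate all cells with a binary op."""
    return reduce(op, (cell for row in array for cell in row))

raster = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

print(trim(raster, (0, 1), (1, 2)))          # [[2, 3], [5, 6]]
print(slice_row(raster, 2))                  # [7, 8, 9]
print(condense(raster, lambda a, b: a + b))  # 45
```

Real array databases apply the same ideas declaratively, over tiled storage, rather than in-memory lists.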

1. HDF4: https://support.hdfgroup.org/products/hdf4/
2. HDF5: https://support.hdfgroup.org/HDF5/

2.4. Open Geospatial Consortium Standards

The Open Geospatial Consortium (OGC) is an international industry consortium collaborating to make quality open standards for the global geospatial community. The standards are submitted to a consensus process and are then freely available to the community, to help share geospatial data. Several of these standards are relevant in the context of infrastructures dedicated to the storage and processing of EO data.

2.4.1 Geography Markup Language

The Geography Markup Language (GML) [12] is the XML grammar created by the OGC to serve as a core modeling language for GIS, as well as a format for geographical transactions across the web.

2.4.2 GMLCOV Standard

Formally, the GML Application Schema for Coverages (GMLCOV) standard [5] is an extension of the basic GML coverage primitive, which contains constituents such as the coverage domainSet, rangeSet, rangeType and metadata.

2.4.3 Web Coverage Service

The Web Coverage Service (WCS) is an HTTP interface which provides access to raster sources of geospatial images in forms that are useful for client-side rendering, as input into scientific models, and for other clients. The access is made through a server request, for instance in the form of a Uniform Resource Locator (URL). The WCS specification offers capabilities to extract portions of a coverage, as well as more complex and precise querying [6]. Furthermore, a WCS can return valuable metadata that allows deep analysis, and also supports many export formats (e.g. GeoTIFF and netCDF). WCS uses the aforementioned coverage model of the GML Application Schema for Coverages [11], which has been developed to facilitate the interchange of coverages between OGC services.

The WCS standard supports three kinds of operations that a WCS client can invoke, namely:

GetCapabilities: allows a client to request information about the server's capabilities, as well as valid WCS operations and parameters.

DescribeCoverage: allows a client to request a full description of a particular coverage.

GetCoverage: allows a client to request a coverage comprised of a selected range of properties at a selected set of spatio-temporal locations, in a chosen format.
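The three operations map onto key-value-pair HTTP requests. The sketch below builds such requests with the Python standard library; the endpoint URL and coverage identifier are placeholders invented for this example, and the parameter names follow the WCS 2.0 KVP convention:

```python
from urllib.parse import urlencode

# Hypothetical WCS endpoint and coverage id, for illustration only.
ENDPOINT = "https://example.org/rasdaman/ows"

def wcs_request(operation, **params):
    """Build a WCS 2.0 key-value-pair request URL."""
    query = {"service": "WCS", "version": "2.0.1", "request": operation}
    query.update(params)
    return ENDPOINT + "?" + urlencode(query)

print(wcs_request("GetCapabilities"))
print(wcs_request("DescribeCoverage", coverageId="S1A_IW_GRDH_example"))
print(wcs_request("GetCoverage",
                  coverageId="S1A_IW_GRDH_example",
                  format="image/tiff"))
```

The same pattern extends to subsetting parameters (e.g. axis subsets in GetCoverage), which is how a client extracts only a portion of a coverage.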

3. GMLCOV: http://www.opengis.net/doc/GML/GMLCOV/1.0.1
4. WCS: http://www.opengeospatial.org/standards/wcs

2

Page 3: A Prototype Infrastructure for Sentinel Earth … › downloadFile › ...Geospatial Data Infrastructures, Big Data Management 1. Introduction Thanks to the Copernicus Earth Observation

2.4.4 Web Coverage Processing Service

The Web Coverage Processing Service (WCPS) specification defines a language for the extraction, processing, and analysis of multi-dimensional raster coverages. While WCS focuses on simple data access operations, the WCPS query language makes it possible to express more powerful queries.

WCPS is an expression language similar to XQuery, formed by primitives plus nesting capabilities, and independent from any particular request and response encoding, since there is no concrete request/response protocol specified by WCPS [4].
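To make the shape of such expressions concrete, the sketch below assembles a minimal WCPS query (the average of all cells of one coverage) as a Python string. The coverage name is a placeholder invented for this example, not one of the paper's coverages:

```python
# Minimal WCPS expression built as a string. "MyCoverage" is a
# placeholder coverage id used only for illustration.
coverage_id = "MyCoverage"

query = (
    f"for c in ( {coverage_id} ) "
    f"return avg(c)"
)

print(query)  # for c in ( MyCoverage ) return avg(c)
```

A client would send this expression to the WCPS endpoint; more elaborate queries nest subsetting, arithmetic over bands, and encode() calls, as in the example shown later in Section 4.1.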

2.4.5 Web Map Service

The Web Map Service (WMS) standard provides an HTTP interface to request georeferenced data as images from one or more distributed geospatial databases. This norm standardizes the way that maps are requested by clients and the way that servers describe the geodata that they are holding [8]. What distinguishes the WCS from the WMS specification is the fact that WMS just returns images, typically PNG files, and there is no way to get any metadata.

A WMS supports the following main operations, along with other optional ones:

GetCapabilities: allows a client to request information about the server's capabilities, as well as valid WMS operations and parameters.

GetMap: allows a client to request a map imagefor a specified area and content.
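A GetMap request is again a key-value-pair URL. The sketch below builds one with the standard library; the endpoint and layer name are placeholders, the bounding box is only roughly the Madeira area, and the parameters follow WMS 1.3.0 (where the EPSG:4326 bbox is given in lat/lon axis order):

```python
from urllib.parse import urlencode

# Hypothetical WMS endpoint and layer name, for illustration only.
ENDPOINT = "https://example.org/rasdaman/ows"

params = {
    "service": "WMS",
    "version": "1.3.0",
    "request": "GetMap",
    "layers": "S1A_IW_GRDH_example",
    "crs": "EPSG:4326",
    "bbox": "32.5,-17.5,33.2,-16.2",  # roughly the Madeira area
    "width": 512,
    "height": 512,
    "format": "image/png",
}

url = ENDPOINT + "?" + urlencode(params)
print(url)
```

The server answers with a rendered PNG; tiling clients such as OpenLayers issue many such requests with varying bbox values, which is how the Madeira mosaic in Section 4.1 is produced.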

2.5. Related Work

The ESA Data Hub Service (DHuS) is an open source, GPLv3 licensed, web based system that establishes the connection between the core ground segment and the users interested in accessing EO data from the Sentinel programme. The ground segment is the infrastructure that contains all EO data, according to different timelines, ranging from near real-time to non time-critical, and available typically within 3-24 hours of being sensed by the satellite. DHuS offers users a selection of EO products through full text search, which means that users do not need prior knowledge about the data types, the acquisition platform, or sensors. Moreover, it allows users to search for data based on geographical, temporal, and thematic criteria. However, in its current version, DHuS does not support the extraction of portions of a product or processing on the data, which constitutes an

5. WCPS: http://www.opengeospatial.org/standards/wcps
6. WMS: http://www.opengeospatial.org/standards/wms
7. DHuS: https://scihub.copernicus.eu/dhus
8. The DHuS source code can be downloaded at https://github.com/SentinelDataHub/DataHubSystem

Figure 1: Major functionalities of the IPSentinel prototype.

important limitation that will be addressed in this paper. There are several array DBMSs, such as RasDaMan, SciDB [15], and MonetDB/SciQL [18], that can be used to overcome this limitation. All of these systems provide very similar array operations, and each system has its own query language and its own approach to storing and managing the arrays. However, RasDaMan is the only system that implements the OGC services (WCS, WMS and WCPS) intended to be integrated into DHuS, and it is also the system that offers the most expressive query language.

3. Prototype Infrastructure

This section first presents an overview of the IPSentinel prototype. Then, it presents the software architecture in Subsection 3.2.

3.1. Overview

The IPSentinel provides a simple web interface to allow interactive data discovery, processing and download, and multiple Application Programming Interfaces (API) that allow users to access the data via programs, scripts or client applications.

The major functionalities of IPSentinel are schematically represented in Figure 1, and also described next.

1. User Interface. This functionality is in charge of providing the user with an interface for the discovery, processing, and downloading of products and for the visualization of the relevant metadata. It consists of two sets of interfaces: a set of Graphical User Interfaces (two web applications) and a set of Application Programming Interfaces (mainly used for machine-to-machine interactions and for client application development).

2. Product Harvesting. The product harvester is the service responsible for collecting products from Payload Data Ground Segment data sources (ingestion) or from a DHuS Network (synchronization).

3. Product Cataloging. The product cataloging is responsible for product management. The product catalog is managed as a rolling archive, with configurable eviction strategies and rules.

4. Product Search & Dissemination. This functionality is in charge of providing users with the possibility to perform search and dissemination via standardized API protocols (OData, OpenSearch, WMS and WCS) and via the graphical user interface.

5. Product Processing. This functionality is responsible for providing users the capability of processing the available products in the catalog using the standardized WCPS query language.

3.2. Architecture

In the IPSentinel prototype, product discovery and acquisition is performed automatically by the OData product synchronizer service, which is being extended in the context of a separate M.Sc. project by my colleague Francisco Silva [14].

The detailed explanation of the OData product synchronizer is out of the scope of this paper. Succinctly, the OData synchronizer works by calling the OData API, exposed by the third-party infrastructures that support it, with parameters that respect the OData protocol. Figure 2 illustrates the interaction of the IPSentinel prototype with ESA SciHub in the discovery and acquisition of relevant products through the OData synchronizer, and the interaction of users with IPSentinel. The products transferred from the SciHub are stored in a directory specified by the system administrator during the configuration of the OData synchronizer.
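As a hedged illustration of such a call, the sketch below assembles an OData products query against SciHub. The `Products` entity set and the `$filter`/`$top`/`$orderby` options follow the DHuS OData API as publicly documented, but are written here from memory, so treat the details as assumptions to be checked against the service metadata:

```python
from urllib.parse import quote

# SciHub OData v1 endpoint. Entity and option names are assumptions
# based on the public DHuS OData API documentation.
BASE = "https://scihub.copernicus.eu/dhus/odata/v1"

def products_query(filter_expr, top=10):
    """Build an OData query URL for the Products entity set."""
    return (f"{BASE}/Products"
            f"?$filter={quote(filter_expr)}"
            f"&$top={top}&$orderby=IngestionDate%20desc")

url = products_query("substringof('S1A_IW_GRD', Name)")
print(url)
```

The synchronizer would issue such requests periodically, page through the results, and download any product not yet present in the local archive.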

Petascope is a client module of RasDaMan, integrated to provide IPSentinel users with a distributable service of mirror archives, and with processing and dissemination means for EO products. Petascope can be deployed on a standard servlet container as an independent web application. However, here it is distributed as a module integrated in the DHuS software, to simplify the deployment.

Hereupon, users can access IPSentinel functionalities by means of the APIs presented in Figure 3. In the figure, the black lines show the

9. ESA SciHub: https://scihub.copernicus.eu

Figure 2: Overview of interactions of the prototype with ESA SciHub and users.

Figure 3: Available APIs in the DHuS.

APIs already provided by the DHuS, and the red lines show the APIs implemented by Petascope. The Petascope module also has an AngularJS Web Client, which offers a graphical user interface that allows users to create requests and call the WCS and WMS APIs.

The IPSentinel context diagram is reported in Figure 4, showing how IPSentinel users:

• access DHuS functionalities by means of the DHuS core API, used to access DHuS data storage

• access Petascope-specific functionalities by means of OGC services, used to:

– directly access OGC metadata, containing data specific to OGC services

– access/process RasDaMan data

– insert/delete RasDaMan data, by means of the DHuS core API

In order to automatically populate RasDaMan with EO data, some components in the DHuS Core had to be created and others changed. Figure 5 illustrates these components and their connections. The Job Scheduler acts as a job manager, which triggers OData Synchronizers and FileScanners

10. AngularJS: https://angularjs.org/


Figure 4: IPSentinel context diagram.

Figure 5: Component & connector view from the DHuS core.

based on a periodicity defined by the system administrator through the Web Client graphical user interface (GUI). Each time the FileScanners are called, the directory (represented by the Product Storage cylinder) is scanned. For each product found, the system invokes the ProductService to proceed with the product metadata ingestion in the DHuS database, which in turn invokes the RasDaMan Feeder to process and import the product to RasDaMan through the Petascope component.
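The FileScanner behaviour just described can be sketched as a periodic pass over the product storage directory that dispatches newly found products for ingestion. This is a minimal stand-alone sketch, not the DHuS implementation; the `ingest` callback stands in for the ProductService / RasDaMan Feeder chain:

```python
import os

# Minimal sketch of a FileScanner-style pass over the product storage
# directory: each file not seen before is dispatched for ingestion.
# `ingest` stands in for the ProductService / RasDaMan Feeder chain.

def scan_products(storage_dir, seen, ingest):
    """One scheduler-triggered pass; returns newly ingested names."""
    new = []
    for name in sorted(os.listdir(storage_dir)):
        if name not in seen:
            seen.add(name)
            ingest(os.path.join(storage_dir, name))
            new.append(name)
    return new
```

A scheduler would call `scan_products` on the configured periodicity; the `seen` set makes repeated passes idempotent, so a product is ingested only once.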

Figure 6 describes the information processing pipeline of the RasDaMan Feeder, where the role of each component is as follows:

– The Infer component is responsible for identifying the product mission, level, and type, so that the product is handled by the correct handler.

– The Unzip component deals with file decompression.

– The Extractor component extracts the absolute paths of all the images forming part of the product, generates the coverage id and creates the JSON recipe.

– The gdalwarp, gdal_merge and gdal_translate tools process the images so that they can be supported by RasDaMan.

– Lastly, wcst_import performs the ingestion, using the recipe that was previously generated.
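The Infer step can be approximated by parsing the standard Sentinel product naming convention, in which mission, sensor mode, and product type occupy fixed underscore-separated fields. A hedged sketch for Sentinel-1 names only; the field positions are an assumption based on the public naming convention, not on the Feeder source code:

```python
def infer(product_name):
    """Infer mission, mode, and type from a Sentinel-1 product name,
    e.g. 'S1A_IW_GRDH_1SDV_...' -> ('S1A', 'IW', 'GRD')."""
    fields = product_name.split("_")
    mission, mode, type_res = fields[0], fields[1], fields[2]
    # The third field carries the product type plus a resolution
    # class suffix (e.g. GRDH = GRD, high resolution).
    return mission, mode, type_res[:3]

print(infer("S1A_IW_GRDH_1SDV_20160304T000000"))  # ('S1A', 'IW', 'GRD')
```

Given the inferred triple, the Feeder can then select the handler, recipe template, and GDAL processing steps appropriate for that product family.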

4. Validation

The prototype validation was made using the two approaches presented in Subsection 4.1 and Subsection 4.2.

4.1. Requirements Compliance

One of the methodologies considered for validating the prototype involved assessing whether the functional requirements are in accordance with the proposal. In that sense, the assessment took into account the following functional requirements.

Requirement 1: Rolling archive of relevant products

This first requirement is related to the capacity of the system to support a rolling archive of relevant products that is automatically downloaded by Open Data Protocol (OData) synchronizers.

This requirement is partially fulfilled, since it is possible to archive all the transferred products, with the respective metadata properly stored in the database and indexed in the search engine. The requirement will be completely accomplished when the work of Francisco Silva [14] is finished.

Requirement 2: Possibility to search products by region, temporal period, and type

This requirement concerns the possibility of finding products by specifying a region, selecting the acquisition temporal period, or even the product type, where these filters can be used separately or simultaneously.

Without any change to the original DHuS source code, this requirement is in conformance, because the software originally supported these product search types. Currently, the software version provided by ESA only admits the search filter by product type for the missions Sentinel 1 and Sentinel 2.

Requirement 3: Access to Sentinel products through the OGC service interfaces

The third requirement relates to supporting access to the product coverages using the WCS and WMS interfaces, and through the use of the WCPS query language. When selecting the option to view details of a particular product in the graphical interface, users are presented with two buttons that allow the navigation to pages of OGC


Figure 6: Component & connector view with pipe and filter style from the RasDaMan Feeder.

Figure 7: Results returned by a (a) WCS getCoverage, (b) WCPS query, (c) WMS getMap.

Services. These buttons are only provided if the describeCoverage operation call returns successfully, which means that the specific product is available as a coverage. At the moment, the buttons only appear for products of GRD type from Sentinel 1. The graphical interface of the aforementioned pages abstracts the necessary parameters to perform operations over the WCS and WMS services. Thus, the execution of operations, such as coverage slice, trim, scale, range subset or image displaying, can be done by clicking buttons. Figure 7 (a) presents the result of a getCoverage with the parameter rangesubset assigned VV and scalefactor assigned 8. Figure 7 (c) shows in OpenLayers the result of performing multiple getMap operations on the geographical area of the Madeira island.

Regarding the WCPS query language, it is possible to express and process the EO rasters using queries ranging from simple to complex. Figure 7 (b) presents the result of evaluating the following query:

for vv in ( CoverageID_VV ),
    vh in ( CoverageID_VH )
return
  encode(
    scale(
      { red:   vv * 0.99;
        green: vh * 0.99;
        blue:  (abs(vv) / abs(vh)) * 0.99 },
      { Lat:  "CRS:1"(1:500),
        Long: "CRS:1"(1:500) },
      {} ),
    "png" )

Through the cases demonstrated in Figure 7, this requirement is fully in conformance.

Requirement 4: Automatic ingestion of products as coverages, making them immediately available through the OGC Services interfaces

This last requirement concerns the automation of the provision of products as coverages in RasDaMan, making them accessible through the OGC standards mentioned in the third requirement. This requirement is assured thanks to the new RasDaMan Feeder component, which has the role of processing the image properly to be ingested by RasDaMan. As explained in Subsection 3.2, the automation is guaranteed by the Job Scheduler, which dispatches the Product Service that uses the RasDaMan Feeder.

4.2. Measurement of Required Computational Resources

Another dimension of the prototype validation involved measuring the computational resources required for it to run, namely (1) the expected storage space for the prototype to play its role as a rolling archive, and (2) the time that the system takes to process and return a result from the main operations of WCS, WMS, and also through the use of the WCPS query language. All the measurements were made on a virtual machine with CentOS 7, 16 GB RAM, 100 GB of disk and an Intel dual core 2.10 GHz processor.

4.2.1 Storage Space

The evaluation of the storage space considered all level-1 and level-2 products captured over the Portuguese territory from March 4 to 11, 2016 (8 days) by the mission Sentinel 1. All products were transferred from the ESA SciHub. Table 1 presents various information about the transferred data collection. The space occupied by the metadata of the respective products in the database was not taken


Table 1: Space occupied by Sentinel 1 products on disk.

Satellite    Product Type  Resolution  Sensor Mode  Files #  Total File Size (MB)  Avg (MB)  Max (MB)  Min (MB)
Sentinel1A   GRD           H           IW           23       18838                 819       998       714
Sentinel1B   GRD           H           IW           41       32556                 794       957       536
Sentinel1A   GRD           M           EW           9        2024                  225       236       213
Sentinel1B   GRD           M           EW           7        1539                  220       236       204
Sentinel1A   SLC           n/a         IW           31       101600                3629      4600      3300
Sentinel1B   SLC           n/a         IW           28       114500                3694      4300      2400
Sentinel1A   OCN           n/a         IW           6        39.3                  6.6       6.9       6.1
Sentinel1B   OCN           n/a         IW           3        19.1                  6.4       6.7       6.2
Total                                               148      271115

into account.

From the table it is possible to observe that, on average, a product of SLC type in IW mode occupies approximately 3.6 GB on disk, a product of GRD type in IW mode occupies approximately 805 MB and in EW mode 222 MB, and a product of OCN type occupies approximately 7 MB. It can also be observed that, in a period of 8 days, 148 products were captured, occupying a total of 270 GB (i.e. the equivalent of 33 GB per day). Roughly speaking, we can count on around 1 TB of products per month, without including level-0 products or other missions.

The products in the data collection mentioned above were imported into RasDaMan, specifically all products of GRD type in EW mode, and 14 products in IW mode. This corresponded to a total of 30 products. Table 2 summarizes information about the space occupied by these products in RasDaMan.

Table 2: Space occupied by GRD type products in RasDaMan.

Product      Files #  Total File Size (MB)  Avg (MB)  Max (MB)  Min (MB)
S1A EW GRD   9        2349                  261       299       243
S1B EW GRD   7        1819                  260       318       222
S1A IW GRD   7        7290                  1041      1131      993
S1B IW GRD   7        7054                  1008      1113      899
Total        30       18512                 643       1131      222

Looking at Tables 1 and 2, we can see that the S1* EW GRD products in RasDaMan occupy, on average, 17% more than the average size of the original files in the file system, and the S1* IW GRD products occupy, on average, 27% more. This means that, to also store all GRD type products in RasDaMan, the infrastructure requires an additional 117% of the original disk space for EW mode and 127% for IW mode. Taking into account the above data, and considering that in one month 36% of the products are of GRD type in IW mode and 10% are in EW mode, to store them all the infrastructure needs to have 57% more storage capacity, i.e., 570 GB more.
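The 57% figure follows from weighting each mode's share of the monthly volume by the full size of its RasDaMan copy (the original product size plus the measured overhead). A quick check of that arithmetic:

```python
# Extra storage needed to also keep GRD products in RasDaMan:
# each copy costs (1 + overhead) times the original product size,
# weighted by the mode's share of the monthly volume.
share_iw, overhead_iw = 0.36, 0.27   # GRD IW: 36% of volume, +27%
share_ew, overhead_ew = 0.10, 0.17   # GRD EW: 10% of volume, +17%

extra = share_iw * (1 + overhead_iw) + share_ew * (1 + overhead_ew)
print(round(extra, 2))  # 0.57, i.e. about 57% additional capacity
```

Applied to the roughly 1 TB of products per month, this yields the 570 GB of additional capacity quoted above.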

In conclusion, with this informative data, we argue that it is viable to have an infrastructure with 1.57 TB of storage space, so that for one month it may be possible to archive several products under the conditions described above. However, considering a periodic eviction at the end of 15 days, the storage space required can be halved. Table 3 summarizes the storage space required according to the rolling archive plan to store all level-1 and level-2 products from the Sentinel 1 mission, captured for the Portuguese territory.

Table 3: Storage space required for different rolling archive plans.

                     15 Days  1 Month  3 Months  1 Year
Archive              500 GB   1 TB     3 TB      12 TB
RasDaMan             285 GB   570 GB   1.71 TB   6.84 TB
Total Storage Space  785 GB   1.57 TB  4.71 TB   18.84 TB

4.2.2 Response Time

Regarding the validation of the prototype response time, several scenarios were tested, executing different operations. The Apache JMeter tool was used to record the relevant information of each operation of each scenario, in order to draw conclusions. All scenarios were run only once, so that response times were not influenced by existing caches.

Scenario 1: Searching for products by a region filter

The goal of this scenario is to test the response time of searching for products by a region criterion. This scenario was tested with 4 clients in parallel requesting products for the regions selected in Figure 8.

Contrary to what one might think, selecting a larger area is not synonymous with a longer response time. As we can see in Table 4 and Figure 8, Client 1 selected an area smaller than Client 3. However, the response time of Client 3 is lower than the response time of Client 1. The response time

11. Apache JMeter: http://jmeter.apache.org


Figure 8: Scenario 1: Regions selected by the 4 clients.

is mostly influenced by the number of products a given region has. This statement is supported by the response times and products found for Clients 2 and 4.

Table 4: Scenario 1: Response time with the region criterion.

Selected Region  1       2       3       4
Response Time    870 ms  547 ms  368 ms  1.12 s
Products Found   13      8       2       29

Scenario 2: Searching for products by a temporal filter

The goal of this scenario is to test the response time of searching for products by a temporal criterion. This scenario was tested with 4 clients in parallel requesting products of different temporal periods.

The same conclusions drawn earlier apply to this scenario, i.e., the response time is mostly influenced by the number of products found in a given temporal interval. As we can see in Table 5, the response time increases with the number of products found.

Table 5: Scenario 2: Response time with temporal criterion.

  Temporal Period   2015/12/31 to 2016/12/31   2017/01/01 to 2017/03/07   2017/03/07 to 2017/04/20   2015/12/31 to 2017/04/20
  Response Time     471 ms                     506 ms                     649 ms                     868 ms
  Products Found    8                          10                         16                         29

Comparing Table 5 with the table from Scenario 1, for the same number of products found, the response time is higher in Table 4. This difference is due to the fact that the product search by region is heavier: the operation consists of scanning the product list for products that have at least one pair of coordinates contained in the selected region. The difference becomes increasingly larger as the number of products found grows, as can be seen.

Scenario 3: Searching for products by type

The goal of this scenario is to test the response time of searching for products by a product type criterion. This scenario was tested with 4 clients in parallel requesting products of different product types.

In addition to the conclusions from the previous two scenarios, which also apply here, Table 6 shows the speed at which the system scans the product list in order to create the response payload with the products that satisfy the condition.

Table 6: Scenario 3: Response time with product type criterion.

  Product Type     RAW      GRD      SLC      ALL
  Response Time    346 ms   861 ms   344 ms   884 ms
  Products Found   1        24       1        26

The first three scenarios are not intended to show how fast the system is, but rather how it behaves as the number of products increases.

Scenario 4: Performing WCS operations

The goal of this scenario is to test the response time of the describeCoverage and getCoverage operations from the WCS interface. The scenario was tested with 4 clients in parallel performing the same operations on 4 different coverages, as shown in Table 7. The getCoverage operation used the following parameters: &RANGESUBSET=VV&SCALEFACTOR=5&FORMAT=image/png
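The getCoverage request above can be assembled as an ordinary WCS 2.0 key-value-pair URL. The sketch below is illustrative: the Petascope endpoint and coverage identifier are placeholders, while the RANGESUBSET, SCALEFACTOR, and FORMAT parameters are the ones quoted above.

```python
# Sketch: assembling the WCS GetCoverage request used in Scenario 4.
# The endpoint and coverage id are placeholders.
from urllib.parse import urlencode

def get_coverage_url(endpoint, coverage_id):
    params = {
        "SERVICE": "WCS",
        "VERSION": "2.0.1",
        "REQUEST": "GetCoverage",
        "COVERAGEID": coverage_id,
        "RANGESUBSET": "VV",   # keep only the VV polarisation band
        "SCALEFACTOR": "5",    # downscale to reduce the transferred size
        "FORMAT": "image/png",
    }
    return f"{endpoint}?{urlencode(params)}"

url = get_coverage_url("http://localhost:8080/rasdaman/ows", "S1A_IW_GRD")
print(url)
```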

As expected, the describeCoverage response times shown in the table are similar and low, since only the metadata database (Petascope DB) was queried. With only the data presented in Table 7 it is not possible to conclude how the processing system behaves as the number of getCoverage requests increases. However, it is possible to get a clear idea of how fast the infrastructure prototype processes 4 identical operations on different data at the same time. The S1A IW GRD and S1B IW GRD coverages are both about 1 Gb and were processed in almost the same time, around 30 seconds. The same is observed for the S1A EW GRD and S1B EW GRD coverages, which were processed within 5 seconds. As expected, the response time is strongly affected by the size of the processed image to be transferred: it increases with the size of the images, and also tends to increase with the number of connected clients transferring images.


Table 7: Summary of response times of the WCS operations.

  Coverage                         S1A IW GRD   S1B IW GRD   S1A EW GRD   S1B EW GRD
  describeCoverage Response Time   338 ms       342 ms       339 ms       369 ms
  getCoverage Response Time        2.7 min      1.7 min      1.0 min      1.1 min
  Processing Time                  33 s         26 s         4.98 s       3.89 s
  Original Size                    ≈ 1 Gb       ≈ 1 Gb       ≈ 260 Mb     ≈ 260 Mb
  Final Size                       10.7 Mb      9.1 Mb       2.5 Mb       2.4 Mb

Table 8: Summary of response times from different queries.

  Query           1        2        3        4         5         6        7
  Response Time   3.14 s   4.95 s   4.61 s   58.35 s   1.4 min   7.10 s   50.3 min
  Result Size     283 Kb   125 Kb   155 Kb   8 Kb      442 Kb    385 Kb   2 Kb

Scenario 5: Performing getMap operations from the WMS

This scenario, intended to test the response time of multiple getMap requests executed by OpenLayers, was not carried out: it was verified that the version of Petascope used in the prototype contained severe bugs with a large memory impact, making the whole system run out of available physical memory when handling multiple getMap operations.

Scenario 6: Processing a query using the WCPS query language

The goal of this scenario is to test the response time of processing queries executed by RasDaMan, using the WCPS query language. The scenario was tested with 7 different queries executed in sequence, performing different operations on the same coverage. The tested coverage is a 2D Level-1 product from the Sentinel 1 mission, with dimensions 30032 x 19272 px and a size of 777 Mb. In order to reduce the download time of the resulting images, a scale factor was used in all queries to reduce the dimensions and, consequently, the size.

Query 1 requested only the original image, so that we can see how each query affects it (Figure 9 (1)). Queries 2 and 3 return, respectively, the VV and VH bands of the original image (Figures 9 (2) and (3)). Query 4 returns the NDVI (Normalized Difference Vegetation Index), a measure of the probability of vegetation in remote sensing. This query does not make much sense for this data, but it was tested because it is one of the queries most used by scientists (Figure 9 (4)). Queries 5 and 6 are essentially the same, in the sense that both return a false-coloured image (Figures 9 (5) and (6)); however, in the second, a subsetting is done around Madeira island. Finally, Query 7 summarizes the frequency with which each colour value appears in the original image (Figure 9 (7)). The result of this query is a 1D array.
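Queries of this kind are submitted to Petascope as WCPS expressions carried by an ordinary WCS request. The sketch below is illustrative and simplified: the endpoint and coverage name are placeholders, the band names VV and VH assume the coverage exposes those range fields, the scaling used in the actual scenario is omitted, and the exact WCPS syntax accepted depends on the rasdaman version.

```python
# Sketch: building WCPS requests along the lines of the queries above.
import urllib.parse

ENDPOINT = "http://localhost:8080/rasdaman/ows"  # placeholder Petascope endpoint
COVERAGE = "S1A_IW_GRD"                          # placeholder coverage name

# Query 2/3 style: extract a single polarisation band.
band_query = f'for c in ({COVERAGE}) return encode(c.VV, "png")'

# Query 4 style: a normalised-difference index computed from two bands.
index_query = f'for c in ({COVERAGE}) return encode((c.VH - c.VV) / (c.VH + c.VV), "png")'

def wcps_url(query):
    """A WCPS query is sent as a ProcessCoverages request against the WCS endpoint."""
    return ENDPOINT + "?" + urllib.parse.urlencode({
        "SERVICE": "WCS",
        "VERSION": "2.0.1",
        "REQUEST": "ProcessCoverages",
        "QUERY": query,
    })

print(wcps_url(band_query))
```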

Table 8 summarizes the response times for each of the queries. The times shown are only processing times, i.e., the transfer time is not included. From the response times we can observe that the more arithmetic operations a query involves, the longer the processing time; this is supported by comparing the first three queries with Queries 4 and 5. It is also possible to observe, from Queries 5 and 6, that RasDaMan performs query optimization to retrieve results faster: the order in which operations are processed matters. Both queries perform the false colouring, but Query 6 has more to do, namely a subsetting and a different scaling. This suggests that Query 5 should be faster, yet it is not: RasDaMan first performs the scaling, then the subsetting, and finally the false colouring, which is why Query 6 is much faster than Query 5. Query 7, at first glance, seems too time-consuming and suggests that RasDaMan is not as fast as claimed, but Figure 10 shows why it took 50.3 minutes: a 30032 x 19272 matrix is scanned, where each element contains two colour values, and for the colour value 0 alone 160592341 counts were made. The delay is thus due to the high number of pixels traversed, each with two colour channels to be counted.
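The counting Query 7 performs can be illustrated on a toy raster. The example below is a sketch, not RasDaMan's implementation: the real coverage is 30032 x 19272 px with two colour values per pixel, which is what makes the full scan take 50.3 minutes.

```python
# Sketch: the per-value frequency count of Query 7, on a tiny toy raster.
from collections import Counter

# Toy 2 x 3 raster where each pixel holds two colour values (e.g. VV, VH).
raster = [
    [(0, 1), (0, 0), (2, 1)],
    [(1, 0), (0, 2), (0, 0)],
]

counts = Counter()
for row in raster:
    for vv, vh in row:
        counts[vv] += 1   # each pixel contributes one count per channel
        counts[vh] += 1

print(dict(counts))  # value 0 appears 7 times across both channels
```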

Finally, looking at the sizes of the files produced by each of the queries, the advantage of server-side image processing becomes evident: instead of transferring 777 Mb of data, only 1.38 Mb were transferred in total.

5. Conclusions and Future Work

The IPSentinel prototype showed that the symbiosis of DHuS and RasDaMan can be achieved with some minor modifications to the DHuS core, allowing full automation of making available the DHuS ingested


Figure 9: Results produced by the queries in Table 8.

Figure 10: Result produced by Query 7.

products in RasDaMan as coverages. This symbiosis enriches DHuS in that it provides server-side processing of the catalogued products through the use of OGC services.

From the measurements it was also possible to estimate the storage space needed to store the Sentinel 1 products related to the geographic area of Portugal. The measurements also showed that server-side processing substantially reduces the amount of data to be transferred to clients, which reduces both the transfer time and the disk space required on the client side. Although it was not addressed in the prototype validation, it is also possible to conclude that, with this type of infrastructure, clients no longer need powerful computers, since the infrastructure is responsible for supplying the computing power.

5.1. Future Work

The developed prototype is still very rough, and much more needs to be done before it can support a service. I believe it will be necessary to add a feature that allows users to add the products processed by RasDaMan to the DHuS catalogue.

It will also be interesting to study a strategy so that the products ingested by RasDaMan are not also kept in the data space managed by DHuS, thus avoiding data duplication. It is also necessary to extend the range of products that can be ingested by RasDaMan.

References

[1] Heiko Balzter, Beth Cole, Christian Thiel, and Christiane Schmullius. Mapping CORINE Land Cover from Sentinel-1A SAR and SRTM Digital Elevation Model Data using Random Forests. Remote Sensing, 7(11), 2015.

[2] Peter Baumann. Management of multidimensional discrete data. The Very Large Data Bases Journal, 3(4), 1994.

[3] Peter Baumann. A database array algebra for spatio-temporal data and beyond. In Proceedings of International Workshops on Next Generation Information Technologies and Systems, 1999.

[4] Peter Baumann. OpenGIS Web Coverage Processing Service (WCPS) Language Interface Standard. OGC 08-068r2, Open Geospatial Consortium, 2009.

[5] Peter Baumann. GML Application Schema for Coverages. OGC 09-146, Open Geospatial Consortium, 2010.

[6] Peter Baumann. OGC WCS 2.0 Interface Standard - Core. OGC 09-110r4, Open Geospatial Consortium, 2012.

[7] Charilaos Christopoulos, Athanassios Skodras, and Touradj Ebrahimi. The JPEG2000 still image coding system: an overview. IEEE Transactions on Consumer Electronics, 46(4), 2000.

[8] Jeff de la Beaujardiere. OpenGIS Web Map Service (WMS) Implementation Specification. OGC 06-042, Open Geospatial Consortium, 2006.

[9] A. García Gutiérrez and Peter Baumann. Computing aggregate queries in raster image databases using pre-aggregated data. In Proceedings of International Conference on Computational Science and Applications, 2008.

[10] Arunprasad P. Marathe and Kenneth Salem. Query processing techniques for arrays. The Very Large Data Bases Journal, 11(1), 2002.

[11] OGC. The OpenGIS Abstract Specification - Topic 6: Schema for coverage geometry and functions. OGC 07-011, Open Geospatial Consortium, 2006.

[12] Clemens Portele. OpenGIS Geography Markup Language (GML) Encoding Standard. OGC 07-036, Open Geospatial Consortium, 2007.

[13] Russ Rew and Glenn Davis. NetCDF: an interface for scientific data access. IEEE Computer Graphics and Applications, 10(4), 1990.

[14] Francisco Silva. IPSentinel - Sentinel Earth Observations Rolling Archive Downloader. Technical report, Instituto Superior Técnico, 2016.

[15] Michael Stonebraker, Paul Brown, Alex Poliakov, and Suchi Raman. The Architecture of SciDB. In International Conference on Scientific and Statistical Database Management, 2011.

[16] Alex van Ballegooij. RAM: A Multidimensional Array DBMS. In Proceedings of the Extending Database Technology Workshops, 2004.

[17] Wolfgang Wagner. Big data infrastructures for processing Sentinel data. In Proceedings of the Photogrammetric Week, 2015.

[18] Ying Zhang, Martin Kersten, and Stefan Manegold. SciQL: array data processing inside an RDBMS. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. ACM, 2013.
