strabon: a semantic geospatial dbms (long version) · 2017. 8. 20. · [1]). strabon extends the...

31
Strabon: A Semantic Geospatial DBMS ? (Long Version) Kostis Kyzirakos, Manos Karpathiotakis, and Manolis Koubarakis National and Kapodistrian University of Athens, Greece {kkyzir,mk,koubarak}@di.uoa.gr Abstract. We present Strabon, a new RDF store that supports the state of the art semantic geospatial query languages stSPARQL and GeoSPARQL. To illustrate the expressive power offered by these query languages and their implementation in Strabon, we concentrate on the new version of the data model stRDF and the query language stSPARQL that we have developed ourselves. Like GeoSPARQL, these new versions use OGC standards to represent geometries where the original versions used linear constraints. We study the performance of Strabon experimen- tally and show that it scales to very large data volumes and performs, most of the times, better than all other geospatial RDF stores it has been compared with. 1 Introduction The Web of data has recently started being populated with geospatial data. A representative example of this trend is project LinkedGeoData where Open- StreetMap data is made available as RDF and queried using the declarative query language SPARQL. Using the same technologies, Ordnance Survey makes available various geospatial datasets from the United Kingdom. The availability of geospatial data in the linked data “cloud” has moti- vated research on geospatial extensions of SPARQL [11,13,20]. These works have formed the basis for GeoSPARQL, a proposal for an Open Geospatial Con- sortium (OGC) standard which is currently at the “candidate standard” stage [1]. In addition, a number of papers have explored implementation issues for such languages [3,4]. In this paper we present our recent achievements in both of these research directions and make the following technical contributions. We describe a new version of the data model stRDF and the query lan- guage stSPARQL, originally presented in [13], for representing and querying geospatial data that change over time. In the new version of stRDF, we use the widely adopted OGC standards Well Known Text (WKT) and Geography Markup Language (GML) to represent geospatial data as literals of datatype strdf:geometry (Section 2). The new version of stSPARQL is an extension of ? This work was supported in part by the European Commission project TELEIOS (http://www.earthobservatory.eu/)

Upload: others

Post on 28-Feb-2021

17 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

Strabon: A Semantic Geospatial DBMS?

(Long Version)

Kostis Kyzirakos, Manos Karpathiotakis, and Manolis Koubarakis

National and Kapodistrian University of Athens, Greece{kkyzir,mk,koubarak}@di.uoa.gr

Abstract. We present Strabon, a new RDF store that supports thestate of the art semantic geospatial query languages stSPARQL andGeoSPARQL. To illustrate the expressive power offered by these querylanguages and their implementation in Strabon, we concentrate on thenew version of the data model stRDF and the query language stSPARQLthat we have developed ourselves. Like GeoSPARQL, these new versionsuse OGC standards to represent geometries where the original versionsused linear constraints. We study the performance of Strabon experimen-tally and show that it scales to very large data volumes and performs,most of the times, better than all other geospatial RDF stores it hasbeen compared with.

1 Introduction

The Web of data has recently started being populated with geospatial data.A representative example of this trend is project LinkedGeoData where Open-StreetMap data is made available as RDF and queried using the declarativequery language SPARQL. Using the same technologies, Ordnance Survey makesavailable various geospatial datasets from the United Kingdom.

The availability of geospatial data in the linked data “cloud” has moti-vated research on geospatial extensions of SPARQL [11, 13, 20]. These workshave formed the basis for GeoSPARQL, a proposal for an Open Geospatial Con-sortium (OGC) standard which is currently at the “candidate standard” stage[1]. In addition, a number of papers have explored implementation issues forsuch languages [3, 4]. In this paper we present our recent achievements in bothof these research directions and make the following technical contributions.

We describe a new version of the data model stRDF and the query lan-guage stSPARQL, originally presented in [13], for representing and queryinggeospatial data that change over time. In the new version of stRDF, we usethe widely adopted OGC standards Well Known Text (WKT) and GeographyMarkup Language (GML) to represent geospatial data as literals of datatypestrdf:geometry (Section 2). The new version of stSPARQL is an extension of

? This work was supported in part by the European Commission project TELEIOS(http://www.earthobservatory.eu/)

Page 2: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

SPARQL 1.1 which, among other features, offers functions from the OGC stan-dard “OpenGIS Simple Feature Access for SQL” for the manipulation of spatialliterals and support for multiple coordinate reference systems (Section 3). Thenew version of stSPARQL and GeoSPARQL have been developed independentlyat about the same time. We discuss in detail their many similarities and fewdifferences and compare their expressive power.

We present the system Strabon1, an open-source semantic geospatial DBMSthat can be used to store linked geospatial data expressed in stRDF and querythem using stSPARQL. Strabon can also store data expressed in RDF usingthe vocabularies and encodings proposed by GeoSPARQL and query this datausing the subset of GeoSPARQL that is closer to stSPARQL (the union of theGeoSPARQL core, the geometry extension and the geometry topology extension[1]). Strabon extends the RDF store Sesame, allowing it to manage both thematicand spatial RDF data stored in PostGIS. In this way, Strabon exposes a varietyof features similar to those offered by geospatial DBMS that make it one of therichest RDF store with geospatial support available today.

Finally, we perform an extensive evaluation of Strabon using large data vol-umes. We use a real-world workload based on available geospatial linked datasetsas well as a workload based on a synthetic dataset. Strabon can scale up to 500million triples and answer complex stSPARQL queries involving the whole rangeof constructs offered by the language. We evaluated Strabon using the same work-load as our closest competitor implementation in the literature [4]. We presentour findings in Section 5, including a comparison between Strabon on top ofPostgreSQL and a proprietary DBMS, the implementation presented in [4], thesystem Parliament [3], and a naive, baseline implementation. In this way, weare the first to provide a systematic evaluation of RDF stores providing supportfor languages like stSPARQL and GeoSPARQL and pointing out directions forfuture research in this area.

Strabon is currently used in project TELEIOS as the semantic geospatialDBMS that serves two innovative applications of linked geospatial data: a firemonitoring application2 [18] and a virtual observatory for the data archive ofsatellite TerraSAR-X [16].

2 A new version of stRDF

In this section we present a new version of the data model stRDF that wasinitially presented in [13]. The presentation of the new versions of stRDFand stSPARQL (Section 3) is brief since the new versions are based onthe initial ones published in [13]. In [13] we followed the ideas of con-straint databases [17] and chose to represent spatial and temporal data asquantifier-free formulas in the first-order logic of linear constraints. These for-mulas define subsets of Qk called semi-linear point sets in the constraintdatabase literature. In the original version of stRDF, we introduced the

1 http://www.strabon.di.uoa.gr2 http://papos.space.noa.gr/fend_static/

Page 3: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

data type strdf:SemiLinearPointSet for modeling geometries. The valuesof this datatype are typed literals (called spatial literals) that encode geome-tries using Boolean combinations of linear constraints in Q2. For example,

(x ≥ 0 ∧ y ≥ 0 ∧ x+ y ≤ 1) ∨ (x ≤ 0 ∧ y ≤ 0 ∧ x+ y ≥ −1)is such a literal encoding the union of two polygons in Q2. In [13] we also allowthe representation of valid times of triples using the proposal of [7] which addsa fourth component to each triple (its valid time) and proposes to realize it inan RDF store using reification. Valid times are again represented using orderconstraints over the time structure Q. In the rest of this paper we omit the tem-poral dimension of stRDF from our discussion and concentrate on the geospatialdimension only.

Although our original approach in [13] results in a theoretically elegant frame-work, none of the application domains we worked with had geospatial data rep-resented using constraints. Today’s GIS practitioners represent geospatial datausing OGC standards such as WKT and GML. Thus, in the new version ofstRDF and stSPARQL, which has been used in EU projects SemsorGrid4Envand TELEIOS, the linear constraint representation of spatial data was droppedin favour of OGC standards. As we demonstrate below, introducing OGC stan-dards in stRDF and stSPARQL has been achieved easily without changing any-thing from the basic design choices of the data model and the query language.

In the new version of stRDF, the datatypes strdf:WKT and strdf:GML areintroduced to represent geometries serialized using the OGC standards WKTand GML. WKT is a widely accepted OGC standard3 and can be used for rep-resenting geometries, coordinate reference systems and transformations betweencoordinate reference systems. A coordinate system is a set of mathematical rulesfor specifying how coordinates are to be assigned to points. A coordinate ref-erence system (CRS) is a coordinate system that is related to an object (e.g.,the Earth, a planar projection of the Earth) through a so-called datum whichspecifies its origin, scale, and orientation. Geometries in WKT are restrictedto 0-, 1- and 2-dimensional geometries that exist in R2, R3 or R4. Geometriesthat exist in R2 consist of points with coordinates x and y, e.g., POINT(1,2).Geometries that exist in R3 consist of points with coordinates x, y and z orx, y and m where m is a measurement. Geometries that exist in R4 consist ofpoints with coordinates x, y, z and m. The WKT specification defines syntax forrepresenting the following classes of geometries: points, line segments, polygons,triangles, triangulated irregular networks and collections of points, line segmentsand polygons. The interpretation of the coordinates of a geometry depends onthe CRS that is associated with it.

GML is an OGC standard4 that defines an XML grammar for modeling,exchanging and storing geographic information such as coordinate reference sys-tems, geometries and units of measurement. The GML Simple Features specifi-cation (GML-SF) is a profile of GML that deals only with a subset of GML anddescribes geometries similar to the one defined by WKT.

3 http://portal.opengeospatial.org/files/?artifact_id=253554 http://portal.opengeospatial.org/files/?artifact_id=39853

Page 4: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

Given the OGC specification for WKT, the datatype strdf:WKT5 is definedas follows. The lexical space of this datatype includes finite-length sequences ofcharacters that can be produced from the WKT grammar defined in the WKTspecification, optionally followed by a semicolon and a URI that identifies thecorresponding CRS. The default case is considered to be the WGS84 coordinatereference system. The value space is the set of geometry values defined in theWKT specification. These values are a subset of the union of the powersets ofR2 and R3. The lexical and value space for strdf:GML are defined similarly.The datatype strdf:geometry is also introduced to represent the serializationof a geometry independently of the serialization standard used. The datatypestrdf:geometry is the union of the datatypes strdf:WKT and strdf:GML, andappropriate relationships hold for its lexical and value spaces.

Both the original [13] and the new version of stRDF presented in this paperimpose minimal new requirements to Semantic Web developers that want torepresent spatial objects with stRDF; all they have to do is utilize a new literaldatatype. These datatypes (strdf:WKT, strdf:GML and strdf:geometry) canbe used in the definition of geospatial ontologies needed in applications, e.g.,ontologies similar to the ones defined in [20].

In the examples of this paper we present stRDF triples that come from afire monitoring and burnt area mapping application of project TELEIOS. Theprefix noa used refers to the namespace of relevant vocabulary6 defined by theNational Observatory of Athens (NOA) for this application.

Example 1. stRDF triples derived from GeoNames that represent informationabout the Greek town Olympia including an approximation of its geometry. Also,stRDF triples that represent burnt areas.

geonames:26 rdf:type dbpedia:Town. geonames:26 geonames:name "Olympia".

geonames:26 strdf:hasGeometry "POLYGON((21 18,23 18,23 21,21 21,21 18));

<http://www.opengis.net/def/crs/EPSG/0/4326>"^^strdf:WKT.

noa:BA1 rdf:type noa:BurntArea;

strdf:hasGeometry "POLYGON((0 0,0 2,2 2,2 0,0 0))"^^strdf:WKT.

noa:BA2 rdf:type noa:BurntArea;

strdf:hasGeometry "POLYGON((3 8,4 9,3 9,3 8))"^^strdf:WKT.

Features can, in general, have many geometries (e.g., for representing the fea-ture at different scales). Domain modelers are responsible for their appropriateutilization in their graphs and queries.

3 A new version of stSPARQL

In this section we present a new version of the query language stSPARQL thatwe originally introduced in [13]. The new version is an extension of SPARQL1.1 with functions that take as arguments spatial terms and can be used in the

5 http://strdf.di.uoa.gr/ontology6 http://www.earthobservatory.eu/ontologies/noaOntology.owl

Page 5: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

SELECT, FILTER, and HAVING clause of a SPARQL 1.1 query. A spatial term iseither a spatial literal (i.e., a typed literal with datatype strdf:geometry orits subtypes), a query variable that can be bound to a spatial literal, the resultof a set operation on spatial literals (e.g., union), or the result of a geometricoperation on spatial terms (e.g., buffer).

In stSPARQL we use functions from the “OpenGIS Simple Feature Access- Part 2: SQL Option” standard (OGC-SFA)7 for querying stRDF data. Thisstandard defines relational schemata that support the storage, retrieval, queryand update of sets of simple features using SQL.

A feature is a domain entity that can have various attributes that describespatial and non-spatial (thematic) characteristics. The spatial characteristics ofa feature are represented using geometries such as points, lines, polygons, etc.Each geometry is associated with a CRS. A simple feature is a feature with allspatial attributes described piecewise by a straight line or a planar interpolationbetween sets of points. The OGC-SFA standard defines functions for requestinga specific representation of a geometry (e.g., the function ST_AsText returnsthe WKT representation of a geometry), functions for checking whether somecondition holds for a geometry (e.g., the function ST_IsEmpty returns true if ageometry is empty) and functions for returning some properties of the geometry(e.g., the function ST_Dimension returns its inherent dimension). In addition,the standard defines functions for testing named spatial relationships betweentwo geometries (e.g., the function ST_Overlaps) and functions for constructingnew geometries from existing geometries (e.g., the function ST_Envelope thatreturns the minimum bounding box of a geometry).

The new version of stSPARQL extends SPARQL 1.1 with the ma-chinery of the OGC-SFA standard. We achieve this by defining a URIfor each of the SQL functions defined in the standard and use themin SPARQL queries. For example, for the function ST IsEmpty definedin the OGC-SFA standard, we introduce the SPARQL extension function

xsd:boolean strdf:isEmpty(strdf:geometry g)

which takes as argument a spatial term g, and returns true if g is the emptygeometry. Similarly, we have defined a Boolean SPARQL extension function foreach topological relation defined in OGC-SFA (topological relations for sim-ple features), [6] (Egenhofer relations) and [5] (RCC-8 relations). In this waystSPARQL supports multiple families of topological relations our users mightbe familiar with. Using these functions stSPARQL can express spatial selec-tions, i.e., queries with a FILTER function with arguments a variable and a con-stant (e.g., strdf:contains(?geo, "POINT(1 2)"^^strdf:WKT)), and spatialjoins, i.e., queries with a FILTER function with arguments two variables (e.g.,strdf:contains(?geoA, ?geoB)).

The stSPARQL extension functions can also be used in the SELECT clause ofa SPARQL query. As a result, new spatial literals can be generated on the flyduring query time based on pre-existing spatial literals. For example, to obtain

7 http://portal.opengeospatial.org/files/?artifact_id=25354

Page 6: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

the buffer of a spatial literal that is bound to the variable ?geo, we would usethe expression SELECT (strdf:buffer(?geo,0.01) AS ?geobuffer).

In stSPARQL we have also introduced the following three aggregate functionsthat deal with geospatial data:

– strdf:geometry strdf:union(set of strdf:geometry a), returns a ge-ometry that is the union of the set of input geometries.

– strdf:geometry strdf:intersection(set of strdf:geometry a),returns a geometry that is the intersection of the set of input geometries.

– strdf:geometry strdf:extent(set of strdf:geometry a), returns ageometry that is the minimum bounding box of the set of input geometries.

stSPARQL also supports update operations (insertion, deletion, and updateof stRDF triples) conforming to the declarative update language for SPARQL,SPARQL Update 1.1, which is a current proposal of W3C.

The following examples demonstrate the functionality of stSPARQL.

Example 2. Return the names of towns that have been affected by fires.

SELECT ?name

WHERE { ?t a dbpedia:Town.

?t geonames:name ?name.

?t strdf:hasGeometry ?tGeo.

?ba a noa:BurntArea.

?ba strdf:hasGeometry ?baGeo.

FILTER(strdf:intersects(?tGeo,?baGeo))}

The query above demonstrates how to use a topological function in a query. Theresults of this query are the names of the towns whose geometries “spatiallyoverlap” the geometries corresponding to areas that have been burnt.

Example 3. Isolate the parts of the burnt areas that lie in coniferous forests.

SELECT ?ba (strdf:intersection(?baGeom,strdf:union(?fGeom)) AS ?burnt)

WHERE { ?ba a noa:BurntArea. ?ba strdf:hasGeometry ?baGeom.

?f a noa:Area. ?f noa:hasLandCover noa:ConiferousForest.

?f strdf:hasGeometry ?fGeom.

FILTER(strdf:intersects(?baGeom,?fGeom)) }

GROUP BY ?ba ?baGeom

The query above tests whether a burnt area intersects with a coniferous for-est. If this is the case, groupings are made depending on the burnt area. Thegeometries of the forests corresponding to each burnt area are unioned, andtheir intersection with the burnt area is calculated and returned to the user.Note that only strdf:union is an aggregate function in the SELECT clause;strdf:intersection performs a computation involving the result of the aggre-gation and the value of ?baGeom which is one of the variables determining thegrouping according to which the aggregate computation is performed.

This section completes our brief introduction to stRDF and stSPARQL. Moredetails are given in [15]. Section 6 compares stSPARQL with the recent OGCcandidate standard GeoSPARQL.

Page 7: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

PostGIS

Query Engine

Parser

Optimizer

Evaluator

Transaction

Manager

Storage Manager

Repository

SAIL

RDBMS

Fig. 1: The architecture of Strabon

4 Implementation

In this section we present Strabon 3.0, a fully-implemented, open-source, storageand query evaluation system for stRDF/stSPARQL and the corresponding sub-set of GeoSPARQL. Our discussion here concentrates on stSPARQL only, butgiven the similarities with GeoSPARQL to be discussed in Section 6, the appli-cability to GeoSPARQL is immediate. The first version of Strabon was brieflypresented in [19] as an implementation of the original version of stRDF andstSPARQL. As we show in this and the next section, Strabon 3.0 is a matureimplementation of a geospatial RDF store.

Strabon has been implemented by extending the widely-known RDF storeSesame. Although Sesame is not the most efficient RDF store available, we de-cided to base our implementation on it because of its open-source nature, layeredarchitecture, wide range of functionalities and the ability to have PostGIS, a‘spatially enabled’ DBMS, as a backend to exploit its variety of spatial functionsand operators. Strabon extends Sesame to manage both thematic and spatialRDF data that are subsequently stored in PostGIS. This is done by creating alayer that is included in Sesame’s software stack in a transparent way so that itdoes not affect Sesame’s range of functionalities, while at the same time bene-fits by new versions of Sesame. Strabon 3.0 uses Sesame 2.6.3. Strabon followsSesame’s modular architecture and comprises three modules: the storage man-ager, the query engine and PostGIS. The architecture of Strabon is depicted inFigure 1 and its modules are presented in Sections 4.1 and 4.2.

Page 8: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

4.1 Data Storage

When a user wants to store stRDF data in Strabon, she makes them availablein the form of an RDF document. The document is decomposed into stRDFtriples and each triple is processed separately. Data is stored using dictionaryencoding, i.e., each URI, literal or blank node is encoded by a unique integer sothat storage and query evaluation is performed efficiently using these integers.

Sesame offers two schemes for storing the dictionary-encoded triples: a“monolithic” and a “per-predicate” scheme. According to the former scheme, ev-ery triple is stored in a single triple table. The schema of this table is triple(subj idint, pred id int, obj id int) and each tuple in it has a subj id (resp. pred id,obj id) that is the unique encoding of the triple’s subject (resp. predicate, ob-ject). The “per-predicate” scheme uses vertical partitioning to store triples indifferent tables based on their predicate, and one table per predicate is main-tained. For example, the schema of a predicate table that corresponds to therdf:type property would be type(subj id int, obj id int). The literature containsdetailed comparisons of the vertical partitioning approach vs. the single-tableapproach in row stores such as PostgreSQL and another closed-source commer-cial system, and column stores such as C-Store and MonetDB [2, 22]. Based onthese studies and our own experience with PostgreSQL, we decided to adopt thevertical partitioning approach since it is more often the most efficient methodamong the two. We build two B-Tree two-column indices on subj id , obj id andobj id , subj id respectively for each predicate table.

The mapping dictionary containing the association between unique ids andRDF resources is further decomposed into multiple pre-defined tables. Each ofthese tables is dedicated to storing mappings of a specific type of RDF resources.For example, the table bnode values(id int, value varchar) stores the encodingsfor the blank nodes and a B-tree index is constructed on the id column.

If a spatial literal is found during data loading, Strabon deviates from thedefault behavior of Sesame as follows. All spatial literals are also stored in atable with schema geo values(id int, value geometry, srid int). Each tuple inthe geo values table has an id that is the unique encoding of the spatial literalbased on the mapping dictionary. The attribute value is a spatial column whosedata type is the PostGIS type geometry and is used to store the geometry thatis described by the spatial literal. The geometry is transformed to a uniform,user-defined CRS and the original CRS is stored in the attribute srid. By usinga single CRS, any operation involving geometries initially encoded in differentCRS can be performed in a single step. Otherwise, the geometries should beconverted to a common CRS prior to the operation execution. This approachincurs an overhead during data loading and requires that geometries bound tovariables appearing in the SELECT clause need to be transformed to the originalCRS before returning them to the user. Additionally, an R-tree-over-GiST [9]spatial index on the value column is created.

Sesame lacks a mechanism for bulk loading of large datasets, since its func-tionality focuses on atomic operations. As a result, its performance in the case ofloading RDF datasets of extensive size was not satisfying for our purposes. Thus,

Page 9: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

we have also implemented an stRDF loader which virtualizes a storing schemeidentical to Sesame’s. This loader creates the dictionary (including the mappingof every RDF term) in main memory. If the “monolithic” storing scheme is fol-lowed (resp. “per-predicate”), the loader flushes every row of the triple table(resp. each predicate table) to a single CSV file as soon as it is created. Afterparsing the RDF file, the loader flushes the dictionary to CSV files as well. Af-terwards, these CSV files are loaded and indexed into PostgreSQL. As we showin Section 5, this loader enables us to test our system with datasets of up to 500million triples, provided that the mapping dictionary fits in main memory.

4.2 Query Processing and Optimization

Query processing in Strabon is performed by the query engine that consists of aparser, an optimizer, an evaluator and a transaction manager. The parser and thetransaction manager are identical to the ones in Sesame. The optimizer and theevaluator have been implemented by modifying the corresponding componentsof Sesame as we describe below.

The query engine works as follows. First, the parser generates an abstractsyntax tree. Then, this tree is mapped to the internal algebra of Sesame, re-sulting in a query tree. In the query tree, non-leaf nodes are algebraic operatorssuch as joins and leaf nodes are the variables present in the triple patterns of theinitial query. Using functional notation, Figure 9a presents a textual representa-tion of the query tree corresponding to the query of Example 2. The informationstored in each node of the tree is represented by key-value pairs in the figure.For example, the predicate strdf:hasGeometry of the third triple pattern ofExample 2 is represented by line 19 of Figure 9a as a VAR leaf node with at-tributes name and value. At this stage, the query tree is a plain translation ofthe initial query to the algebra of Sesame. The query tree is then processed bythe optimizer that progressively restructures and enriches it, implementing thevarious optimization techniques of Strabon. Afterwards, the query tree is passedto the evaluator to produce the corresponding SQL query that will be evaluatedby PostgreSQL. After the SQL query has been posed, the evaluator receivesthe results and performs any post-processing actions needed. If a user-definedextension function is present, it is evaluated over the complete result. If an ag-gregate function is present in the HAVING clause, the initial bindings retrievedare grouped according to the variables present in the GROUP BY clause and theaggregate function is evaluated to filter these groups. The final step involvesformatting the results to be returned to the user. Besides the standard formatsoffered by RDF stores (e.g., the SPARQL Query Results XML Format8), Stra-bon offers KML and GeoJSON encodings, which are widely used in the mappingindustry.

We will now present in more detail how the optimizer restructures/enrichesthe query tree and how this restructuring/enriching affects the produced SQLquery. Initially, the optimizer works exactly as the one of Sesame and deals with

8 http://www.w3.org/TR/rdf-sparql-XMLres/

Page 10: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

the standard SPARQL part of an stSPARQL query. First, it applies traditionaltechniques such as pushing down FILTER expressions in the query tree in orderto be evaluated as early as possible to minimize the size of intermediate results.Then, for each triple pattern with a given predicate, the name of the correspond-ing predicate table and its columns that are referenced by the query are retrievedand stored in the node of the query tree representing the triple pattern. If twotriple patterns share a variable, the corresponding nodes in the query tree willbe connected via a node that represents the equi-join between them. In the casewhere two graph patterns do not share any variable, a node that represents theirCartesian product will be present in the query tree. Finally, the necessary ta-bles storing the mapping dictionary are incorporated in the query tree to allowthe reconstruction of the actual RDF resources comprising the query results.Figure 9b depicts the query tree corresponding to the query of Example 2 atthis stage. Notice that all necessary information about the predicate tables andtheir columns that are needed for the evaluation of the query has now beenincorporated in the query tree. For example, the triple pattern with predicatestrdf:hasGeometry is mapped to the predicate table hasgeometry (line 15)and the columns subj and obj are incorporated in the query tree (lines 17 and42).

By default, Sesame uses the following query evaluation strategy for eval-uating user-defined extension functions: all bindings for the variables presentin the query are first retrieved, and then the extension functions are evalu-ated to discard the tuples not satisfying them. The query tree depicted in Fig-ure 9b is constructed following this strategy. Note that the extension functionstrdf:intersects is nested outside the node SelectQuery. Therefore, whenthe query tree is passed to Sesame’s evaluator, the SQL query presented in Fig-ure 8b is generated. Notice that no spatial function has been incorporated in thequery, thus Sesame will evaluate the spatial function strdf:intersects overthe complete result set corresponding to the triple patterns of the query of Ex-ample 2, instead of using the function to possibly restrict the size of the resultset.

In Strabon, we introduce an additional optimization step to incorporate anystSPARQL extension functions present in the query into the query tree priorto its transformation to SQL. As we have just explained, this is motivated bythe fact that it would be unnecessarily costly to evaluate the whole query andthen perform a post-processing step to the results in order to filter out unwantedtuples. Thus, we want all spatial expressions present in the SELECT and FILTER

clause of an stSPARQL query to be evaluated using PostGIS spatial functions in-stead of relying on external libraries that would add an unneeded post-processingcost. To realize the optimization step, the geo values table is first incorporatedin the query tree. Also, the spatial functions present in the query tree are set tobe evaluated by PostGIS and it is declared that their arguments will be retrievedfrom the table geo values. Figure 9c depicts the query tree that is produced bythe enhanced optimizer for the query of Example 2. Notice that the spatial func-

Page 11: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

tion strdf:intersects is now represented by node SqlIntersects and has aschildren nodes the two geometry columns coming from the geo values table.

After this optimization step, the query tree is passed to Sesame’s evalua-tor which has been enhanced to receive the modified query tree and create aSQL query. During this step, the spatial operators are mapped to the equiva-lent built-in functions and operators of PostGIS. Then, the evaluator executesthe produced SQL query in PostGIS. Values retrieved from geo values will bejoined with the rest of the query via the id assigned to each spatial objectduring the dictionary encoding phase. Figure 8c depicts the SQL query that isproduced from the stSPARQL query of Example 2. Notice that the spatial func-tion strdf:intersects has now been mapped to the ST Intersects functionof PostGIS.

The last optimization step makes the underlying DBMS aware of the ex-istence of spatial joins in stSPARQL queries so that they would be evalu-ated efficiently. Let us consider the query of Example 2 again. The first threetriple patterns of the query are related to the rest via the topological functionstrdf:intersects (this is what the relational spatial database literature callsa spatial join). As we can see, both query trees presented in Figures 9b and9c fail to treat this join appropriately. The second occurrence of the predicatetable type is joined with the rest of the triple patterns via a node that has aschildren nodes a RefIdColumn and Number node. This will give rise to a Carte-sian product since equi-joins are produced only from SqlEq nodes that have twoRefIdColumn children nodes that represent relational columns. Thus, the evalu-ation of a spatial predicate will be wrongly postponed after the calculation of aCartesian product.

To implement this last optimization step, we introduce one more internalrepresentation of an stSPARQL query using the concept of a query graph [8]. Aquery graph G for an stSPARQL query q is a pair (V,E), where V is the set ofnodes in G, and E is the set of undirected edges in G. Each node in V denotesa triple pattern of q. Two nodes from V are connected with an edge in E if andonly if the triple patterns of q represented by these two nodes either share avariable (thematic join or spatial equi-join) or a variable from each pattern isrelated to each other through a topological or distance function appearing in theFILTER clause of q (spatial θ-join). Figure 8a depicts the query graph that isproduced by the Strabon optimizer for the stSPARQL query of Example 2. Thenumbers labelling each node denote triple patterns of the query while the dottededge corresponds to the spatial join. The constructed query graph is traversed bythe optimizer so that the query tree can be restructured to take into account thejoins present in an stSPARQL query. For the query of Example 2, the optimizeridentifies the spatial join between the third and fifth triple pattern, re-orders thenodes of the query tree, and introduces the node SqlIntersects that realizes thespatial join. The result is the query tree of Figure 9d, that is mapped to the SQLquery of Figure 8d. The Cartesian product has now been converted into a θ-joinwhere θ is the spatial function ST Intersects. Without this modification, allspatial joins would be ignored and Cartesian products would possibly be present

Page 12: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

in the final query plan. This situation is very probable because we often haveSPARQL queries that query different datasets and combine them only thoughtheir spatial dimension.

To quantify the gains of the optimization techniques used in Strabon, wehave also developed a naive, baseline implementation of stSPARQL which hasnone of the optimization enhancements to Sesame discussed above. For exam-ple, regarding user-defined functions, the naive implementation uses the defaultSesame strategy where all the bindings for the variables present in the queryare first retrieved, and then spatial functions are evaluated to discard the tuplesnot satisfying them. The naive implementation is a complete implementationof stSPARQL which uses the native store of Sesame as a backend instead ofPostGIS (since the native store outperforms all other Sesame implementationsusing a DBMS). Data is stored on disk using atomic operations and indexedusing B-Tree indices. The naive implementation is essentially the easiest wayto implement stSPARQL in Sesame, thus it is interesting to compare Strabonagainst it.

5 Experimental Evaluation

This section presents a detailed evaluation of the system Strabon using two dif-ferent workloads: a workload based on linked data and a synthetic workload. Forboth workloads, we compare the response time of Strabon on top of PostgreSQL(which we will call Strabon PG from now on) ith our closest competitor imple-mentation in [4], the naive, baseline implementation described in Section 4.2, andthe RDF store Parliament9. In order to identify potential benefits from using adifferent DBMS as a relational backend for Strabon, we also executed the SQLqueries produced by Strabon in a proprietary spatially-enabled DBMS (whichwe will call System X from now on and Strabon X the resulting combination).

[4] presents an implementation which enhances the RDF-3X triple store withthe ability to perform spatial selections using an R-tree index. The implemen-tation of [4] is not a complete system like Strabon and does not support afull-fledged query language such as stSPARQL. In addition, the only way toload data in the system is the use of a generator which has been especially de-signed for the experiments of [4] thus it cannot be used to load other datasetsin the implementation. Moreover, the geospatial indexing support of this imple-mentation is limited to spatial selections. Spatial selections are pushed down inthe query tree (i.e., they are evaluated before other operators). Parliament is anRDF storage engine recently enhanced with GeoSPARQL processing capabilities[3], which is coupled with Jena to provide a complete RDF system.

Our experiments were carried out on an Ubuntu 11.04 installation on anIntel Xeon E5620 with 12MB L2 cache running at 2.4 GHz. The system has16GB of RAM and 2 disks of striped RAID (level 0). Each disk has 32MBcache and its rotational speed is 7200 rpm. We measured the response time for

9 http://parliament.semwebcentral.org/

Page 13: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

each query posed by measuring the elapsed time from query submission till acomplete iteration over the results had been completed. We ran all queries fivetimes on cold and warm caches. For warm caches, we ran each query once beforemeasuring the response time, in order to warm up the caches.

5.1 Evaluation using Linked Data

This section describes the experiments we did for the evaluation of Strabon usinga workload based on linked data. We selected published datasets that are pop-ular and include spatial information. We also published previously unavailabledatasets with rich spatial information to discover interconnections with otherdatasets based on their spatial dimension.

We created a unified dataset10 that consists of the following popular datasetsthat include geospatial information:

– DBpedia11, consisting of datasets extracted from Wikipedia.– GeoNames12, a gazetteer that collects both spatial and thematic information

for various placenames around the world.– LinkedGeoData13, the dataset derived from publishing OpenStreetMap14

data as linked data.– Pachube15, now renamed to cosm, a dataset derived from a platform enabling

users to publish sensor data on the Web or use existing sensor data in orderto build their own applications on top of it.

– SwissExperiment16, a dataset derived from a platform focused on environ-mental monitoring and experiments with the use of sensors (in this case,wireless sensor networks). The Pachube and SwissEx datasets are modelledin a very similar manner since they mostly reuse the same ontologies.

We also used the following datasets, published in RDF by us in the context ofthe EU projects TELEIOS and SemsorGrid4Env.

– Corine Land Use/Land Cover17, is a dataset of the European EnvironmentAgency that characterizes data regarding the land cover of European coun-tries. The project uses a hierarchical scheme with three levels to describeland cover. The first level indicates the major categories of land cover on theplanet, e.g., forests and semi-natural areas. The second level identifies morespecific types of land cover, e.g., forests, while the third level narrows downto a very specific characterization, e.g., coniferous forests.

10 All datasets used for the evaluation of Strabon can be found at http://www.strabon.di.uoa.gr/evaluation/datasets.html

11 http://dbpedia.org/About12 http://www.geonames.org/13 http://linkedgeodata.org/About14 http://www.openstreetmap.org/15 https://cosm.com16 http://www.swiss-experiment.ch17 http://www.eea.europa.eu/publications/COR0-landcover

Page 14: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

Dataset Size Triples Spatialterms

Distinctspatialterms

Points Linestrings(min/ max/avg # ofpoints/linestring)

Polygons(min/max/avg# of points/polygon)

DBpedia 7.1 GB 58,727,893 386,205 375,087 375,087 - -

GeoNames 2.1GB 17,688,602 1,262,356 1,099,964 1,099,964 - -

LGD 6.6GB 46,296,978 5,414,032 5,035,981 3,205,015 353,714 (4/20/9) 1,704,650(4/20/9)

Pachube 828KB 6,333 101 70 70 - -

SwissEx 33MB 277,919 687 623 623 - -

CLC 14GB 19,711,926 2,190,214 2,190,214 - - 2,190,214(4/1,255,917/129)

GADM 146MB 255 51 51 - - 51 (96/ 510,018/79,831)

Fig. 2: Summary of unified linked datasets

– Global Administrative Areas18, a dump of the spatial database contain-ing the locations of administrative areas on a global extent. The areas con-tained in GADM are of various granularities; one can choose from countries,states, municipalities, etc. We isolated a dataset describing the borders ofEuropean countries.

The size of the unified dataset is 30GB and consists of 137 million triples thatinclude 176 million distinct RDF terms of which 9 million are spatial literals. Aswe can see in Table 2, the complexity of the spatial literals varies significantly.For more information on the datasets mentioned above and the process followed,the reader may consult [10]. It should be noted that we could not use the imple-mentation of [4] to execute this workload since it is not a complete system likeStrabon as we explained above.Storing stRDF Documents. For our experimental evaluation, we used thebulk loader described in Section 4.1 emulating the “per-predicate” scheme sincepreliminary experiments indicated that the query response times in PostgreSQLwere significantly faster when using this scheme compared to the “monolithic”scheme. This decision led to the creation of 48,405 predicate tables. Figure 3bpresents the time required by each system to store the unified dataset.

Strabon PG required 9,664 seconds to pre-process the unified RDF file usingthe loader, 3,909 seconds to store the CSV files and 9,970 seconds to populatethe indices in PostgreSQL. The naive implementation required 9,480 seconds intotal. The difference in times between Strabon and the naive implementation isnatural given the very large number of predicate tables that had to be produced,processed and indexed. In this dataset, 80% of the total triples used only 17distinct predicates. The overhead imposed for the processing of the rest rarelyused 48,388 predicates cancels the performance benefits of the bulk loader. Ahybrid solution that stores triples with popular predicates in separate tableswhile storing the rest of the triples in a single table, would have been moreappropriate for balancing the tradeoff between storage time and query responsetime but we have not experimented with this option. In Section 5.2 we show

18 http://www.gadm.org/

Page 15: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

that the loader scales much better when fewer distinct predicates are present inour dataset, regardless its size.

In the case of Strabon X, we followed an identical process with the one fol-lowed for Strabon PG. System X required double the amount of time that it tookPostgreSQL to store and index the data. In the case of Parliament we modifiedthe dataset to conform to the GeoSPARQL vocabulary and measure the timerequired for storing the resulting file. After incorporating the additional triplesrequired by GeoSPARQL, the resulting dataset had approximately 15% moretriples than the original RDF file. Both Strabon and the naive implementationoutperformed Parliament.

Evaluating stSPARQL Queries. In this experiment we chose to evaluateStrabon using eight real-world queries. Our intention was to have enough queriesto demonstrate Strabon’s performance and functionality based on the followingcriteria: query patterns should be frequently used in Semantic Web applicationsor demonstrate the spatial extensions of Strabon. The first criterion was ful-filled by taking into account the results of [21] when designing the queries. [21]presents statistics on real-world SPARQL queries based on the logs of DBpedia.In addition, we studied the query logs of the LinkedGeoData SPARQL endpoint(which were kindly provided to us) to determine what kind of spatial queries arepopular. According to this criterion, we composed the queries Q1, Q2 and Q7.According to [21] queries like Q1 and Q2 that consist of a few triple patternsare very common. Query Q7 consists of a few triple patterns and a topologi-cal FILTER function, which is a structure similar to the queries most frequentlyposed to the LGD endpoint. The second criterion was fulfilled by incorporat-ing spatial selections (Q4 and Q5) and spatial joins (Q3,Q6,Q7 and Q8) in thequeries. In addition, we used non-topological functions that create new geome-tries from existing ones (Q4 and Q8) since these functions are frequently usedin geospatial relational databases.

In Table 3a we report the response time results. In queries Q1 and Q2,the baseline implementation outperforms all other systems since these queriesdo not include any spatial function. As mentioned earlier, the native store ofSesame outperforms Sesame implementations on top of a DBMS. All other sys-tems produce comparable result times for these non-spatial queries. In all otherqueries the DBMS-based implementations outperform Parliament and the naiveimplementation. For the rest of the queries, Strabon PG outperforms the naiveimplementation since it incorporates the two stSPARQL-specific optimizationsdiscussed in Section 4. As a result, spatial operations are evaluated by PostGISusing a spatial index, instead of being evaluated after all the results have beenretrieved. The naive implementation and Parliament fail to execute Q3 as thisquery involves a spatial join that is very expensive for systems using a naiveapproach. The only exception in the behavior of the naive implementation is Q4in the case of warm caches, where the non-spatial part of the query producesvery few results and the file blocks needed for query evaluation are cached inmain memory. In this case, the non-spatial part of the query is executed rapidlywhile the evaluation of the spatial function over the results thus far is not sig-

Page 16: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

Caches System Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8

Cold Naive 0.08 1.65 >8h 28.88 89 170 844 1.699(sec.) Strabon-PG 2.01 6.79 41.39 10.11 78.69 60.25 9.23 702.55

Strabon-X 1.74 3.05 1623.57 46.52 12.57 2409.98 >8h 57.83Parliament 2.12 6.46 >8h 229.72 1130.98 872.48 3627.62 3786.36

Warm Naive 0.01 0.03 >8h 0.79 43.07 88 708 1712(sec.) Strabon-PG 0.01 0.81 0.96 1.66 38.74 1.22 2.92 648.1

Strabon-X 0.01 0.26 1604.9 35.59 0.18 3196.78 >8h 44.72Parliament 0.01 0.04 >8h 10.91 358.92 483.29 2771 3502.53

(a)

Total time (sec)System Linked

Data10mil 100mil 500mil

Naive 9,480 1,053 12,305 72,433

Strabon PG 19,543 458 3,241 21,155

Strabon X 28,146 818 7,274 40,378

RDF-3X * 780 8,040 43,201

Parliament 81,378 734 6,415 >36h

(b)

Fig. 3: (b) Storage time for each dataset (a) Response time for real-world queries

nificant. All queries except Q8 are executed significantly faster when run usingStrabon PG on warm caches. Q8 involves many triple patterns and spatial func-tions which result in the production of a large number of intermediate results.As these do not fit in the system’s cache, the response time is unaffected bythe cache contents. System X decides to ignore the spatial index in queries Q3,Q6-Q8 and evaluate any spatial predicate exhaustively over the results of a the-matic join. In queries Q6-Q8, it also uses Cartesian products in the query plan asit considers their evaluation more profitable. These decisions are correct in thecase of Q8 where it outperforms all other systems significantly, but very costlyin the cases of Q3, Q6 and Q7.

5.2 Evaluation using a Synthetic Dataset

Although we had tested Strabon with datasets up to 137 million triples (Sec-tion 5.1), we wanted better control over the size and the characteristics of thespatial dataset being used for evaluation. By using a generator to produce a syn-thetic dataset, we could alter the thematic and spatial selectivities of our queriesand closely monitor the performance of our system, based on both spatial andthematic criteria. Since we could produce a dataset of arbitrary size, we wereable to stress our system by producing datasets of size up to 500 million triples.Storing stRDF Documents. The generator we used to produce our datasetsis a modified version of the generator used by the authors of [4], which was kindlyprovided to us. The data produced follows a general version of the schema ofOpen Street Map depicted in Figure 4. Each node has a spatial extent (thelocation of the node) and is placed uniformly on a grid. By modifying the stepof the grid, we produce datasets of arbitrary size. In addition, each node is

Page 17: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

assigned a number of tags each of which consists of a key-value pair of strings.Every node is tagged with key 1, every second node with key 2, every fourthnode with key 4, etc. up to key 1024. We generated three datasets consistingof 10 million, 100 million and 500 million triples and stored them in Strabonusing the per-predicate scheme. Figure 3b shows the time required to store thesedatasets. In the case of [4], the generator produces directly a binary file usingthe internal format of RDF-3X and computes exhaustively all indices, resultingin higher storage times. On the contrary, Strabon, Parliament and the naiveimplementation store an RDF file for each dataset. Compared to the case of thereal-world dataset, the stRDF loader that provides a bulk mechanism for loadingdata now outperforms the naive implementation, whose loading mechanism usesatomic operations only. We observed this behavior when dealing with any datasetnot including an excessive amount of distinct predicates that are scarcely used,and regardless of whether the underlying DBMS in which the datasets werestored was PostgreSQL or System X. Nevertheless, storage in Strabon PG wasstill the fastest case. In the case of Parliament, similarly to the real-world case,the synthetic dataset had to be converted to GeoSPARQL prior to storing. Theresulting 100 million triples dataset was approximately 27% bigger than theoriginal RDF file and required 6,415 seconds to be stored, thus outperformingall other compared systems but Strabon PG. However, in the case of the 500million triples dataset, Parliament failed to finish the storage operation after 36hours.

Evaluating stSPARQL Queries. In this experiment we used the fol-lowing query template that is identical to the query template used in [4]:SELECT * WHERE {?node geordf:hasTag ?tag. ?node strdf:hasGeography ?geo.

?tag geordf:key PARAM A. FILTER (strdf:inside(?geo, PARAM B))}In this query template, PARAM A is one of the values used when tagging anode and PARAM B is the WKT representation of a polygon. We define thethematic selectivity of an instantiation of the query template as the fraction ofthe total nodes that are tagged with a key equal to PARAM A. For example, byaltering the value of PARAM A from 2 to 4, we reduce the thematic selectivity ofthe query by selecting half the nodes we previously did. We define the spatialselectivity of an instantiation of the query template as the fraction of the totalnodes that are inside the polygon defined by PARAM B. We modify the size ofthe polygon in order to select from 10 up to 106 nodes.

We will now discuss representative experiments from the 100 and 500 mil-lion triples datasets which is a representative subset of the experiments thatwe conducted. For each dataset we present graphs depicting results based onvarious combinations of parameters PARAM A and PARAM B in the case of coldor warm caches. These graphs are presented in Figure 7. In the case of Stra-bon, the stSPARQL query template is mapped to a SQL query, which is sub-sequently executed by PostgreSQL or System X. This query involves the predi-cate tables key(subj,obj), hasTag(subj,obj) and hasGeography(subj,obj)

and the spatial table geo values(id,value,srid). We created two B+ treetwo-column indices for each predicate table and an R-tree index on the value

Page 18: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

osm:Node23

osm:Tag1osm:Tag

osm:Node

osm:key osm:value

osm:hasTag

rdf:type

rdf:type

osm:idgeordf:

hasGeometry1

“1” “1”

“14”“POINT(37.9 23.1)”

Fig. 4: LGD schema Fig. 5: Plan A Fig. 6: Plan B

column of the spatial table. The main point of interest in this SQL query is theorder of join execution in the following sub-query: σobj=T (key) on hastag onσvalue inside S(hasgeography) where T and S are values of the parametersPARAM A and PARAM B respectively. Different orders give significant differencesin query execution time. After consulting the query logs of PostgreSQL, we no-ticed that the vast majority of the SQL queries that were posed and derivedfrom the query template adhered to one of the query plans of Figures 5,6. Ac-cording to plan A, query evaluation starts by evaluating the thematic selectionover table key using the appropriate index. The results are retrieved and joinedwith the hasTag and hasGeography predicate tables using appropriate indices.Finally, the spatial selection is evaluated by scanning the spatial index and theresults are joined with the intermediate results of the previous operations. Onthe contrary, plan B starts with the evaluation of the spatial selection, leavingthe application of the thematic selection for the end.

Unfortunately, in the current version of PostGIS spatial selectivities are notcomputed properly. The functions that estimate the selectivity of a spatial se-lection/join return a constant number regardless of the actual selectivity of theoperator. Thus, only the thematic selectivity affects the choice of a query plan.To state it in terms of our query template, altering the value PARAM A between1 and 1024 was the only factor influencing the selection of a query plan.

A representative example is shown in Figure 7g, where we present the re-sponse times for all values of PARAM A and observe two patterns. When thematicselectivity is high (values 1-2), the response time is low when the spatial selec-tivity (captured by the x-axis) is low, while it increases along with the spatialselectivity. This happens because for these values PostgreSQL chooses plan B,which is good only when the spatial selectivity is low. In cases of lower thematicselectivity (values 4-1024), the response time is initially high but only increasesslightly as spatial selectivity increases. In this case, PostgreSQL chooses plan A,which is good only when the spatial selectivity is high. The absence of dynamicestimation of spatial selectivity is affecting the system performance since it doesnot necessarily begin with a good plan and fails to switch between plans whenthe spatial selectivity passes the turning point observed in the graph.

Similar findings were observed in the case of System X. The decision on whichquery plan to select was influenced by the value of PARAM A. In most cases, a plansimilar to plan B is selected, with the only variation that in the case of the 500million triples dataset the access methods defined in plan B are full table scans

Page 19: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

100

101

102

103

104

102

103

104

105

106

response tim

e [sec]

NaiveRDF3X-*

Strabon PGStrabon X

Parliament

(a) 100 mil.,tag 1 (cold)

100

101

102

103

104

102

103

104

105

106

Naive

RDF3X-*

Strabon PG

Strabon X

Parliament

(b) 100 mil.,tag 1024 (cold)

10-2

10-1

100

101

102

103

104

102

103

104

105

106

Naive

RDF3X-*

Strabon PG

Strabon X

Parliament

(c) 100 mil.,tag 1 (warm)

10-1

100

101

102

103

104

102

103

104

105

106

Naive

RDF3X-*

Strabon PG

Strabon X

Parliament

(d) 100 mil.,tag 1024(warm)

100

101

102

103

104

103

104

105

106

resp

on

se

tim

e [

se

c]

number of Nodes in query region

Strabon PG2

(e) 500 mil.,tag 1 (cold)

100

101

102

103

103

104

105

106

number of Nodes in query region

(f) 500 mil.,tag 1024(cold)

100

101

102

103

103

104

105

106

number of Nodes in query region

key=1key=2key=4

key=16key=32key=64

key=128key=256key=512

key=1024

(g) 500 mil. (cold)

10-2

10-1

100

101

102

103

104

105

106

number of Nodes in query region

(h) 500 mil.,tag 1024(warm)

Fig. 7: Response times

instead of index scans. The only case in which System X selected a differentquery plan was when posing a query with very low thematic selectivity (PARAM A

= 1024) on the 500 million triples dataset, in which case System X switched toa plan similar to plan A.

In Figures 7a-7d we present the results for the 100 million dataset for the ex-treme values 1 and 1024 of PARAM A. Strabon PG outperforms the other systemsin the case of warm caches, while the implementation of [4] and Strabon X out-perform Strabon PG for value 1024 in the case of cold caches (Figure 7b). Inthis case, as previously discussed, PostgreSQL does not select a good plan andthe response times are even higher than the times where PARAM A is equal to 1(Figure 7a), which produces significantly more intermediate results. In Figures7e, 7f and 7h we present the respective graphs for the 500 million dataset andobserve similar behavior as previously, with a small deviation in Figure 7e. Inthis case, we observe that the implementation of [4] outperforms Strabon PG. Inthis scenario, Strabon X also gradually outperforms Strabon PG as spatial selec-tivity increases. However, we observed that PostgreSQL had chosen the correctquery plan for this case. Taking this into account, we tuned PostgreSQL to makebetter use of the system resources. We allowed the usage of more shared buffers

Page 20: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

and the allocation of larger amounts of memory for internal sort operations andhash tables, instead of using secondary storage. As observed in Figure 7e, theresult of these modifications (labeled Strabon PG2) led to significant perfor-mance improvement. This is quite typical with relational DBMS; the more youknow about their internals and the more able you are to tune them for queriesof interest, the better performance you will get.

In general, Strabon X performs well in cases of high spatial selectivity. InFigure 7f, the response time of Strabon X is almost constant. In this case aplan similar to plan A is selected, with the variation that the spatial index isnot utilized for the evaluation of the spatial predicate. This decision would becorrect only for queries with high spatial selectivity. The effect of this plan ismore visible in Figure 7h, where the large number of intermediate results preventthe system to utilize its caches, resulting in constant, yet high response times.

Regarding Parliament, although the results returned by each query posedwere correct, the fact that the response time was very high did not allow usto execute queries using all instantiations of the query template. We instanti-ated the query template using the extreme values of PARAM A and PARAM B andexecuted them to have an indication of the system’s performance.

In all datasets used, we observe that the naive implementation has constantperformance regardless of the spatial selectivity of a query since the spatialoperator is evaluated against all bindings retrieved thus far. The baseline imple-mentation also outperforms Strabon X and the implementation of [4] in querieswith low thematic selectivity (value 1024) over warm caches. As with the linkeddata workload, the naive implementation performs well when posing queries withvery low thematic selectivity over warm caches.

In summary, Strabon PG outperforms the other implementations whencaches are warmed up. Results over cold caches are mixed, but we showed thataggressive tuning of PostgreSQL can increase the performance of Strabon PGresulting to response times slightly better than the implementation of [4]. Nev-ertheless, modifications to the optimizer of PostgreSQL are needed in order toestimate accurately the spatial selectivity of a query to produce better plans bycombining this estimation with the thematic selectivity of the query. Given thesignificantly better performance of RDF-3X over Sesame for standard SPARQLqueries, another open question in our work is how to go beyond [4] and mod-ify the optimizer of RDF-3X so that it can deal with the full stSPARQL querylanguage.

6 Related Work

Geospatial extensions of RDF and SPARQL have been presented recentlyin [11, 13, 20]. An important addition to this line of research is the recentOGC candidate standard GeoSPARQL discussed in [1] and the new ver-sion of stRDF/stSPARQL presented in this paper. Like stRDF/stSPARQL,GeoSPARQL aims at providing a basic framework for the representation andquerying of geospatial data on the Semantic Web. The two approaches have been

Page 21: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

developed independently at around the same time, and have concluded with verysimilar representational and querying constructs. Both approaches represent ge-ometries as literals of an appropriate datatype. These literals may be encodedin various formats like GML, WKT etc. Both approaches map spatial predicatesand functions that support spatial analysis to SPARQL extension functions.GeoSPARQL goes beyond stSPARQL in that it allows binary topological re-lations to be used as RDF properties anticipating their possible utilization byspatial reasoners (this is the topological extension and the related query rewriteextension of GeoSPARQL). In our group, such geospatial reasoning function-ality is being studied in the more general context of “incomplete informationin RDF” of which a preliminary overview is given in [14]. Since stSPARQL hasbeen defined as an extension of SPARQL 1.1, it goes beyond GeoSPARQL by of-fering geospatial aggregate functions and update statements that have not beenconsidered at all by GeoSPARQL. Another difference between the two frame-works is that GeoSPARQL imposes an RDFS ontology for the representationof features and geometries. On the contrary, stRDF only asks that a specificliteral datatype is used and leaves the responsibility of developing any ontologyto the users. In the future, we expect user communities to develop more special-ized ontologies that extend the basic ontologies of GeoSPARQL with relevantgeospatial concepts from their own application domain. In summary, strictlyspeaking, stSPARQL and GeoSPARQL are incomparable in terms of represen-tational power. If we omit aggregate functions and updates from stSPARQL, itsfeatures are a subset of the features offered by the GeoSPARQL core, geometryextension and geometry topology extension components. Thus, it was easy tooffer full support in Strabon for these three parts of GeoSPARQL.

Recent work has also considered implementation issues for geospatial exten-sions of RDF and SPARQL. [4] presents an implementation based on RDF-3X,which we already discussed in detail. Virtuoso provides support for the represen-tation and querying of two-dimensional point geometries expressed in multiplereference systems. Virtuoso models geometries by typed literals like stSPARQLand GeoSPARQL. Vocabulary is offered for a subset of the SQL/MM standardto perform geospatial queries using SPARQL. The open-source edition does notincorporate these geospatial extensions. [3] introduces GeoSPARQL and the im-plementation of a part of it in the RDF store Parliament. [3] discusses interestingideas regarding query processing in Parliament but does not give implementationdetails or a performance evaluation. A more detailed discussion of current pro-posals for adding geospatial features to RDF stores is given in our recent survey[12]. None of these proposals goes beyond the expressive power of stSPARQL orGeoSPARQL discussed in detail above.

7 Conclusions and Future Work

We presented the new version of the data model stRDF and the query languagestSPARQL, the system Strabon which implements the complete functionalityof stSPARQL (and, therefore, a big subset of GeoSPARQL) and an experimen-

Page 22: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

(a)

SELECT l.value, g1.value, g2.value

FROM hasgeometry h1

INNER JOIN name n ON (n.subj=h1.subj)

INNER JOIN type t1 ON (t1.subj=h1.subj)

CROSS JOIN hasgeometry h2

INNER JOIN type t2 ON (t2.subj=h2.subj)

LEFT JOIN label values l ON (l.id=n.obj)

WHERE t1.obj=‘3’ AND t2.obj=‘8’

(b)

SELECT l.value

FROM hasgeometry h1

INNER JOIN name n ON (n.subj=h1.subj)

INNER JOIN type t1 ON (t1.subj=h1.subj)

INNER JOIN geo values g1 ON (g1.id=h1.obj)

CROSS JOIN hasgeometry h2

INNER JOIN type t2 ON (t2.subj=h2.subj)

INNER JOIN geo values g1 ON (g2.id=h2.obj)

LEFT JOIN label values l ON (l.id=n.obj)

WHERE t1.obj=‘3’ AND t2.obj=‘8’

AND (ST Intersects(g1.value, g2.value))

(c)

SELECT l.value

FROM hasgeometry h1

INNER JOIN name n ON (n.subj=h1.subj)

INNER JOIN type t1 ON (t1.subj=h1.subj)

INNER JOIN geo values g1 ON (g1.id=h1.obj)

INNER JOIN geo values g2 ON

(ST Intersects(g1.value,g2.value))

INNER JOIN hasgeometry h2 ON (h2.obj=g2.id)

INNER JOIN type t2 ON (t2.subj=h2.subj)

LEFT JOIN label values l ON (l.id=n.obj)

WHERE t1.obj=‘3’ AND t2.obj=‘8’

(d)

Fig. 8: (a) Query graph of Example 2 and SQL query produced (b) by the naiveimplementation, (c) by incorporating spatial functions and (d) by incorporatingspatial joins.

tal evaluation of Strabon. Future work concentrates on experiments with evenlarger datasets, comparison of Strabon with other systems such as Virtuoso, andstSPARQL query processing in the column store MonetDB.

Acknowledgments

We thank the authors of [4] for providing us with a copy of the generator weused to produce the sythetic datasets. We thank the reviewers for their verydetailed comments and suggestions that improved the paper. This work wassupported in part by the European Commission project TELEIOS (http://www.earthobservatory.eu/).

References

1. Open Geospatial Consortium. OGC GeoSPARQL - A geographic query languagefor RDF data. OGC Candidate Implementation Standard (02 2012)

2. Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.: Sw-store: a vertically parti-tioned dbms for semantic web data management. VLDB J. 18(2), 385–406 (2009)

3. Battle, R., Kolas, D.: Enabling the Geospatial Semantic Web with Parliament andGeoSPARQL. In: Semantic Web Interoperability, Usability, Applicability (2011)

Page 23: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

4. Brodt, A., Nicklas, D., Mitschang, B.: Deep integration of spatial query processinginto native RDF triple stores. In: Proceedings of the 18th SIGSPATIAL Inter-national Conference on Advances in Geographic Information Systems. pp. 33–42(2010)

5. Cohn, A., Bennett, B., Gooday, J., Gotts, N.: Qualitative Spatial Representationand Reasoning with the Region Connection Calculus. Geoinformatica 1(3), 275–316(1997)

6. Egenhofer, M.J.: A Formal Definition of Binary Topological Relationships. In:FODO. LNCS, vol. 367, pp. 457–472. Springer (1989)

7. Gutierrez, C., Hurtado, C., Vaisman, A.: Introducing Time into RDF. IEEE Trans.Knowl. Data Eng. 19(2), 207–218 (2007)

8. Hartig, O., Heese, R.: The SPARQL Query Graph Model for Query Optimization.In: ESWC. pp. 564–578 (2007)

9. Hellerstein, J., Naughton, J., Pfeffer, A.: Generalized Search Trees for DatabaseSystems. In: VLDB. pp. 562–573 (1995)

10. Karpathiotakis, M., Kyzirakos, K., Kaoudi, Z., Koubarakis, M., Rodriguez, J.,Aparicio, J.: Evaluation of the registry services. Del. 3.4, SemSorGrid4Env (2011),http://www.semsorgrid4env.eu/files/deliverables/wp3/D3.4.pdf

11. Kolas, D., Self, T.: Spatially Augmented Knowledgebase. In: ISWC/ASWC. pp.792–801 (2007)

12. Koubarakis, M., Karpathiotakis, M., Kyzirakos, K., Nikolaou, C., Sioutis, M.: DataModels and Query Languages for Linked Geospatial Data. Invited papers from 8th

Reasoning Web Summer School, Vienna, Austria, September 03-08, 201213. Koubarakis, M., Kyzirakos, K.: Modeling and Querying Metadata in the Semantic

Sensor Web: The Model stRDF and the Query Language stSPARQL. In: ESWC.pp. 425–439 (2010)

14. Koubarakis, M., Kyzirakos, K., Karpathiotakis, M., Nikolaou, C., Sioutis, M., Vas-sos, S., Michail, D., Herekakis, T., Kontoes, C., Papoutsis, I.: Challenges for Quali-tative Spatial Reasoning in Linked Geospatial Data. In: Proceedings of IJCAI 2011Workshop on Benchmarks and Applications of Spatial Reasoning (2011)

15. Koubarakis, M., Kyzirakos, K., Nikolaou, B., Sioutis, M., Vassos, S.: Adata model and query language for an extension of RDF with time andspace. Del. 2.1, TELEIOS, http://www.earthobservatory.eu/deliverables/

FP7-257662-TELEIOS-D2.1_revised.pdf

16. Koubarakis, M., Sioutis, M., Kyzirakos, K., Karpathiotakis, M., Nikolaou, C., Vas-sos, S., Garbis, G., Bereta, K., Dumitru, O.C., Molina, D.E., Molch, K., Schwarz,G., Datcu, M.: Building Virtual Earth Observatories using Ontologies, LinkedGeospatial Data and Knowledge Discovery Algorithms. In: ODBASE (2012)

17. Kuper, G., Ramaswamy, S., Shim, K., Su, J.: A Constraint-based Spatial Extensionto SQL. In: ACM-GIS. pp. 112–117 (1998)

18. Kyzirakos, K., Karpathiotakis, M., Garbis, G., Nikolaou, C., Bereta, K., Sioutis,M., Papoutsis, I., Herekakis, T., Michail, D., Koubarakis, M., Kontoes, C.: RealTime Fire Monitoring Using Semantic Web and Linked Data Technologies. In:Submitted to ISWC (2012)

19. Kyzirakos, K., Karpathiotakis, M., Koubarakis, M.: Developing Registries for theSemantic Sensor Web using stRDF and stSPARQL (short paper). In: The 3rdInternational Workshop on Semantic Sensor Networks 2010 (SSN10) in conjunctionwith the 9th International Semantic Web Conference (2010)

20. Perry, M.: A Framework to Support Spatial, Temporal and Thematic Analyticsover Semantic Web Data. Ph.D. thesis, Wright State University (2008)

Page 24: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

21. Picalausa, F., Vansummeren, S.: What are real SPARQL queries like? In: Proceed-ings of the International Workshop on Semantic Web Information Management.pp. 7:1–7:6. SWIM ’11 (2011)

22. Sidirourgos, L., Goncalves, R., Kersten, M.L., Nes, N., Manegold, S.: Column-store support for RDF data management: not all swans are white. PVLDB 1(2),1553–1563 (2008)

Page 25: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

Appendix AstSPARQL Queries used in the experimental evaluation

using Linked Data

/* Q1 */

SELECT ?poi WHERE {

?poi <http://linkedgeodata.org/ontology/hasCity> "Rotterdam" .

}

/* Q2 */

SELECT ?poiName WHERE {

?poi <http://linkedgeodata.org/ontology/hasCity> "Rotterdam" .

?poi a <http://linkedgeodata.org/ontology/Amenity> .

?poi <http://www.w3.org/2000/01/rdf-schema#label> ?poiName .

}

/* Q3 */

SELECT ?attractionLabel WHERE {

?waterCourse <http://teleios.di.uoa.gr/ontologies/noaOntology.owl#hasLandUse>

<http://teleios.di.uoa.gr/ontologies/noaOntology.owl#waterCourses> .

?waterCourse a <http://teleios.di.uoa.gr/ontologies/noaOntology.owl#Area> .

?waterCourse

<http://teleios.di.uoa.gr/ontologies/noaOntology.owl#hasGeometry>

?courseGeo .

?attractionInDanger a <http://linkedgeodata.org/ontology/Hotel> .

?attractionInDanger <http://www.w3.org/2000/01/rdf-schema#label>

?attractionLabel .

?attractionInDanger <http://www.w3.org/2003/01/geo/wgs84_pos#geometry>

?attractionGeo .

FILTER(

<http://strdf.di.uoa.gr/ontology#overlap>(

<http://strdf.di.uoa.gr/ontology#buffer>(?courseGeo, 0.5),

?attractionGeo

)

)

}

/* Q4 */

SELECT ?hospitalName WHERE {

?hospital a <http://linkedgeodata.org/ontology/Hospital> .

?hospital <http://www.w3.org/2000/01/rdf-schema#label> ?hospitalName .

?hospital <http://www.w3.org/2003/01/geo/wgs84_pos#geometry> ?hospitalGeo .

FILTER(

<http://strdf.di.uoa.gr/ontology#contains>(

<http://strdf.di.uoa.gr/ontology#buffer>(

Page 26: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

"POINT(23.72873 37.97205)"^^<http://strdf.di.uoa.gr/ontology#WKT>,

0.0572

),

?hospitalGeo

)

) .

}

/* Q5 */

SELECT ?poiName WHERE {

?poi a <http://linkedgeodata.org/ontology/Amenity> .

?poi <http://www.w3.org/2000/01/rdf-schema#label> ?poiName .

?poi <http://www.w3.org/2003/01/geo/wgs84_pos#geometry> ?poiGeo.

FILTER (

<http://strdf.di.uoa.gr/ontology#inside>(

?poiGeo,

"POLYGON ((

3.9111328125 51.7401123046875, 3.9111328125 52.05322265625,

4.6087646484375 52.05322265625, 4.6087646484375 51.7401123046875,

3.9111328125 51.7401123046875

))"^^<http://strdf.di.uoa.gr/ontology#WKT>

)

)

}

/* Q6 */

SELECT ?capital ?pier WHERE {

?country a <http://dbpedia.org/ontology/Country> .

?country <http://dbpedia.org/property/capital> ?capital_label .

?capital <http://www.w3.org/2000/01/rdf-schema#label> ?capital_label .

?capitalGeoNames <http://www.w3.org/2002/07/owl#sameAs> ?capital .

?capitalGeoNames

<http://teleios.di.uoa.gr/ontologies/noaOntology.owl#hasGeography>

?capitalGeo .

OPTIONAL {

?pier a <http://linkedgeodata.org/ontology/Pier> .

?pier <http://www.w3.org/2003/01/geo/wgs84_pos#geometry> ?pierGeo .

FILTER(

<http://strdf.di.uoa.gr/ontology#overlap>(

<http://strdf.di.uoa.gr/ontology#buffer>(?capitalGeo,3.7), ?pierGeo

)

)

}

}

/* Q7 */

SELECT ?label WHERE {

?poi <http://www.w3.org/2003/01/geo/wgs84_pos#geometry> ?poiGeo .

Page 27: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

?poi <http://www.w3.org/2000/01/rdf-schema#label> ?label .

?poi a <http://linkedgeodata.org/ontology/Amenity> .

<http://www.gadm.org/ontology#LUXEMBOURG>

<http://www.gadm.org/ontology#hasGeometry> ?luxGeo .

FILTER(

<http://strdf.di.uoa.gr/ontology#inside>(?poiGeo, ?luxGeo)

) .

}

/* Q8 */

SELECT ?sensor ?place WHERE {

?place a <http://linkedgeodata.org/ontology/NaturalWood> .

?place <http://www.w3.org/2003/01/geo/wgs84_pos#geometry> ?placeGeo.

?sensorSystem <http://purl.oclc.org/NET/ssnx/ssn#hasSubSystem> ?sensor .

?sensorsDeployment <http://purl.oclc.org/NET/ssnx/ssn#deployedSystem>

?sensorSystem .

?sensorsDeployment <http://purl.oclc.org/NET/ssnx/ssn#deployedOnPlatform>

?sensorPlatform .

?sensorPlatform <http://www.loa-cnr.it/ontologies/DUL.owl#hasLocation>

?spaceRegion .

?spaceRegion <http://dbpedia.org/property/hasGeometry> ?sensorsGeo .

?areaOfInterest

<http://teleios.di.uoa.gr/ontologies/noaOntology.owl#hasGeography>

?areaGeo .

?areaOfInterest <http://www.geonames.org/ontology#name> "London" .

FILTER(

<http://strdf.di.uoa.gr/ontology#inside>(

?sensorsGeo, <http://strdf.di.uoa.gr/ontology#buffer>(?areaGeo,1.6)

)

) .

FILTER(

<http://strdf.di.uoa.gr/ontology#overlap>(

<http://strdf.di.uoa.gr/ontology#buffer>(?sensorsGeo, 1.6), ?placegeo

)

) .

}

Page 28: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

Appendix B

1: QueryRoot

2: Projection

3: ProjectionElemList

4: ProjectionElem ‘‘name’’

5: Join

6: Join

7: Join

8: Join

9: StatementPattern

10: Var (name=t)

11: Var (name=-const-1,

value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type)

12: Var (name=-const-2,

value=http://dbpedia.org/ontology/Town)

13: StatementPattern

14: Var (name=t)

15: Var (name=-const-3,

value=http://www.geonames.org/name)

16: Var (name=name)

17: StatementPattern

18: Var (name=t)

19: Var (name=-const-4,

value=http://strdf.di.uoa.gr/ontology#hasGeometry)

20: Var (name=tGeo)

21: StatementPattern

22: Var (name=ba)

23: Var (name=-const-5,

value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type)

24: Var (name=-const-6,

value=http://teleios.di.uoa.gr/ontologies/noaOntology.owl#BurntArea)

25: StatementPattern

26: Var (name=ba)

27: Var (name=-const-7,

value=http://teleios.di.uoa.gr/ontologies/noaOntology.owl#hasGeometry)

28: Var (name=baGeo)

(a)

Page 29: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

1: QueryRoot

2: Projection

3: ProjectionElemList

4: ProjectionElem ‘‘name’’

5: Filter

6: FunctionCall

(uri=http://strdf.di.uoa.gr/ontology#intersects)

7: Var (name=tGeo)

8: Var (name=baGeo)

9: SelectQuery

10: JoinItem type t1

11: JoinItem name n

12: SqlEq

13: IdColumn n.subj

14: IdColumn t1.subj

15: JoinItem hasgeometry h1

16: SqlEq

17: IdColumn h1.subj

18: IdColumn t1.subj

19: JoinItem type t2

20: SqlEq

21: RefIdColumn t2.obj

(value=http://teleios.di.uoa.gr/ontologies/noaOntology.owl#BurntArea)

22: NumberValue 8

23: JoinItem hasgeometry h2

24: SqlEq

25: IdColumn h2.subj

26: IdColumn t2.subj

27: LEFT JoinItem label values l name

28: SqlEq

29: IdColumn l name.id

30: RefIdColumn n.obj (name=name)

31: LEFT JoinItem uri values u ba

32: SqlEq

33: IdColumn u ba.id

34: RefIdColumn t2.subj (name=ba)

35: SqlEq

36: RefIdColumn t1.obj

(value=http://dbpedia.org/ontology/Town)

37: NumberValue 3

38: SelectProjection

39: RefIdColumn n.obj (name=name)

40: LabelColumn n.obj (name=name)

41: SelectProjection

42: RefIdColumn h1.obj (name=tGeo)

43: LabelColumn h1.obj (name=tGeo)

44: SelectProjection

45: RefIdColumn h2.obj (name=baGeo)

46: LabelColumn h2.obj (name=baGeo)

(b)

Page 30: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

1: QueryRoot

2: SelectQuery

3: JoinItem type t1

4: JoinItem name n

5: SqlEq

6: IdColumn n.subj

7: IdColumn t1.subj

8: JoinItem hasgeometry h1

9: SqlEq

10: IdColumn h1.subj

11: IdColumn t1.subj

12: JoinItem type t2

13: SqlEq

14: RefIdColumn t2.obj

(value=http://teleios.di.uoa.gr/ontologies/noaOntology.owl#BurntArea)

15: NumberValue 8

16: JoinItem hasgeometry h2

17: SqlEq

18: IdColumn h2.subj

19: IdColumn t2.subj

20: JoinItem geo values g1

21: SqlEq

22: IdColumn g1.id

23: RefIdColumn h1.obj (name=tGeo)

24: JoinItem geo values g2

25: SqlEq

26: IdColumn g2.id

27: RefIdColumn h2.obj (name=baGeo)

28: LEFT JoinItem label values l name

29: SqlEq

30: IdColumn l name.id

31: RefIdColumn n.obj (name=name)

32: SqlEq

33: RefIdColumn t1.obj

(value=http://dbpedia.org/ontology/Town)

34: NumberValue 3

35: SqlIntersects

36: LabelColumn h1.obj (name=tGeo)

37: LabelColumn h2.obj (name=baGeo)

38: SelectProjection

39: RefIdColumn n.obj (name=name)

40: LabelColumn n.obj (name=name)

(c)

Page 31: Strabon: A Semantic Geospatial DBMS (Long Version) · 2017. 8. 20. · [1]). Strabon extends the RDF store Sesame, allowing it to manage both thematic and spatial RDF data stored

1: QueryRoot

2: SelectQuery

3: Jo

4: inItem hasgeometry h1

5: JoinItem name n

6: SqlEq

7: IdColumn n.subj

8: IdColumn h1.subj

9: JoinItem type t1

10: SqlEq

11: RefIdColumn t1.obj

(value=http://dbpedia.org/ontology/Town)

12: NumberValue 3

13: SqlEq

14: IdColumn t1.subj

15: IdColumn h1.subj

16: JoinItem geo values g1

17: SqlEq

18: IdColumn g1.id

19: RefIdColumn h1.obj (name=tGeo)

20: JoinItem geo values g2

21: SqlIntersects

22: LabelColumn h1.obj (name=tGeo)

23: LabelColumn h2.obj (name=baGeo)

24: JoinItem hasgeometry h2

25: SqlEq

26: IdColumn h2.obj

27: IdColumn g2.id

28: JoinItem type t2

29: SqlEq

30: RefIdColumn t2.obj

(value=http://teleios.di.uoa.gr/ontologies/noaOntology.owl#BurntArea)

31: NumberValue 8

32: SqlEq

33: IdColumn t2.subj

34: IdColumn h2.subj

35: LEFT JoinItem label values l name

36: SqlEq

37: IdColumn l name.id

38: RefIdColumn n.obj (name=name)

39: SelectProjection

40: RefIdColumn n.obj (name=name)

41: LabelColumn n.obj (name=name)

(d)

Fig. 9: Query tree corresponding to the stSPARQL query of Example 2 that isproduced (a) by the parser of Sesame, (b) by the optimizer of Sesame, (c) byStrabon after incorporating spatial functions and (d) by Strabon after incorpo-rating spatial joins.