database laboratory university of a coruña a coruña, spain
DESCRIPTION
Retrieving Documents with Geographic References Using a Spatial Index Structure Based on Ontologies. Database Laboratory University of A Coruña A Coruña, Spain. Miguel R. Luaces, Ángeles S. Places, Francisco J. Rodríguez, Diego Seco. Motivation. - PowerPoint PPT PresentationTRANSCRIPT
Retrieving Documents with GeographicReferences Using a Spatial Index
Structure Based on Ontologies
Database LaboratoryDatabase LaboratoryUniversity of A CoruñaUniversity of A Coruña
A Coruña, SpainA Coruña, Spain
Miguel R. Luaces, Ángeles S. Places, Francisco J. Rodríguez, Diego Seco
20th October, 2008 Barcelona - SeCoGIS 2008 2/29
Motivation The Database Laboratory at the University of A Coruña
works in two very active research fields:— Geographic Information Systems (GIS)
EIEL Project (http://www.dicoruna.es/webeiel)
— Information Retrieval (IR) Galician Virtual Library (http://bvg.udc.es)
GIS IR GIRRetrieval of geographically and thematically relevant documents
in response to a query of the form <theme, location>
20th October, 2008 Barcelona - SeCoGIS 2008 3/29
Motivation Many of the documents stored in digital libraries and
document databases include geographic references— Example: “…the hurricane touched land at Veracruz…”
Few index structures or retrieval algorithms take into account these geographic references
Some proposals have appeared recently. But… There are some specific particularities of geographic
space that are not taken into account by these proposals:— Hierarchical nature of geographic space— Topological relationships between the geographic objects
20th October, 2008 Barcelona - SeCoGIS 2008 4/29
Contents
Motivation
Related Work
Architecture
Index Structure
Supported Query Types
Experiments
Conclusions and Future Work
20th October, 2008 Barcelona - SeCoGIS 2008 5/29
Related Work Many index structures and techniques have been proposed in each
area:— IR: Inverted Index
Geographic references are normal words— GIS: R-Tree
These structures do not take into consideration the hierarchy of space
The combination of both types of indexes (SPIRIT project):— Text-First— Geo-First
They do not take into account the relationships between the geographic objects that they are indexing
An Ontology can describe the specific characteristics of geographic space:— Ontologies are used in query expansion, relevance rankings,
and web resource annotation Nobody has ever tried to combine it with other types of indexes to
have a hybrid structure
20th October, 2008 Barcelona - SeCoGIS 2008 6/29
Contents
Motivation
Related Work
Architecture
Index Structure
Supported Query Types
Experiments
Conclusions and Future Work
20th October, 2008 Barcelona - SeCoGIS 2008 7/29
Architecture
Administration User Interface Query User Interface
Geographic Information Retrieval Module
Geometry Supplier Service
Gazetteer Service
WMSQuery
Evaluation Service
Document Abstraction
Index construction Index Structure
Text, place name, text
Geographic Database
Document Database
Geographic Space Ontology Service
Continent
Country
Administrative
Region
Populated
Place
contains
contains
contains
Ontology
Europe
Spain
Galicia
Coruña
Ontology
Instance
20th October, 2008 Barcelona - SeCoGIS 2008 8/29
Architecture Document storage workflow:
— Keyword extraction Classic IR techniques (removing stopwords, stemmers)
— Build the index structure Natural Language Processing Techniques Gazetteer Service Geographic Space Ontology Service
Processing services:— Query solving service— Web Map Service (WMS)— Geographic information retrieval module
User Interface:— Administration (manage the document collection)— Query
20th October, 2008 Barcelona - SeCoGIS 2008 9/29
Contents
Motivation
Related Work
Architecture
Index Structure
Supported Query Types
Experiments
Conclusions and Future Work
20th October, 2008 Barcelona - SeCoGIS 2008 10/29
Index Structure
… …
hotel 1,3,7,8,12,…
sea 3,5,6,9,10,…
… …
Inverted Index
… …
Germany
Spain
… …
Place Name Hash Table
DocIds 2,3,…
EuropeAsia …
SpainGermany
MadridGalicia
…
…
Coruña
20th October, 2008 Barcelona - SeCoGIS 2008 11/29
Index Structure Based on a spatial ontology
— It models both the vocabulary and the spatial structure of places for purposes of information retrieval
Tree composed by nodes that represent place names— Each node:
Keyword (a place name) Bounding box Document identifiers Children nodes
— These nodes are connected by means of inclusion relationships— If the list of children nodes exceeds a threshold, an R-Tree is
used Auxiliary structures:
— Place name hash table— Traditional Inverted Index
20th October, 2008 Barcelona - SeCoGIS 2008 12/29
Index Structure Advantages:
— All textual queries and spatial queries can be efficiently processed
— Queries combining textual and spatial aspects are supported
— Updates and optimizations in each index are handled independently
Drawbacks:— The tree that supports the structure is possibly unbalanced— The structure is static
20th October, 2008 Barcelona - SeCoGIS 2008 13/29
Contents
Motivation
Related Work
Architecture
Index Structure
Supported Query Types
Experiments
Conclusions and Future Work
20th October, 2008 Barcelona - SeCoGIS 2008 14/29
Supported Query Types Pure textual queries
— “Retrieve all documents where the words hotel and sea appear”
— How do we solve it? Inverted Index
Pure spatial queries— “Retrieve all documents that refer to the following
geographic area”— How do we solve it?
Descend the structure + refine the result The same algorithm that is used with spatial indexes
20th October, 2008 Barcelona - SeCoGIS 2008 15/29
Supported Query Types Textual queries over a geographic area
— “Retrieve all documents with the word hotel that refer to the following area”
— How do we solve it? Example
— Geographic references can be given using place names
20th October, 2008 Barcelona - SeCoGIS 2008 16/29
Supported Query Types
… …
hotel 1,3,7,8,12,…
sea 3,5,6,9,10,…
… …
Inverted Index
EuropeAsia …
SpainPortugal
MadridGalicia
…
…
Index Structure
Text Result
Spatial Result
Query Result
… …
hotel 1,3,7,8,12,…
sea 3,5,6,9,10,…
… …
Text Result 1,3,7,8,12,…
Spatial Result
Query Result
Text Result 1,3,7,8,12,…
Spatial Result 12,14,…
Query Result
Query Window
Europe
Portugal Spain
Galicia
Coruña …Coruña
DocIds 12,14,…
Text Result 1,3,7,8,12,…
Spatial Result 12,14,…
Query Result 12,…
Text Result 1,3,7,8,12,…
Spatial Result 12,14,…
Query Result 12,…
20th October, 2008 Barcelona - SeCoGIS 2008 17/29
Supported Query Types Textual queries with place names
— “Retrieve all documents with the word hotel that refer to Spain”
— How do we solve it? Example
— We save some time by avoiding a tree traversal
20th October, 2008 Barcelona - SeCoGIS 2008 18/29
Supported Query Types
… …
hotel 1,3,7,8,12,…
sea 3,5,6,9,10,…
… …
Inverted Index
EuropeAsia …
SpainGermany
MadridGalicia
…
…
Index Structure
… …
Spain
Germany
… …
Place Name Hash Table
Text Result
Spatial Result
Query Result
DocIds 2,3,…DocIds 5,7,…DocIds 12,14,…
… …
hotel 1,3,7,8,12,…
sea 3,5,6,9,10,…
… …
Text Result 1,3,7,8,12,…
Spatial Result 2,3,5,7,12,14,…
Query Result 3,7,12,…
Text Result 1,3,7,8,12,…
Spatial Result 2,3,5,7,12,14,…
Query Result
Text Result 1,3,7,8,12,…
Spatial Result 2,3,5,7,…
Query Result
Text Result 1,3,7,8,12,…
Spatial Result
Query Result
Text Result 1,3,7,8,12,…
Spatial Result 2,3,…
Query Result
… …
Spain
Germany
… …
Spain
Galicia Madrid
Text Result 1,3,7,8,12,…
Spatial Result 2,3,5,7,12,14,…
Query Result 3,7,12,…
20th October, 2008 Barcelona - SeCoGIS 2008 19/29
Supported Query Types Another improvement: QUERY EXPANSION
— “retrieve all documents that refer to Spain”— How do we solve it?
The Query Evaluation Service discovers that Spain is a geographic reference
The Place Name Hash Table locates quickly the internal node that represents the geographic object Spain
All the documents associated to this node are part of the result
All the documents associated to the subtree are part of the result
— The result contains not only those documents that include the term Spain, but also all documents that contain the name of a geographic object included in Spain
20th October, 2008 Barcelona - SeCoGIS 2008 20/29
Demo
20th October, 2008 Barcelona - SeCoGIS 2008 21/29
Contents
Motivation
Related Work
Architecture
Index Structure
Supported Query Types
Experiments
Conclusions and Future Work
20th October, 2008 Barcelona - SeCoGIS 2008 22/29
Experiments Document collections
FT-91 FT-94# documents 5,368 71,489
# textual terms 64,711 251,057
# place names candidates 23,550 299,661
# documents with candidates 4,652 60,823
# documents with place names 4,182 54,899
# place names 167,774 2,274,146
# spatial nodes 21,282 64,843
20th October, 2008 Barcelona - SeCoGIS 2008 23/29
Experiments Randomly generated pure spatial queries
Ontology-based index versus R-Tree [FT-91]
Query area 0.001% 0.01% 0.1% 1%
Ontology-based index 0.013 0.017 0.052 0.360
R-Tree 0.010 0.016 0.057 0.370
20th October, 2008 Barcelona - SeCoGIS 2008 24/29
Experiments Ontology-based index versus R-Tree (zones of high
document density) [FT-91]
Query area 0.001% 0.01% 0.1% 1%
Ontology-based index 0.03 0.11 1.05 9.84
R-Tree 0.07 0.22 1.64 12.85
Ontology-based index versus R-Tree (zones of low document density) [FT-91]
Query area 0.001% 0.01% 0.1% 1%
Ontology-based index 0.02 0.03 0.09 0.4
R-Tree 0.02 0.03 0.07 0.2
20th October, 2008 Barcelona - SeCoGIS 2008 25/29
Experiments Improved R-Tree
— List of documents for each location
Query area 0.001% 0.01% 0.1% 1%
Ontology-based index 0.013 0.017 0.052 0.360
Improved R-Tree 0.007 0.008 0.022 0.145
Ontology-based index versus Improved R-Tree [FT-91]
Ontology-based index versus Improved R-Tree [FT-94]
Query area 0.001% 0.01% 0.1% 1%
Ontology-based index 0.016 0.036 0.223 3.017
Improved R-Tree 0.007 0.017 0.097 1.056
20th October, 2008 Barcelona - SeCoGIS 2008 26/29
Contents
Introduction
Motivation
Related Work
Architecture
Index Structure
Supported Query Types
Experiments
Conclusions and Future Work
20th October, 2008 Barcelona - SeCoGIS 2008 27/29
Conclusions We have defined an architecture for Geographic
Information Retrieval systems
It takes into account both textual and geographic references in the documents
This is achieved by a new index structure that combines an inverted index, a spatial index and an ontology
Traditional queries and new types of queries can be solved
20th October, 2008 Barcelona - SeCoGIS 2008 28/29
Future Work Evaluate the performance of the index structure
Define algorithms to rank the retrieved documents
Improve the disambiguation process of place names
Explore the use of different ontologies
Include other types of spatial relationships (e.g., adjacency)
Retrieving Documents with GeographicReferences Using a Spatial Index
Structure Based on Ontologies
Database LaboratoryDatabase LaboratoryUniversity of A CoruñaUniversity of A Coruña
A Coruña, SpainA Coruña, Spain
Miguel R. Luaces, Ángeles S. Places, Francisco J. Rodríguez, Diego Seco
Contact: [email protected]