G. Fekete, JHU
Efficient search indices for geospatial data in a relational database
Gyorgy (George) Fekete
Dept. Physics and Astronomy
Johns Hopkins University
Acknowledgements
• Alex Szalay
  – NVO, SDSS, iVDGL, ...
• Jim Gray
  – Databases, SQL Server
• Ani Thakar, Tamas Budavari
  – SDSS pipeline, Geometric libraries
Motivation
• Growth of volume of data
  – terabytes per day
• Increasing importance of databases in managing science data
• Data mining: potential for new discoveries
• Cross matching between multiple surveys
• Much of this data is distributed on a sphere
  – astronomy and earth science
  – great interest in a universal, computer-friendly index on the sphere
Astronomy Data
• “old days” – astronomers took photos.
• Since the 1960’s – they began to digitize.
New instruments are digital (100s of GB/night)
Detectors are following Moore’s law.
Data avalanche: double every 2 years
Astronomy Data
• Astronomers have a few Petabytes now.
• Data volume and ownership
  – doubles every 2 years.
  – Data is public after 2 years.
  – So, 50% of the data is public.
  – Some have private access to 5% more data.
• But…..
  – How do I get at that 50% of the data?
New Astronomy
• Data “Avalanche”
  – the flood of Terabytes of data
  – present techniques of handling these data do not scale well with data volume
• Systematic data exploration
  – will have a central role
  – statistical analysis of the “typical” objects
  – automated search for the “rare” events
• Digital archives of the sky
  – will be the main access to data
Data Intensive Science
• Data avalanche in astronomy and other sciences
  – old file-based solutions do not cut it
  – old data silos don’t work
  – old programming models don’t work
• We have some new tricks!
• Astronomy and Earth-Science
  – methods presented here deal with the topology and the geometry of the sphere
One Of These Tricks:
• Map regions of the sphere to unique identifiers that can be used as references to those areas
  – elementary spherical geometry to identify a region
  – multi-resolution
  – compactly describe areas at arbitrary granularity
Support Spatial Searches
Typical queries
  – What is near this point?
  – What objects are in this area?
  – What areas overlap this area?
Design Considerations
• Has to
  – work with a relational database
  – represent areas of interest precisely
  – be scalable
  – be coordinate system neutral
  – maintain consistency with the topology of the sphere
• Approach:
  – precise mathematical description of regions
  – methods for covering a region with an optimal set of discrete descriptors (trixels)
  – covermap of trixels used for accelerated query
Components
• Region descriptions (continuous part)
  – region, convex, halfspace
  – API and a text language to describe
  – XML for inter-service, inter-application object transfer
• Hierarchical Triangular Mesh (discrete part)
  – trixels
  – covermaps
• Database
  – extend the DB server engine with spatial access methods
  – implementing coarse filtering with table valued functions
Continuous Part: A Region
Region
  – is the union of convexes
Convex
  – is the intersection of halfspaces
Halfspace
  – simple search cone
  – circle
Examples of Convexes
• Disk, Circle, Search cone, ...
• Spherical Polygon
  – yes, it is actually a convex (adj.) convex (n.)
• Band
• Lat/Lon (or Ra/Dec) rectangle
• anything else...
Halfspace
Cutting plane makes two halfspaces
Oriented plane makes one well defined halfspace
Halfspace
Completely defined by (directed) plane normal and distance along the normal
D = cos (cone halfangle)
h = (x, y, z, D)
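To make the tuple h = (x, y, z, D) concrete, here is a minimal Python sketch that builds a halfspace from a cone center and half-angle. The function name `halfspace_from_cone` and the latitude/longitude input in degrees are assumptions made for illustration; they are not taken from the talk's library.

```python
import math

def halfspace_from_cone(lat_deg, lon_deg, halfangle_deg):
    """Return (x, y, z, D) for a search cone centered at (lat, lon).

    Hypothetical helper: (x, y, z) is the unit normal of the cutting
    plane and D = cos(cone half-angle), as on the slide.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # The directed plane normal points at the cone center on the unit sphere.
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    # Distance of the cutting plane from the origin along the normal.
    d = math.cos(math.radians(halfangle_deg))
    return (x, y, z, d)

# A cone of half-angle 60 degrees around the north pole.
h = halfspace_from_cone(90.0, 0.0, 60.0)
```

With this definition a half-angle of 90 degrees gives D = 0 (a hemisphere), and half-angles above 90 degrees give a negative D.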
Point Inclusion In Region
h = (x, y, z, D)
P . (x, y, z) > D (P is inside the halfspace)
Q . (x, y, z) < D (Q is outside the halfspace)
Point is inside a convex if and only if it is inside all halfspaces
Point is inside a region if and only if it is inside at least one convex
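The two inclusion rules can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the talk's implementation: a halfspace is a tuple (x, y, z, D), a convex is a list of halfspaces, a region is a list of convexes, and the function names are invented for this example.

```python
def in_halfspace(p, h):
    """True if unit vector p satisfies p . (x, y, z) > D."""
    x, y, z, d = h
    return p[0] * x + p[1] * y + p[2] * z > d

def in_convex(p, convex):
    """A point is inside a convex iff it is inside all of its halfspaces."""
    return all(in_halfspace(p, h) for h in convex)

def in_region(p, region):
    """A point is inside a region iff it is inside at least one convex."""
    return any(in_convex(p, convex) for convex in region)

# Example region: two polar caps, each a convex made of one halfspace
# with D = 0.5 (a cone half-angle of 60 degrees).
north_cap = [(0.0, 0.0, 1.0, 0.5)]
south_cap = [(0.0, 0.0, -1.0, 0.5)]
region = [north_cap, south_cap]

P = (0.0, 0.0, 1.0)   # the north pole, inside north_cap
Q = (1.0, 0.0, 0.0)   # a point on the equator, outside both caps
```

The strict `>` follows the slide; the SQL version later in the deck uses `>= 0`, which also accepts points exactly on the boundary.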
Disconnected Components
• Intersecting halfspaces can produce multiple connected components
• Anything you can think of can be expressed as a union of convexes
Triangle Subdivision Scheme
Each trixel can be named, e.g. S123222102
HTMId: depth-limited trixels are represented as 64-bit integers
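One way to turn a trixel name into an integer is sketched below. It assumes the commonly described HTM encoding, which the slide does not spell out: the leading letter contributes two bits (S as binary 10, N as binary 11) and every following digit 0 to 3 appends two more bits. The function name is hypothetical.

```python
def htm_name_to_id(name):
    """Encode a trixel name such as 'N3333' as an integer id.

    Assumed convention (not stated on the slide): 'S' -> 0b10,
    'N' -> 0b11, then two bits per child digit.
    """
    prefix = {"S": 0b10, "N": 0b11}
    if len(name) < 2 or name[0] not in prefix:
        raise ValueError("name must start with 'S' or 'N' followed by digits")
    htm_id = prefix[name[0]]
    for ch in name[1:]:
        child = int(ch)
        if not 0 <= child <= 3:
            raise ValueError("child digits must be in 0..3")
        # Each subdivision step appends the child index as two bits.
        htm_id = (htm_id << 2) | child
    return htm_id

example_id = htm_name_to_id("S123222102")
```

Under this assumption a name with k digits after the letter needs 2 + 2k bits, so names of moderate depth fit comfortably in a 64-bit integer.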
HTMId Coherence
level 3: 1023
level 4: 4092 - 4095
level 5: 16368 - 16383
level 20: 17575006175232 - 17592186044415
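The coherence property means that all descendants of a trixel occupy one contiguous id range at any deeper level. The Python sketch below reproduces the numbers on this slide; it assumes an id at level L occupies 4 + 2L bits (consistent with 1023 being a level 3 id), and the function names are made up for illustration.

```python
def level_of(htm_id):
    """Level of a trixel id, assuming ids use 4 + 2 * level bits."""
    return (htm_id.bit_length() - 4) // 2

def id_range_at_level(htm_id, target_level):
    """Inclusive (start, end) range of descendant ids at target_level."""
    shift = 2 * (target_level - level_of(htm_id))
    if shift < 0:
        raise ValueError("target_level is shallower than the trixel")
    # Appending two bits per level: children fill every suffix value.
    start = htm_id << shift
    end = ((htm_id + 1) << shift) - 1
    return start, end

r4 = id_range_at_level(1023, 4)
r5 = id_range_at_level(1023, 5)
r20 = id_range_at_level(1023, 20)
```

Because the range is contiguous, a single `between start and end` predicate on an ordinary B-tree index selects everything inside the trixel.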
Covermap Of Circle
A covermap is a set of trixels that cover a region
Covermap Of California
15277198671872 - 15278272413695
15298673508352 - 15300820991999
15301089427456 - 15302968475647
... ...
15384572854272 - 15384841289727
44 trixels, but only 13 ranges
Use covermaps and HtmIDs to coarse filter...
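A covermap can have fewer ranges than trixels (here 44 trixels but 13 ranges) because neighboring trixels often have adjacent id ranges that can be fused. The following Python sketch shows one plausible way to do the fusing; it is an assumption about the mechanism, not the library's code, and the small ids in the example are invented.

```python
def merge_ranges(ranges):
    """Fuse overlapping or adjacent inclusive (start, end) id ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Touches or overlaps the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]

# Four toy trixel ranges; three of them are adjacent in id space.
trixel_ranges = [(16, 19), (20, 23), (32, 35), (24, 27)]
cover = merge_ranges(trixel_ranges)
```

Fewer ranges means fewer index seeks in the coarse filter, which is why the range count matters more than the trixel count.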
Database Part
1. From table of objects, consider only those whose key values are in the covermap
2. Of those that passed, perform calculation to complete query
3. Return result in table
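The three steps can be sketched outside the database as follows. This Python illustration assumes the objects are held as (key, payload) pairs sorted by key, standing in for an index on HtmID; the function names, the toy rows, and the toy fine predicate are all hypothetical.

```python
from bisect import bisect_left, bisect_right

def coarse_filter(sorted_rows, covermap):
    """Step 1: yield rows whose key falls in any inclusive covermap range."""
    keys = [key for key, _ in sorted_rows]
    for start, end in covermap:
        lo = bisect_left(keys, start)    # first key >= start
        hi = bisect_right(keys, end)     # one past the last key <= end
        yield from sorted_rows[lo:hi]

def search(sorted_rows, covermap, fine_predicate):
    """Steps 2 and 3: run the precise test on survivors, return the result."""
    return [row for row in coarse_filter(sorted_rows, covermap)
            if fine_predicate(row)]

rows = [(5, "a"), (12, "b"), (13, "c"), (40, "d")]
covermap = [(10, 15), (38, 42)]
# Toy stand-in for the expensive geometric test: reject payload "c".
result = search(rows, covermap, lambda row: row[1] != "c")
```

Only the rows returned by `coarse_filter` ever reach the precise test, which is the point of the covermap.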
Coarse and Fine Filtering In Queries
Figure: All Objects pass through the Coarse Filter (use covermaps), which rejects objects or passes a Coarse Subset on to the Fine Filter (use precise calculations), which accepts or rejects each remaining object.
Usage of Tables and Index Keys
Create a function that generates keys that cluster related data together
  – if objects A and B are nearby, then the keys for A and B should also be nearby in the index space
  – HtmID
Create a table-valued function that returns
  – list of key ranges (the covermap) containing all the pertinent values
  – covermap
Caveats
• You cannot always get every key to be near all its neighbors
  – keys are sorted in one dimension
  – relatives are near in two-dimensional space
• But we can come close
  – The ratio of false-positives to correct answers is a measure of how well you are doing.
Sample Covermap
select * from fHtmCoverCircleLatLon(39.3, -76.6, 100)
HtmIDStart       HtmIDEnd
---------------- ----------------
14023336656896   14024141963263
14024410398720   14025215705087
14025484140544   14027363188735
Places Within 100 Miles Of Baltimore
select ObjID
from SpatialIndex join fHtmCoverCircleLatLon(39.3, -76.6, 100)
  on HtmID between HtmIDStart and HtmIDEnd
where Type = 'P'
  and dbo.fDistanceLatLon(39.3, -76.6, Lat, Lon) < 100
go
Number of rows in cover join (coarse filter): 2223
Number of rows that are within 100 n. miles (after the fine filter): 1122
Number of places in DB: 22993
Time with covermap: 35
Time without covermap: 100
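As a worked example of the false-positive measure, the figures reported for this query can be combined as below. The arithmetic is only a sketch: the variable names are invented, and the slide gives no unit for the two timings, so the speedup is a plain ratio.

```python
# Figures reported on the slide for the Baltimore query.
coarse_rows = 2223      # rows in the cover join (coarse filter)
fine_rows = 1122        # rows within 100 n. miles (after the fine filter)
total_places = 22993    # places in the database
time_with = 35          # time with covermap (unit not stated)
time_without = 100      # time without covermap (unit not stated)

# Rows the coarse filter let through that the fine filter rejected.
false_positives = coarse_rows - fine_rows

# False positives per correct answer: about one for this query.
fp_ratio = false_positives / fine_rows

# Fraction of the table that had to be examined precisely: under 10%.
coarse_fraction = coarse_rows / total_places

# Ratio of the two reported timings: a little under 3.
speedup = time_without / time_with
```

So roughly one false positive accompanies each correct answer here, yet more than 90% of the table never reaches the distance calculation.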
California As A Region
declare @californiaRegion varchar(max)
set @californiaRegion = 'REGION '
  + 'rect latlon 39 -125 '   -- northwest corner
  + '42 -120 '               -- center of Lake Tahoe
  + 'chull latlon 39 -124 '  -- Pt. Arena
  + '39 -120 '               -- Lake Tahoe
  + '35 -114.6 '             -- start Colorado River
  + '34.3 -114.1 '           -- Lake Havasu
  + '32.74 -114.5 '          -- Yuma
  + '32.53 -117.1 '          -- San Diego
  + '33.2 -119.5 '           -- San Nicolas Is.
  + '34 -120.5 '             -- San Miguel Is.
  + '34.57 -120.65 '         -- Pt. Arguello
  + '36.3 -121.9 '           -- Pt. Sur
  + '36.6 -122.0 '           -- Monterey
  + '38 -123.03 '            -- Pt. Reyes
California Cities
select PlaceName
from Place
where HtmID in
  (select distinct SI.objID
   from fHtmCoverRegion(@californiaRegion)
     loop join SpatialIndex SI
       on SI.HtmID between HtmIdStart and HtmIdEnd
       and SI.type = 'P'
     join place P on SI.objID = P.HtmID
     cross join fHtmRegionToTable(@californiaRegion) Poly
   group by SI.objID, Poly.convexID
   having min(SI.x*Poly.x + SI.y*Poly.y + SI.z*Poly.z - Poly.d) >= 0)
OPTION (FORCE ORDER)
This is a popular query, so we can include it as a stored procedure
See Point Inclusion
Point Inclusion With SQL
h = (x, y, z, D)
P . (x, y, z) > D
P . (x, y, z) - D > 0
min(SI.x*Poly.x + SI.y*Poly.y + SI.z*Poly.z - Poly.d) >= 0
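The `group by ... having min(...) >= 0` idiom works because the smallest margin in a convex is non-negative exactly when every halfspace margin is. The Python sketch below mimics that grouping; the row layout (convex_id, x, y, z, d) is inferred from the column names used in the query, and the function name and sample rows are assumptions for illustration.

```python
from collections import defaultdict

def convexes_containing(point, halfspace_rows):
    """Return ids of convexes whose minimum margin for the point is >= 0.

    halfspace_rows holds one (convex_id, x, y, z, d) row per halfspace,
    similar in shape to what the query reads from the region table.
    """
    px, py, pz = point
    margins = defaultdict(list)
    for convex_id, x, y, z, d in halfspace_rows:
        # Same expression as in the SQL: dot product minus d.
        margins[convex_id].append(px * x + py * y + pz * z - d)
    # min(...) >= 0 is equivalent to "all margins are >= 0".
    return sorted(cid for cid, ms in margins.items() if min(ms) >= 0)

rows = [
    (0, 0.0, 0.0, 1.0, 0.5),    # convex 0: cap around the north pole
    (0, 1.0, 0.0, 0.0, -0.5),   # convex 0: wide cone around +x
    (1, 0.0, 0.0, -1.0, 0.5),   # convex 1: cap around the south pole
]
hits_pole = convexes_containing((0.0, 0.0, 1.0), rows)
hits_equator = convexes_containing((1.0, 0.0, 0.0), rows)
```

A point belongs to the region as soon as this list is non-empty, which matches the rule that one containing convex is enough.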
Covermap Of California
15277198671872 - 15278272413695
15298673508352 - 15300820991999
15301089427456 - 15302968475647
... ...
15384572854272 - 15384841289727
44 trixels, but only 13 ranges
Use covermaps and HtmIDs to coarse filter...
DB Function For Region Search
select PlaceName
from Place
where HtmID in
  (select ObjID
   from fHtmRegionObjects(@californiaRegion,'P'))
Number of rows in cover join (coarse filter): 981
Number of rows that are within region: 885
Number of places in DB: 22993
Time with covermap: 110
Time without covermap: 1210
SDSS
• Digital map in 5 spectral bands covering ¼ of the sky.
• Will obtain 40 TB of raw pixel data.
• Photometric catalog with more than 200 million objects.
• Spectra of ~ 1 million objects.
• Data Release 3 – DR3: 150 M images, 480 k spectra.
Ambitious Survey
• Info content > US Library of Congress
• Before SDSS: total number of galaxies with measured parameters ~ 100k
• After SDSS, we will have detailed parameters for over 100 Million galaxies!!
SDSS Processing Pipeline
• Processed data ingested into a relational DBMS
• Allows fast exploration and analysis - Data Mining
• Heavily indexed to speed up access – HTM + DB Indices
• Short queries can run interactively.
• Long queries (> 1 hr) require a custom Batch Query System.
SDSS Data Access
• Data Archive Server (DAS)
  – FITS files (raw data)
  – Images, spectra, corrected frames, atlas images, binned images, masks
  – Online form-based access at www.sdss.org
  – Rsync and wget file retrieval
• Catalog Archive Server (CAS)
  – Science parameters extracted to catalogs
  – Stuffed into relational DBMS (SQL Server)
  – Online access via SkyServer at http://cas.sdss.org/, http://skyserver.sdss.org
Conclusion
• HTM methods provide a way to filter data so that the expensive geometrical computations needed to satisfy a query are performed on only a small subset of the original data
• The HTM is on its way to becoming one of the de facto standards for representing spatial information in astronomical catalogs, especially for large-scale surveys.