Introduction to Spatial Databases, Donghui Zhang, CCIS, Northeastern University

Post on 02-Jan-2016


  • Introduction to Spatial Databases
    Donghui Zhang
    CCIS, Northeastern University

  • What is a spatial database?
    A database system that is optimized to store and query spatial objects:
      Point: a hotel, a car
      Line: a road segment
      Polygon: landmarks, layout of VLSI
    [Figures: VLSI Layout; Road Network; Satellite Image]

  • Are spatial databases useful?
    Geographical Information Systems
      e.g. data: road network and places of interest.
      e.g. usage: driving directions, emergency calls, standalone applications.
    Environmental Systems
      e.g. data: land cover, climate, rainfall, and forest fire.
      e.g. usage: find the total rainfall.
    Corporate Decision-Support Systems
      e.g. data: store locations and customer locations.
      e.g. usage: determine the optimal location for a new store.
    Battlefield Soldier Monitoring Systems
      e.g. data: locations of soldiers (with or without medical equipment).
      e.g. usage: for each soldier with medical equipment, monitor the soldiers who may need help from that soldier.

  • MapQuest.com
    Shortest-path query
    Fastest-path query

  • Driving directions as you go. Find the nearest Wal-Mart or hospital.
    NN query

  • ArcGIS 9.2, ESRI
    Range query

  • Aggregation query

  • Optimal Location query

  • RNN query: who will seek help from me?
    [Figure: soldiers Bob, John, George, Bill, and Mike]

  • And beyond the space
    Skyline query
    2004 NBA dataset*: each player has 17 attributes.
    Spatial data: an object is a point in a 17-dimensional space.
    Who are the best players? I.e. those not dominated by any other player.
    * www.databaseBasketball.com

    Name              Points  Rebounds  Assists  Steals
    Tracy McGrady       2003       484      448     135
    Kobe Bryant         1819       392      398      86
    Shaquille O'Neal    1669       760      200      36
    Yao Ming            1465       669       61      34
    Dwyane Wade         1854       397      520     121
    Steve Nash          1165       249      861      74
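The dominance test behind the skyline query can be sketched in a few lines. This is only an illustration: it uses the four attributes visible in the table (not all 17), assumes larger values are better, and assumes the usual definition that p dominates q when p is at least as good in every attribute and strictly better in at least one. The names `dominates` and `skyline` are hypothetical.

```python
# Quadratic skyline sketch: compare every player against all other players.
# Rows copy the four attributes shown on the slide: points, rebounds, assists, steals.
players = {
    "Tracy McGrady":    (2003, 484, 448, 135),
    "Kobe Bryant":      (1819, 392, 398, 86),
    "Shaquille O'Neal": (1669, 760, 200, 36),
    "Yao Ming":         (1465, 669, 61, 34),
    "Dwyane Wade":      (1854, 397, 520, 121),
    "Steve Nash":       (1165, 249, 861, 74),
}


def dominates(p, q):
    """Assumed definition: p is >= q everywhere and > q somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(points):
    """Return the names of all points that no other point dominates."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]


print(skyline(players))
```

On these six rows Kobe Bryant is dominated by Tracy McGrady and Yao Ming by Shaquille O'Neal, so the remaining four players form the skyline. With all 17 attributes the answer could differ.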

  • Research goals in spatial databases
    Support spatial database queries efficiently!
      range query, aggregation query, NN query, RNN query, optimal-location query, fastest-path query, skyline query
    Which statement best describes a large spatial database?
      (a) Both an O(n^2) algorithm and an O(n) algorithm are efficient.
      (b) An O(n^2) algorithm is not efficient, but an O(n) algorithm is.
      (c) Neither an O(n^2) algorithm nor an O(n) algorithm is efficient.

    Answer: (c)! Even a linear algorithm is not efficient!

  • Research goals in spatial databases
    Example of a linear algorithm: to find my nearest Wal-Mart, compare my location with all Wal-Marts in the world.
    Example of a quadratic algorithm: to find the skyline of NBA players, compare every player against all other players (to see if it is dominated).
    Sample scenario:
      Disk page size: 8 KB.
      Database size: 1 GB = 131,072 disk pages.
      Let each disk I/O take 10^-3 second.
      O(n): 131 seconds, about 2 minutes. (Not efficient!)
      O(n^2): about 200 days! (Out of the question!)
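The numbers in the sample scenario can be re-derived with a short calculation. It assumes, as the slide does, one disk I/O per page touched and 10^-3 second per I/O. The variable names are illustrative.

```python
# Back-of-the-envelope I/O cost for the sample scenario on the slide.
PAGE_SIZE = 8 * 1024      # 8 KB disk page, in bytes
DB_SIZE = 1024 ** 3       # 1 GB database, in bytes
IO_SECONDS = 1e-3         # cost of one disk I/O, as assumed on the slide

n = DB_SIZE // PAGE_SIZE                        # number of disk pages
linear_seconds = n * IO_SECONDS                 # O(n): read every page once
quadratic_days = n * n * IO_SECONDS / 86_400    # O(n^2): n page reads per page

print(f"pages: {n}")
print(f"O(n):   {linear_seconds:.0f} seconds")
print(f"O(n^2): {quadratic_days:.0f} days")
```

The quadratic figure comes out just under 200 days, which the slide rounds up.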

  • How can you do better than O(n)?
    Answer: use (disk-based) index structures!

    However, 1-dim index structures, e.g. the B+-tree, are not efficient.

    E.g. to search for hotels in Boston

  • A 1-dim index is not good enough

  • Content
    The R-tree
      Range Query
      Aggregation Query
    NN Query
    Skyline Query
    Highlights of Our Research

  • R-Tree Motivation
    [Figure: objects a to m plotted on x and y axes, each from 0 to 10]
    Range query: find the objects in a given range.
    E.g. find all hotels in Boston.

    No index: scan through all objects. NOT EFFICIENT!

  • R-Tree: Clustering by Proximity

  • R-Tree

  • Range Query
    [Figure: objects a to m plotted on x and y axes (each from 0 to 10), grouped into an R-tree]
      Root: E1, E2
      E1: E3, E4, E5
      E2: E6, E7
      E3: a, b, c
      E4: d, e
      E5: f, g, h
      E6: i, j, k
      E7: l, m
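A range query over such a tree descends only into entries whose bounding rectangle (MBR) intersects the query range. The sketch below is a minimal in-memory illustration, not a disk-based R-tree. The tree shape (Root over E1 and E2, leaves E3 to E7 holding objects a to m) is a reading of the figure labels, while every coordinate and the query rectangle are made up, since the slide text does not preserve them. `Node`, `point`, `entry`, and `range_query` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    mbr: tuple                                     # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)   # empty for a data object


def point(name, x, y):
    """A data object, stored as a degenerate rectangle."""
    return Node(name, (x, y, x, y))


def entry(name, *children):
    """An index entry whose MBR tightly encloses its children."""
    boxes = [c.mbr for c in children]
    mbr = (min(b[0] for b in boxes), min(b[1] for b in boxes),
           max(b[2] for b in boxes), max(b[3] for b in boxes))
    return Node(name, mbr, list(children))


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(node, rect, visited):
    """Return names of objects inside rect; record each index node that is read."""
    if not node.children:          # a data object that already passed the test
        return [node.name]
    visited.append(node.name)
    found = []
    for child in node.children:
        if intersects(child.mbr, rect):
            found.extend(range_query(child, rect, visited))
    return found


# Hypothetical coordinates; only the grouping follows the slide.
e3 = entry("E3", point("a", 1, 8), point("b", 2, 9), point("c", 3, 7))
e4 = entry("E4", point("d", 4, 9), point("e", 5, 8))
e5 = entry("E5", point("f", 1, 2), point("g", 2, 1), point("h", 3, 3))
e6 = entry("E6", point("i", 7, 5), point("j", 8, 7), point("k", 9, 6))
e7 = entry("E7", point("l", 8, 1), point("m", 9, 2))
root = entry("Root", entry("E1", e3, e4, e5), entry("E2", e6, e7))

visited = []
result = range_query(root, (6, 4, 8.5, 8), visited)
print(result, visited)
```

With these made-up coordinates the query reads only Root, E2, and E6. E1 and E7 are skipped without being opened.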

  • Aggregation Query
    Given a range, find some aggregate value of the objects in this range.
    COUNT, SUM, AVG, MIN, MAX
    E.g. find the total number of hotels in Massachusetts.
    Straightforward approach: reduce to a range query.
    Better approach: along with each index entry, store the aggregate of its sub-tree.

  • Aggregation Query
    [Figure: the same objects and R-tree, with each index entry storing the COUNT of its sub-tree: E1:8, E2:5, E3:3, E4:2, E5:3, E6:3, E7:2]
    Subtree pruned!
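The pruning idea can be sketched as follows: when the query range fully contains an entry's MBR, the stored COUNT is used and the sub-tree is never opened. This is an illustrative in-memory sketch. The grouping and per-entry counts follow the figure labels, but the coordinates and the query rectangle are made up, and `Entry`, `pt`, and `range_count` are hypothetical names.

```python
class Entry:
    """An index entry (or a data point) that stores the COUNT of its sub-tree."""

    def __init__(self, name, children=(), xy=None):
        self.name = name
        self.children = list(children)
        if xy is not None:                 # data object: degenerate MBR, count 1
            self.mbr = (xy[0], xy[1], xy[0], xy[1])
            self.count = 1
        else:                              # index entry: derive MBR and COUNT
            boxes = [c.mbr for c in self.children]
            self.mbr = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                        max(b[2] for b in boxes), max(b[3] for b in boxes))
            self.count = sum(c.count for c in self.children)


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def range_count(entry, rect, opened):
    """COUNT of objects in rect; `opened` records entries whose children were read."""
    if not intersects(entry.mbr, rect):
        return 0
    if contains(rect, entry.mbr):          # whole sub-tree qualifies: prune it
        return entry.count
    opened.append(entry.name)
    return sum(range_count(c, rect, opened) for c in entry.children)


def pt(name, x, y):
    return Entry(name, xy=(x, y))


# Hypothetical coordinates; only the grouping follows the slide.
e3 = Entry("E3", [pt("a", 1, 8), pt("b", 2, 9), pt("c", 3, 7)])
e4 = Entry("E4", [pt("d", 4, 9), pt("e", 5, 8)])
e5 = Entry("E5", [pt("f", 1, 2), pt("g", 2, 1), pt("h", 3, 3)])
e6 = Entry("E6", [pt("i", 7, 5), pt("j", 8, 7), pt("k", 9, 6)])
e7 = Entry("E7", [pt("l", 8, 1), pt("m", 9, 2)])
e1 = Entry("E1", [e3, e4, e5])
e2 = Entry("E2", [e6, e7])
root = Entry("Root", [e1, e2])

opened = []
total = range_count(root, (0, 0, 8.5, 10), opened)
print(total, opened)
```

Here E1 lies entirely inside the query range, so its stored count of 8 is added without reading E3, E4, or E5. Only the entries that straddle the range boundary are opened.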

  • Nearest Neighbor (NN) Query
    Given a query location q, find the nearest object.

    E.g.: given a hotel, find its nearest bar.
    [Figure: query point q and its nearest object a]

  • A Useful Metric: MINDIST
    MINDIST(q, E1): the minimum distance between q and an MBR E1.

    It is a lower bound of d(o, q) for every object o in E1.

    [Figure: query point q, MBR E1, and MINDIST(q, E1)]
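For axis-aligned rectangles and Euclidean distance, MINDIST can be computed per axis: the gap is zero when q falls within the rectangle's extent on that axis, otherwise it is the distance to the nearer side. The sketch below assumes 2-D points and MBRs given as `(xmin, ymin, xmax, ymax)`. That layout and the function name are illustrative choices, not taken from the slides.

```python
import math


def mindist(q, mbr):
    """Smallest Euclidean distance from point q to any point of the rectangle."""
    qx, qy = q
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - qx, 0.0, qx - xmax)   # 0 when qx lies within [xmin, xmax]
    dy = max(ymin - qy, 0.0, qy - ymax)   # 0 when qy lies within [ymin, ymax]
    return math.hypot(dx, dy)


print(mindist((0, 0), (3, 4, 5, 6)))   # nearest point is the corner (3, 4)
print(mindist((4, 0), (3, 4, 5, 6)))   # nearest point is on the bottom edge
print(mindist((4, 5), (3, 4, 5, 6)))   # q is inside the rectangle
```

Because every object in an entry lies inside its MBR, no object can be closer to q than this value, which is what makes it safe to use for ordering and pruning.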

  • NN Basic Algorithm
    Keep a heap H of index entries and objects, ordered by MINDIST.
    Initially, H contains the root.
    While H is not empty
      Extract the element with minimum MINDIST.
      If it is an index entry, insert its children into H.
      If it is an object, return it as NN.
    End while
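The loop above maps directly onto a priority queue. The sketch below uses Python's `heapq` on a small in-memory tree. The grouping of objects follows the R-tree named in the figures, but all coordinates and the query point are made up, so the distances differ from those on the slides. `obj`, `idx`, `mindist`, and `nearest_neighbor` are hypothetical names, and a running counter breaks ties between equal distances.

```python
import heapq
import itertools
import math


def mindist(q, mbr):
    """Minimum Euclidean distance from q to the rectangle (xmin, ymin, xmax, ymax)."""
    dx = max(mbr[0] - q[0], 0.0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0.0, q[1] - mbr[3])
    return math.hypot(dx, dy)


def obj(name, x, y):
    return {"name": name, "mbr": (x, y, x, y), "children": []}


def idx(name, *children):
    boxes = [c["mbr"] for c in children]
    mbr = (min(b[0] for b in boxes), min(b[1] for b in boxes),
           max(b[2] for b in boxes), max(b[3] for b in boxes))
    return {"name": name, "mbr": mbr, "children": list(children)}


def nearest_neighbor(root, q, trace):
    """Best-first search; `trace` records the index entries that get expanded."""
    tie = itertools.count()                        # tie-breaker for equal distances
    heap = [(mindist(q, root["mbr"]), next(tie), root)]
    while heap:
        dist, _, node = heapq.heappop(heap)        # element with minimum MINDIST
        if not node["children"]:                   # an object: it is the NN
            return node["name"], dist
        trace.append(node["name"])                 # an index entry: expand it
        for child in node["children"]:
            heapq.heappush(heap, (mindist(q, child["mbr"]), next(tie), child))
    return None


# Hypothetical coordinates; only the grouping follows the slides.
e3 = idx("E3", obj("a", 1, 8), obj("b", 2, 9), obj("c", 3, 7))
e4 = idx("E4", obj("d", 4, 9), obj("e", 5, 8))
e5 = idx("E5", obj("f", 1, 2), obj("g", 2, 1), obj("h", 3, 3))
e6 = idx("E6", obj("i", 7, 5), obj("j", 8, 7), obj("k", 9, 6))
e7 = idx("E7", obj("l", 8, 1), obj("m", 9, 2))
root = idx("Root", idx("E1", e3, e4, e5), idx("E2", e6, e7))

trace = []
name, dist = nearest_neighbor(root, (5.5, 5), trace)
print(name, dist, trace)
```

With this query point E1 is expanded first because its MBR is closest, but none of its children are near, so the search moves to E2 and E6 before reaching an object. E3, E4, E5, and E7 stay in the heap unopened.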

  • NN Query Example
    [Figure: objects a to m, the query point, and the R-tree (Root: E1, E2; E1: E3, E4, E5; E2: E6, E7), with MINDIST labels]

    Action        Heap (element: MINDIST)
    Visit Root    E1: 1, E2: 2
    follow E1     E2: 2, E3: 5, E5: 5, E4: 9
    follow E2     E6: 2, E3: 5, E5: 5, E4: 9, E7: 13
    follow E6     i: 2, E3: 5, E5: 5, E4: 9, j: 10, E7: 13, k: 13
