processing geospatial data at scale @locationtech
TRANSCRIPT
![Page 1: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/1.jpg)
PROCESSING GEOSPATIAL DATA @ SCALE
Rob Emanuele
GEO(MESA/WAVE/TRELLIS/JINNI)
![Page 2: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/2.jpg)
What we’ll be covering…
What does “processing geospatial data at scale” mean?
Background on big data frameworks
What is LocationTech?
Overview of LocationTech projects for processing big geo data.
![Page 3: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/3.jpg)
PROCESSING GEOSPATIAL DATA @ SCALE
![Page 4: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/4.jpg)
![Page 5: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/5.jpg)
Large geospatial data
Landsat 8 on AWS: more than 465,000 scenes @ ~800 MB each. That's 355 TB and counting.
OpenStreetMap edit history: 75 GB compressed.
3 years of geotagged tweets: 3 TB
![Page 6: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/6.jpg)
National Elevation Dataset (NED), 1/3 arc-second
![Page 7: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/7.jpg)
![Page 8: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/8.jpg)
![Page 9: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/9.jpg)
PROCESSING GEOSPATIAL DATA @ SCALE
![Page 10: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/10.jpg)
![Page 11: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/11.jpg)
![Page 12: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/12.jpg)
![Page 13: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/13.jpg)
Project to build a better search engine, back in the early 2000s.
Worked for small datasets, but was not scalable.
![Page 14: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/14.jpg)
The Google papers
![Page 15: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/15.jpg)
After reading the papers, Nutch developers added a distributed file system and MapReduce model to Nutch.
In 2006, those portions were spun out of Nutch to form…
![Page 16: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/16.jpg)
![Page 17: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/17.jpg)
Apache Hadoop
Heavily supported by Yahoo, which moved its large-scale data processing to Hadoop.
By 2007, Twitter, Facebook, LinkedIn, and many others were doing serious work with Hadoop.
In 2008, Hadoop graduated to a top-level Apache project.
![Page 18: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/18.jpg)
Hadoop
Source: http://cs.calvin.edu/courses/cs/374/exercises/12/lab/MapReduceWordCount.png
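The word-count diagram referenced above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases, not Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

In a real cluster each phase runs on many machines at once and the shuffle moves data over the network; here everything happens in one process purely to show the data flow.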
![Page 19: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/19.jpg)
Matei Zaharia
Worked with Hadoop at UC Berkeley
Noticed Hadoop was not a good fit for Machine Learning algorithms and other iterative models.
So in 2009, he created…
![Page 20: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/20.jpg)
![Page 21: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/21.jpg)
![Page 22: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/22.jpg)
Open-sourced in 2010 under a BSD license
Maintained by UC Berkeley’s AMPLab
Donated to the Apache Software Foundation in 2013 and relicensed as Apache 2.0
Graduated to a top-level Apache project in 2014
Apache Spark
![Page 23: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/23.jpg)
Apache Spark
A distributed computation engine.
An API that lets you work with distributed data as a collection.
Written in Scala, with language bindings for use with Java, Python, and R.
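To make "work with distributed data as a collection" concrete, here is a toy sketch in plain Python. `ToyDistributedCollection` is a hypothetical stand-in that splits data into in-memory partitions; its method names loosely echo collection-style operations, but it is not Spark's implementation and runs in a single process.

```python
from functools import reduce


class ToyDistributedCollection:
    """Hypothetical stand-in for a distributed dataset: a list of partitions."""

    def __init__(self, partitions):
        self.partitions = [list(p) for p in partitions]

    @classmethod
    def parallelize(cls, items, num_partitions):
        """Split items round-robin into num_partitions partitions."""
        items = list(items)
        return cls(items[i::num_partitions] for i in range(num_partitions))

    def map(self, fn):
        """Apply fn to every element, partition by partition."""
        return ToyDistributedCollection(
            [fn(x) for x in p] for p in self.partitions
        )

    def filter(self, pred):
        """Keep only the elements for which pred is true."""
        return ToyDistributedCollection(
            [x for x in p if pred(x)] for p in self.partitions
        )

    def reduce(self, fn):
        """Reduce each non-empty partition, then combine the partial results."""
        partials = [reduce(fn, p) for p in self.partitions if p]
        return reduce(fn, partials)


numbers = ToyDistributedCollection.parallelize(range(1, 11), num_partitions=3)
sum_of_even_squares = (
    numbers.filter(lambda n: n % 2 == 0)
    .map(lambda n: n * n)
    .reduce(lambda a, b: a + b)
)
```

The caller writes ordinary collection code, while the partition-level work is what a real engine would ship to separate workers.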
![Page 24: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/24.jpg)
![Page 25: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/25.jpg)
![Page 26: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/26.jpg)
![Page 27: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/27.jpg)
![Page 28: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/28.jpg)
![Page 29: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/29.jpg)
![Page 30: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/30.jpg)
2006
![Page 31: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/31.jpg)
![Page 32: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/32.jpg)
Apache Accumulo
Created by the NSA in 2008
Donated to the Apache Software Foundation in 2011
Graduated to a top-level project in 2012
Almost defunded by the US government the same year.
![Page 33: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/33.jpg)
(Sec. 929) Prohibits any DOD component from utilizing the cloud computing database developed by the National Security Agency (NSA) and known as "Accumulo" after the end of FY2013, unless the DOD CIO certifies that: (1) there are no viable commercial open source databases that have such security features, or (2) Accumulo itself has become a successful open source database project. Requires DOD and intelligence community officials to coordinate the use by DOD components of cloud computing infrastructure and services offered by the intelligence community for purposes other than intelligence analysis.
![Page 34: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/34.jpg)
![Page 35: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/35.jpg)
[Diagram: HDFS with one Name Node and three Data Nodes; Accumulo with one Master and three Tablet Servers]
Accumulo
Bigtable clone (wide-column key/value store)
Records stored on HDFS
Lexicographically sorted table index
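A lexicographically sorted index is what makes range scans cheap: all keys between two bounds sit next to each other. The sketch below illustrates the idea with an in-memory table; `SortedTable` and its methods are hypothetical names for this example and are not Accumulo's client API.

```python
import bisect


class SortedTable:
    """Toy key/value table that keeps row keys in lexicographic order."""

    def __init__(self):
        self._keys = []      # row keys, always kept sorted
        self._values = {}    # row key -> value

    def put(self, row_key, value):
        """Insert or overwrite a row, keeping the key list sorted."""
        if row_key not in self._values:
            bisect.insort(self._keys, row_key)
        self._values[row_key] = value

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


table = SortedTable()
for key in ["tile_0010", "tile_0002", "user_17", "tile_0001"]:
    table.put(key, f"payload for {key}")

tile_rows = [key for key, _ in table.scan("tile_0001", "tile_0003")]
```

Because the order is lexicographic, string keys such as "10" sort before "2"; zero-padding numeric parts (as in `tile_0002`) keeps numeric and lexicographic order aligned.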
![Page 36: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/36.jpg)
![Page 37: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/37.jpg)
![Page 38: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/38.jpg)
PROCESSING GEOSPATIAL DATA @ SCALE
![Page 39: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/39.jpg)
![Page 40: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/40.jpg)
![Page 41: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/41.jpg)
![Page 42: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/42.jpg)
![Page 43: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/43.jpg)
WHAT IS LOCATIONTECH?
![Page 44: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/44.jpg)
![Page 45: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/45.jpg)
![Page 46: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/46.jpg)
![Page 47: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/47.jpg)
GEOJINNI (FORMERLY SPATIALHADOOP)
![Page 48: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/48.jpg)
Spatial language
Built-in spatial data types
Spatial indexes
Spatial operations
![Page 49: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/49.jpg)
![Page 50: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/50.jpg)
72 frames × 14 billion points per frame ≈ 1 trillion points in total
Generated in three hours on a 10-node cluster
HEAT MAP FROM 2009 TO 2014 MONTH-BY-MONTH
![Page 51: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/51.jpg)
![Page 52: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/52.jpg)
![Page 53: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/53.jpg)
![Page 54: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/54.jpg)
Geo +
accessed through
![Page 55: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/55.jpg)
![Page 56: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/56.jpg)
![Page 57: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/57.jpg)
![Page 58: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/58.jpg)
![Page 59: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/59.jpg)
![Page 60: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/60.jpg)
![Page 61: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/61.jpg)
![Page 62: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/62.jpg)
SELECT tweet.text, user.name
FROM tweet, user
WHERE bbox(tweet.location, -115, 45, -110, 50)
  AND tweet.user_id = user.user_id
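What the query above asks for can be spelled out in plain Python: keep tweets whose location falls in a bounding box, then join each to its user. This is only an illustrative sketch. The `Tweet` class, the `users` dictionary, and the assumption that the box arguments are ordered (xmin, ymin, xmax, ymax) in longitude/latitude are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    text: str
    user_id: int
    lon: float
    lat: float


def in_bbox(lon, lat, xmin, ymin, xmax, ymax):
    """True when the point lies inside the box, edges included."""
    return xmin <= lon <= xmax and ymin <= lat <= ymax


def tweets_in_bbox(tweets, users, xmin, ymin, xmax, ymax):
    """Return (text, user name) for tweets inside the box, joined on user_id."""
    return [
        (t.text, users[t.user_id])
        for t in tweets
        if in_bbox(t.lon, t.lat, xmin, ymin, xmax, ymax) and t.user_id in users
    ]


users = {1: "alice", 2: "bob"}
tweets = [
    Tweet("inside the box", 1, -112.0, 47.0),
    Tweet("outside the box", 2, -75.2, 39.9),
]
matches = tweets_in_bbox(tweets, users, -115, 45, -110, 50)
```

A linear scan like this checks every tweet; a spatial index exists precisely so that the bounding-box predicate can skip most of the data instead.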
+
![Page 63: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/63.jpg)
![Page 64: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/64.jpg)
GeoTrellis
A Scala library for geospatial data types and operations.
Adds geospatial capabilities to Spark (mainly raster; vector support is in progress).
Stores and queries rasters on HDFS, Accumulo, and S3 (Cassandra support in development).
0.10 is released!
![Page 65: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/65.jpg)
![Page 66: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/66.jpg)
![Page 67: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/67.jpg)
![Page 68: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/68.jpg)
![Page 69: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/69.jpg)
Polygonal Summaries
![Page 70: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/70.jpg)
![Page 71: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/71.jpg)
100 spot instance m3.xlarge workers @ $0.04 / hr = $4.00 / hr
400 CPUs / ≈1.5 TB memory
1 master m3.xlarge on-demand instance @ $0.26 / hr
EMR cluster charge, $0.07 / hr
$4.37 / hr
Rendering elevation with hillshade + NLCD on AWS EMR
![Page 72: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/72.jpg)
![Page 73: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/73.jpg)
![Page 74: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/74.jpg)
![Page 75: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/75.jpg)
![Page 76: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/76.jpg)
Geo +
accessed through
GEOWAVE
![Page 77: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/77.jpg)
| | GeoMesa | GeoWave |
|---|---|---|
| Index type | Z-order spatial & spatiotemporal, binned by week | Hilbert in N dimensions, with tiered indexing and binning |
| Backends supported | Accumulo (main), Cassandra, HBase, DynamoDB, Google Cloud Bigtable | Accumulo (main), HBase |
| Servers supported | GeoServer | GeoServer, Mapnik |
| Processing frameworks supported | Hadoop, Spark, Storm, Kafka | Hadoop, Spark |
| Language | Scala | Java |
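The Z-order index type in the comparison above relies on a space-filling curve: the bits of the x and y cell coordinates are interleaved into one number, so points that are close in space tend to get nearby keys in a sorted table. Below is a generic Morton-code sketch, assuming a 16-bit grid per axis and a simple linear quantization of longitude and latitude. It is not the implementation used by either project, and `interleave_bits` and `z_index` are hypothetical names.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return z


def z_index(lon, lat, bits=16):
    """Quantize lon/lat onto a 2^bits grid, then interleave into a Z-order value."""
    max_cell = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * max_cell)
    y = int((lat + 90.0) / 180.0 * max_cell)
    return interleave_bits(x, y, bits)
```

A bounding-box query then becomes a small set of key ranges along the curve, which a sorted key/value store can scan directly.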
![Page 78: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/78.jpg)
![Page 79: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/79.jpg)
Tiered Indexing
![Page 80: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/80.jpg)
![Page 81: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/81.jpg)
Binning (time dimension)
1997 1998 1999
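Binning handles an unbounded dimension such as time by cutting it into fixed periods and indexing within each period. The sketch below bins by calendar year, matching the labels above, and builds a sortable string key from the bin, the offset inside the bin, and a spatial index value. The key layout and the name `binned_key` are assumptions made for illustration, not the format either project uses.

```python
from datetime import datetime, timezone


def binned_key(timestamp, spatial_index):
    """Build a sortable key: year bin, seconds into that year, then spatial index.

    `timestamp` must be a timezone-aware UTC datetime; `spatial_index` is any
    non-negative integer, for example a space-filling-curve value.
    """
    year_start = datetime(timestamp.year, 1, 1, tzinfo=timezone.utc)
    offset_seconds = int((timestamp - year_start).total_seconds())
    return f"{timestamp.year:04d}/{offset_seconds:08d}/{spatial_index:010d}"


late_1997 = binned_key(datetime(1997, 12, 31, 23, 59, 59, tzinfo=timezone.utc), 42)
early_1998 = binned_key(datetime(1998, 1, 1, 0, 0, 10, tzinfo=timezone.utc), 42)
```

Because the bin is the key prefix, a query limited to one year only touches keys that start with that year.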
![Page 82: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/82.jpg)
Binning (arbitrary dimensions)
Time
Elevation
Velocity
![Page 83: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/83.jpg)
![Page 84: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/84.jpg)
Collaboration
![Page 85: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/85.jpg)
![Page 86: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/86.jpg)
![Page 87: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/87.jpg)
• Working together to learn to collaborate
• Making the connections necessary that allow collaboration to flourish
![Page 88: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/88.jpg)
• Join the locationtech-iwg mailing list
• Share your big geospatial data challenges
• Propose projects
Get involved!
![Page 89: Processing Geospatial Data At Scale @locationtech](https://reader034.vdocuments.mx/reader034/viewer/2022051520/58ed67a31a28abf3378b468f/html5/thumbnails/89.jpg)
THANK YOU
@lossyrob
gitter.im/geotrellis/geotrellis
github.com/geotrellis/geotrellis