Reverse Time Migration via Resilient Distributed Datasets: Towards In-Memory Coherence of Seismic-Reflection Wavefields using Apache Spark
Ian Lumb
HPCS 2015 - Montreal
http://hpcs.ca


Post on 05-Aug-2015


Page 1: Reverse Time Migration via Resilient Distributed Datasets: Towards In-Memory Coherence of Seismic-Reflection Wavefields using Apache Spark

Reverse Time Migration via Resilient Distributed Datasets: Towards In-Memory Coherence of Seismic-Reflection Wavefields using Apache Spark

Ian Lumb

HPCS 2015 - Montreal

http://hpcs.ca


Outline

● The challenges and opportunities of RTM
● Refactoring RTM with Spark/RDDs
  o Spark’ing coherence between wavefields
● Summary


http://www.acceleware.com/technical-papers


Motivation

● RTM is performance-challenged
  o Algorithms research remains topical
    ▪ GPUs responsible for compelling results
● Revisit RTM as a ‘Big Data problem’
  o In-memory analytics has the potential to
    ▪ Improve performance of data and wavefield manipulations in concert with computations
    ▪ Introduce new prospects for imaging conditions


Key Performance Challenges

● RTM modeling kernel is compute intensive
  o Stable, non-dispersive solution via FDM requires
    ▪ Small time steps and small grid intervals
    ▪ Higher-order approximations of the spatial derivatives
● RTM wavefields exceed memory capacity
  o Multiple-TB source volumes must be stored to disk

e.g., Liu et al., Computers & Geosciences 59 (2013) 17–23
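
The two challenges above can be put into rough numbers. The sketch below is illustrative only: it assumes the textbook stability (CFL) limit for a second-order-in-time, second-order-in-space scheme on a uniform grid, dt <= dx / (v_max * sqrt(ndim)), and 4-byte samples. The function names, velocity, grid spacing and grid size are hypothetical and are not taken from the talk; higher-order spatial stencils have tighter limits.

```python
import math


def cfl_time_step(v_max, dx, ndim=3, safety=0.9):
    """Largest stable time step in seconds (assumed second-order scheme).

    v_max  : maximum velocity in the model (m/s)
    dx     : uniform grid interval (m)
    ndim   : number of spatial dimensions
    safety : fraction of the theoretical limit actually used
    """
    return safety * dx / (v_max * math.sqrt(ndim))


def wavefield_bytes(nx, ny, nz, n_snapshots, bytes_per_sample=4):
    """Storage needed to keep n_snapshots of an nx * ny * nz wavefield."""
    return nx * ny * nz * n_snapshots * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical model: 4500 m/s maximum velocity, 10 m grid interval.
    dt = cfl_time_step(v_max=4500.0, dx=10.0)
    print("time step (ms):", round(dt * 1000.0, 3))

    # Hypothetical grid: 1000^3 points, 1000 stored snapshots.
    total = wavefield_bytes(1000, 1000, 1000, 1000)
    print("source wavefield volume (TB):", total / 1e12)
```

Under these assumptions a single shot's source wavefield already occupies 4 TB, which is the scale at which the slide notes that volumes must be stored to disk.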


Resilient Distributed Datasets (RDDs)

● Abstraction for in-memory computing
● Fault-tolerant, parallel data structures
  o Cluster-ready
● Optionally persistent
● Can be partitioned for optimal placement
● Manipulated via operators

Zaharia et al., NSDI 2012
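
As a minimal sketch of these properties, the PySpark fragment below creates a keyed RDD, partitions it, marks it persistent, and manipulates it via operators. It assumes that `sc` is an existing `pyspark.SparkContext` and that traces arrive as `(gather_id, samples)` pairs; the function names and the data layout are hypothetical and not from the talk.

```python
def trace_energy(samples):
    """Sum of squared amplitudes for one trace (plain Python, no Spark)."""
    return sum(s * s for s in samples)


def energy_per_gather(sc, keyed_traces, num_partitions=4):
    """Total trace energy per gather, computed on an RDD.

    sc           : an existing pyspark.SparkContext (assumed, not created here)
    keyed_traces : list of (gather_id, samples) pairs (hypothetical layout)
    """
    rdd = sc.parallelize(keyed_traces)       # distributed, fault-tolerant dataset
    rdd = rdd.partitionBy(num_partitions)    # hash-partition by gather_id
    rdd = rdd.persist()                      # optionally persistent (in memory)
    return (rdd.mapValues(trace_energy)      # operator applied per trace
               .reduceByKey(lambda a, b: a + b)
               .collectAsMap())
```

With a running context, `energy_per_gather(sc, [(0, [1.0, 2.0]), (0, [2.0]), (1, [3.0])])` would return one energy value per gather identifier.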


RTM via RDDs: Implementation using Spark

● Apache Spark is an implementation of RDDs
● Make use of HDFS or alternative FS
  o GPFS, AWS S3, OpenStack Swift, Ceph or Lustre
● Choose appropriate programming model(s)
  o Not limited to MapReduce
  o Iterative and/or interactive (including streaming)
● Manage Spark workloads
  o Built-in mode or YARN mode, Mesos
  o Univa Universal Resource Broker

after Lumb, insideBIGDATA
http://insidebigdata.com/2015/03/06/8-reasons-apache-spark-hot/


RTM via RDDs: Implementation using Spark (2)

● Deployable on bare metal … clouds
  o Monitoring/management
    ▪ Bright Cluster Manager
● Introduces analytics possibilities for RTM
  o Program in Java (C/C++ via JNA), Scala or Python
● Uptake is significant - rapidly growing community
● Results are extremely impressive
  o Exploit CPUs and/or GPUs

after Lumb, insideBIGDATA
http://insidebigdata.com/2015/03/06/8-reasons-apache-spark-hot/


RTM via RDDs: Opportunities

● Apply RDDs to gathers of seismic data
  o Partition RDDs optimally for wavefield calculations
● Apply RDDs to source wavefields
  o Partition RDDs optimally for cross-correlation of forward and reverse time wavefields
    ▪ Significantly reduce/eliminate disk I/O
● Investigate alternate imaging conditions
  o Machine-learning and/or graph-analytics algorithms in addition to cross-correlation
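
The cross-correlation opportunity above can be sketched as follows. This is a toy illustration, not the author's implementation: it assumes the zero-lag cross-correlation imaging condition, image[x] = sum over t of S[t][x] * R[t][x], with the source and receiver wavefields already held in memory as lists of snapshots aligned on the same time index. `sc` is assumed to be an existing `pyspark.SparkContext`, and all function names are hypothetical.

```python
def shot_image(source_wf, receiver_wf):
    """Zero-lag cross-correlation of two wavefields for one shot.

    source_wf, receiver_wf : lists of snapshots, one per time step; each
    snapshot is a flat list of samples over the same grid points.
    """
    n_points = len(source_wf[0])
    image = [0.0] * n_points
    for s_t, r_t in zip(source_wf, receiver_wf):
        for ix in range(n_points):
            image[ix] += s_t[ix] * r_t[ix]
    return image


def stack(image_a, image_b):
    """Sum two partial images point by point."""
    return [a + b for a, b in zip(image_a, image_b)]


def migrate(sc, shots):
    """Stack per-shot images across an RDD of shots.

    sc    : an existing pyspark.SparkContext (assumed, not created here)
    shots : list of (source_wf, receiver_wf) pairs, one per shot
    """
    return (sc.parallelize(shots)
              .map(lambda pair: shot_image(pair[0], pair[1]))
              .reduce(stack))
```

In this sketch both wavefields stay in memory for the correlation step, which is the disk I/O saving the slide points to. An alternate imaging condition would replace `shot_image` and leave the rest of the pipeline unchanged.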


[Diagram labels: Spark Workers; Spark (YARN) Master; Spark or YARN]


http://www.informationweek.com/big-data/big-data-analytics/apache-spark-3-promising-use-cases/a/d-id/1319660


http://ipython.org/notebook.html


Thunder: Initial Impressions

● Written in Spark's Python API (PySpark)
  o Makes use of scipy, numpy, and scikit-learn
● IPython Notebook serves as interactive GUI
  o Runs in a Web browser
  o Notebooks can include text and graphics
  o Secure, remote access to an in-cluster IPython Notebook server
● Includes modular functions for time-series analysis
● Can interface with C/C++ from Python

http://thunder-project.org/


Is there a case for migration?

● In-memory computing via RDDs is promising
  o Application to gathers and wavefields
● Spark provides analytics upside
  o Imaging conditions other than cross-correlation
● Spark may be applicable to modeling kernels
● Spark can be easily incorporated into pre-existing IT infrastructures
  o Complements existing HPC environments

http://rice2015oghpc.rice.edu/technical-program/


Summary

● Is there a case for migration?
  o From: RTM via HPC
  o To: RTM via Big Data or (Big Data and HPC)
● Does it make sense to refactor other HPC problems as ‘Big Data problems’?



Refactoring HPC with Spark/RDDs …

● Could Spark/RDDs replace MPI?
  o Spark has primitives for distributed in-memory parallel computing … including fault tolerance
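
As a loose illustration of the question, the sketch below expresses a pattern that an MPI code would typically write as a reduction followed by a broadcast: compute a global RMS over distributed blocks, then rescale every block by it. The analogy to MPI collectives is approximate and is an assumption of this sketch, not a claim from the talk. `sc` is assumed to be an existing `pyspark.SparkContext`, and the function names are hypothetical.

```python
import math


def sum_squares(block):
    """Local partial result: sum of squared samples in one block."""
    return sum(x * x for x in block)


def scale_block(block, factor):
    """Multiply every sample in a block by a common factor."""
    return [x * factor for x in block]


def normalise_to_unit_rms(sc, blocks):
    """Rescale distributed blocks so that their global RMS is 1.

    sc     : an existing pyspark.SparkContext (assumed, not created here)
    blocks : list of non-empty lists of samples
    """
    rdd = sc.parallelize(blocks).persist()   # keep blocks in memory for reuse

    # Global reductions; results arrive at the driver (roughly a reduce to root).
    total = rdd.map(sum_squares).reduce(lambda a, b: a + b)
    count = rdd.map(len).reduce(lambda a, b: a + b)
    rms = math.sqrt(total / count)

    # Ship the scalar to every executor (roughly a broadcast).
    factor = sc.broadcast(1.0 / rms if rms > 0.0 else 1.0)
    return rdd.map(lambda block: scale_block(block, factor.value)).collect()
```

Unlike a typical MPI job, a lost partition here can be recomputed from the RDD's lineage, which is the fault tolerance the slide refers to (Zaharia et al., NSDI 2012).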


Acknowledgements

● M. Zaharia et al. for RDDs
● Communities responsible for Spark, Python & Thunder
● M. Lamarca, P. Labropoulos, D. Shestakov & L. Gibbons at Bright Computing


Questions?

Ian Lumb

[email protected]@brightcomputing.com