servicing seismic and oil reservoir simulation data...

21
Servicing Seismic and Oil Reservoir Simulation Data through Grid Data Services Sivaramakrishnan Narayanan, Tahsin Kurc, Umit Catalyurek and Joel Saltz Multiscale Computing Lab Biomedical Informatics Department The Ohio State University http://www.bmi.osu.edu http://www.multiscalecomputing.org

Upload: phamtuyen

Post on 08-Mar-2018

215 views

Category:

Documents


1 download

TRANSCRIPT

Servicing Seismic and Oil Reservoir Simulation Data

through Grid Data ServicesSivaramakrishnan Narayanan, Tahsin Kurc,

Umit Catalyurek and Joel SaltzMultiscale Computing Lab

Biomedical Informatics DepartmentThe Ohio State Universityhttp://www.bmi.osu.edu

http://www.multiscalecomputing.org

VLDB-DMG'05 2

Joel SaltzGagan AgrawalUmit Catalyurek

Shannon Hastings Vijay S Kumar

Tahsin KurcSteve Langella

Scott OsterTony Pan

Benjamin RuttNarayanan Sivaramakrishnan,

Li WengMichael Zhang

Multiscale Computing Labhttp://www.multiscalecomputing.org

VLDB-DMG'05 3

AnalysisProduction rates, bypass

oil, net present value

WorkflowRun new reservoir

simulations

DataSeismic, well

pressures, reservoir simulations

Generate requests for new simulations, new

seismic studies

Obtain initial, boundary conditions, input parameters for

simulations

Store and index simulation results

Summary data from datasets

Spatio-temporal queries

• Simulate multiple realizations of multiple geostatistical models and production strategies

• Evaluate geologic uncertainty and production strategies simultaneously

• Enable on-demand exploration and comparison of multiple scenarios– Integration of a robust,

Grid-based computational and data handling infrastructure

– Distributed databases of reservoir and geophysical data

– Storage and computing resources at multiple institutions

Implementing effective oil and gas production

VLDB-DMG'05 4

Characteristics and Issues• Spatio-temporal datasets

– Simulations carried out/data captured on 3D meshes over many time steps

– Multiple data attributes per data point (gas pressure, oil saturation, seismic traces, etc).

• Very large datasets– Tens of gigabytes to 100+ TB data

• Lots of simulation runs– Up to thousands of runs for a study are possible

• Data can be stored in distributed collection of files • Distributed datasets

– Data may be captured at multiple locations by multiple groups– Simulations are carried out at multiple sites

• Common operations: subsetting, filtering, interpolations, projections, comparisons, frequency counts

VLDB-DMG'05 5

Data Management, Access and Integration

• Tracking of metadata associated with data– Metadata defining simulation parameters, mesh description,

files associated with simulations, etc.– Metadata defining seismic measurements (location, year,

files storing data, etc.)

• Support for data subsetting and filtering on file-based, distributed datasets

• Support for on-demand data product generation– Track metadata associated with data analysis workflows

• Grid data services and distributed querying– Make data and data products available through Grid service

interfaces

VLDB-DMG'05 6

Applications developers generally prefer storing data in files

Support high level queries on multi-dimensional distributed datasets

Many possible data abstractions, query interfacesGrid virtualized object-relational database or XML databaseGrid virtualized objects with user defined methods invoked to

access and process data

Data Virtualization

Our Approach• Support a basic SQL Select query with a virtual relational table

view or a virtual XML database view• A lightweight layer on top of datasets

– Runtime middleware carries out query execution, query planning

VLDB-DMG'05 7

Middleware Support• Data Virtualization: STORM

– Large data querying capabilities, layered on DataCutter– Distributed data virtualization – Indexing, Subsetting, Data Cluster/Decluster, Parallel Data Transfer

• Data Analysis/Processing Workflows: DataCutter– Component Framework for Combined Task/Data Parallelism– Filtering/Program coupling Service: Distributed C++ component

framework– On demand data product generation

• Distributed Metadata and Data Management: Mobius– Create, manage, version data definitions – Management of metadata and data instances– Data integration

• Grid Data Services (OGSA-DAI)– Defines services and interfaces that can be used by clients to specify

operations on data resources and data

VLDB-DMG'05 8

Data Management, Access, Integration

• Grid-level data services via OGSA-DAI• Management of data definitions and

metadata, XML virtualization via Mobius

• Object-relational virtualization and subsetting of file based datasets via STORM

• On-demand data product generation via DataCutter

• STORM, Mobius, DataCutter support data operations on heterogeneous collections of storage and compute clusters

Schema Management

Mobius

Data ProductGeneration

DataCutter

SQL Virtualizationof FilesSTORM

XML VirtualizationMetadata Management

Mobius

OGSA-DAI

OGSA-DAIOGSA-DAI

OGSA-DAI

Grid Protocols

VLDB-DMG'05 9

Data Management, Access, and Integration

Schema Management

Mobius

Data ProductGeneration

DataCutter

SQL Virtualizationof FilesSTORM

XML VirtualizationMetadata Management

Mobius

Grid-data Service (OGSA-DAI)Grid-data Service (OGSA-DAI)

Data ProductGeneration

DataCutter

SQL Virtualizationof FilesSTORM

XML VirtualizationMetadata Management

Mobius

Data ProductGeneration

DataCutter

SQL Virtualizationof FilesSTORM

XML VirtualizationMetadata Management

Mobius

Grid-data Service (OGSA-DAI)Grid Service

ProtocolsGrid-data Service (OGSA-DAI)

Seismic Data

Simulation Data

Seismic/Simulation Data

VLDB-DMG'05 10

Array #

Component #Component #

Component #

Sp (or CDP) #& position

Receiver group # & positionReceiver group #

& positionReceiver group # & position

50.00

50.00

50.00

Component #Component #

Component #

Array #

Receiver group # & positionReceiver group #

& positionReceiver group # & position

50.00

50.00

50.00

Component #Component #

Component #

Array #

Receiver group # & positionReceiver group #

& positionReceiver group # & position

50.00

50.00

50.00

Data Querying and ProcessingSeismic Data

Geostatistics

Model 1

Model 2

Model n

…m realizations

Well Pattern p

Production Strategies

Well Pattern 1

Well Pattern 2

Reservoir Simulations

VLDB-DMG'05 11

STORMSupport efficient selection of the data of interest from

distributed scientific datasets and transfer of data from storage clusters to compute clusters

• Data Subsetting Model– Virtual Tables– Select Queries– Distributed Arrays

SELECT <DataElements>FROM Dataset-1, Dataset-2,…, Dataset-nWHERE <Expression> AND <Filter(<DataElement>)>GROUP-BY-PROCESSOR ComputeAttribute(<DataElement>)

VLDB-DMG'05 12

STORM Services• Query• Meta-data• Indexing• Data Source• Filtering• Partition

Generation• Data Mover

VLDB-DMG'05 13

Grid Data Resource• Grid has emerged as an integrated infrastructure for distributed

computation• OGSA-DAI initiative is to deliver high level data management

functionality for the Grid. – Defines services and interfaces that can be used by clients to specify

operations on data resources and data

• OGSA-DAI services can be configured to expose a specific database management system.

• To be a GDS, a service must accept perform documents and return results– Interpretation of perform documents is open to interpretation– Traditionally wrap SQL queries

VLDB-DMG'05 14

STORM Data Resource

Extractor

Filter

Data Mover

Storm Daemon

JDBC Driver

GDS

STORM instance

Data Resource

VLDB-DMG'05 15

Experimental Setup

• All nodes running linux• Gigabit switch

7.3 TB FAStT600 disk array

4 GB memory2 Xeon 2.4 GHz16Xio

1.5 TB local disk8 GB memoryDual 1.4 GHz AMD Optron

8 nodesmob

Mob,0124 * X / 1MX24 bytes6TXm

Xio,161,0562474240 bytes16Seismic

Mob,033153,84084 bytes21Oil Reservoir

Cluster, Num nodes

Dataset (GB)Records (millions)

Record SizeAttributesDataset

VLDB-DMG'05 16

STORM Results

STORM I/O Performance

0

500

1000

1500

2000

2500

3000

3500

4000

4500

1 2 4 8 16

# XIO nodes

Ban

dwid

th (M

B/s

)

2 Threads4 ThreadsMax

Seismic Datasets10-25GB per file. About 30-35TB of Data.

VLDB-DMG'05 17

Comparison with MySQL - 1• Varying table size. • Per tuple cost is lesser

0

20

40

60

80

100

120

0 50 100 150 200

Table Size (million rows)

Exec

utio

n Ti

me

(sec

s)

MySQL-cold

MySQL-hot

STORM-cold

STORM-hot

VLDB-DMG'05 18

Comparison with MySQL - 2• Varying query size• Also compare them as data resources

0

10

20

30

40

0 250000 500000 750000 1000000

Query Size (num of records)

Exec

utio

n Ti

me

(sec

s)

MySQL

STORM

MySQL-DAI

STORM-DAI

VLDB-DMG'05 19

Oil Reservoir Data Results• Improvements due to: treating records as array of bytes, combining results

at client

0

40

80

120

160

0 100000 200000 300000 400000

Query Size (number of records)

Exec

utio

n Ti

me

(sec

s)

STORM

STORM-DAI-o

STORM-DAI-1

STORM-DAI-50

3 DAIs

VLDB-DMG'05 20

Seismic Data Results

0

10

20

30

40

50

0 2000 4000 6000 8000 10000 12000 14000

Query Size (number of records)

Exec

utio

n Ti

me

(sec

s)

STORM

STORM-DAI-o

STORM-DAI-1

2-DAIs

• 96 x 11GB files on 16 nodes

VLDB-DMG'05 21

Conclusions• Overview of work related to Large Scale Scientific Data

Management at Multi-Scale Computing Lab• Exposed STORM as a Grid Data Service

– Results on use case: Oil reservoir management

• For more info / to download STORM, DataCutter, Mobiushttp://www.multiscalecomputing.org

or

http://www.bmi.osu.edu