CGG’s Storage Subsystem For Seismic Processing

T. Barragy, 3/2017

CGG Seismic Processing Challenges

CGG is a service company

We are very cost sensitive

The type of processing is quite specific vs O&G majors

The data flows are specific to CGG proprietary SW

Data growth rates

Double every 5 years

Data capacity needs are high

Need 10’s of PB online at any given time

Data access needs are modest

Modest bandwidth needs

Large transfer (bytes) dominated

Latency tolerant (throughput computing)

Specific types of datasets can be very hot (cacheable)

CGG Data Center

Seismic OS

Data & Job management, File & Utility services

Hosted on commodity x86 HW

Storage Islands

Unix host + 10GbE + direct-attach arrays of 250 TB raw

Several hundred in total, ~ 20 PB

Data accessed via I/O proxies running on Unix host

CPU Compute Nodes

Commodity x86, GbE, HDD for scratch

~ 2 PFLOP (SP)

GPU Compute Nodes

Commodity x86 + GPU, GbE, SSD for scratch

~ 20 PFLOP (SP)

Optimized for Small MPI Job Throughput & CGG Workflows

Storage Island: Dell MD Series

R620-A & R620-B

One logical SI (storage island) per host

Hosts are active:active with failover

Each host also runs proxy daemons serving data to compute nodes

Each logical SI consists of:

A single (POSIX) filesystem

Linux md stripe across 3 volumes, each RAID6/8+2

Do Not Lose Data, But Keep Cost Down

Dual 10GbE connectivity

Dual port SAS drives, controllers

Later versions

Add drive arrays, add logical SIs

Each logical SI is in its own Linux Container

[Diagram: R620-A and R620-B hosts attached to MD3260 drive arrays]
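
For a feel of the numbers, here is a minimal capacity sketch for one logical SI. The 250 TB raw figure and the RAID6 (8+2) / 3-volume md stripe come from the slides; spares, filesystem overhead and unit rounding are ignored:

```python
# Minimal sketch: usable capacity of one logical storage island (SI).

RAW_TB_PER_SI = 250               # raw direct-attach capacity per SI (slide 3)
VOLUMES_PER_SI = 3                # Linux md stripe width (slide 4)
RAID6_DATA, RAID6_PARITY = 8, 2   # RAID6 "8+2" layout (slide 4)

def usable_tb(raw_tb: float) -> float:
    """Usable capacity after RAID6 parity overhead (spares and FS overhead ignored)."""
    return raw_tb * RAID6_DATA / (RAID6_DATA + RAID6_PARITY)

si_usable = usable_tb(RAW_TB_PER_SI)
print(f"usable per SI       : {si_usable:.0f} TB")                    # ~200 TB
print(f"per md stripe member: {si_usable / VOLUMES_PER_SI:.1f} TB")   # ~66.7 TB
```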

Data Growth

Enormous increase in data volumes over ~20 years

Increased geophone density, array size, source offset, sample rates

Raw survey size doubles every ~5 years

Intermediate datasets are ~10x the size of raw datasets

Raw datasets ~100TB

Intermediate datasets ~ 1PB
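
A short sketch of the compounding these figures imply (the dataset sizes and doubling period are taken from this slide; the 15-year horizon is arbitrary):

```python
# Minimal sketch: project raw and intermediate dataset sizes forward,
# assuming the doubling-every-5-years growth stated on the slide.

RAW_TB_TODAY = 100          # typical raw survey today (slide)
INTERMEDIATE_FACTOR = 10    # intermediate datasets ~10x raw (slide)
DOUBLING_YEARS = 5          # raw survey size doubles every ~5 years (slide)

def projected_raw_tb(years_from_now: float) -> float:
    """Raw survey size after the given number of years."""
    return RAW_TB_TODAY * 2 ** (years_from_now / DOUBLING_YEARS)

for years in (0, 5, 10, 15):
    raw = projected_raw_tb(years)
    print(f"+{years:2d} yr: raw ~{raw:,.0f} TB, "
          f"intermediate ~{raw * INTERMEDIATE_FACTOR / 1000:,.1f} PB")
```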

Data Lifetime

Neither Enterprise nor HPC (checkpoint)

Enterprise data might live years

Checkpoints might live a few hours

30% lives 1 month or less

50% lives 1 – 4 months

Average lifetime ~ a few months

Signal Processing: Read / Write chain

Velocity Modelling: Read / Write loop, low res datasets

Final Migration: Read / Write, high res datasets

[Figure: fraction of data vs. data age in months (since modified)]

Data Temperature

Monte Carlo Profile of /proc

Covers 6 months / 50K samples

300 second sample windows

Average Temp – ‘Cool’

Average bandwidth is 1/10 of HW peak

Individual datasets can be very hot (cacheable)

R:W ratio is ~ 2:1 on average

Motivated adding arrays to SI

Brings CPU, disk, network into better balance

[Figure: fraction of samples vs. MB/s (R + W) for disk and network; axis scale is obfuscated]
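
For illustration only, a minimal sketch of what one 300-second sample window over /proc counters could look like; the device and interface names ("sda", "eth0") are placeholders, and the slide's actual Monte Carlo collector is not reproduced here:

```python
# Sketch of one 300 s bandwidth sample from /proc counters (assumed names).

import time

SECTOR_BYTES = 512
WINDOW_SEC = 300            # sample window length from the slide

def disk_rw_sectors(device: str = "sda"):
    """Return cumulative (sectors_read, sectors_written) for one block device."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[5]), int(fields[9])
    raise ValueError(f"device {device} not found")

def net_rx_tx_bytes(iface: str = "eth0"):
    """Return cumulative (rx_bytes, tx_bytes) for one network interface."""
    with open("/proc/net/dev") as f:
        for line in f:
            if ":" in line and line.split(":")[0].strip() == iface:
                fields = line.split(":")[1].split()
                return int(fields[0]), int(fields[8])
    raise ValueError(f"interface {iface} not found")

r0, w0 = disk_rw_sectors()
rx0, tx0 = net_rx_tx_bytes()
time.sleep(WINDOW_SEC)
r1, w1 = disk_rw_sectors()
rx1, tx1 = net_rx_tx_bytes()

disk_mb_s = (r1 - r0 + w1 - w0) * SECTOR_BYTES / WINDOW_SEC / 1e6
net_mb_s = (rx1 - rx0 + tx1 - tx0) / WINDOW_SEC / 1e6
print(f"disk R+W: {disk_mb_s:.1f} MB/s, network RX+TX: {net_mb_s:.1f} MB/s")
```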

Data Access

Linux Block Layer Trace Method

Reconstruct temporal trace to get user-level request size

Request Sizes

50% of requests are small – latency / seek dominated

50% of requests are large – data transfer dominated

Request Bytes Read

99% of bytes read are for large multi-MB requests

Is an ‘object store’ feasible?

Needs POSIX file system support

Needs to be reasonably efficient for small files / requests

Underlying object sizes of a few MB are suggested

[Figure: percent of read requests and read bytes vs. read size in KB; axis scale is obfuscated]
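
One plausible way to reconstruct user-level request sizes from a block-layer trace (not necessarily the method used here) is to merge entries that are contiguous on disk and close in time, as in this sketch; the record format and thresholds are illustrative assumptions:

```python
# Sketch: coalesce block-layer requests into "user-level" requests.

from dataclasses import dataclass

SECTOR_BYTES = 512
MAX_GAP_SEC = 0.010        # assumed: pieces issued within 10 ms may be merged

@dataclass
class BlockReq:
    t: float               # issue time (s) of the most recent piece
    sector: int            # starting sector (LBA)
    nsectors: int          # length in sectors

def coalesce(reqs):
    """Merge requests that are contiguous on disk and close in time."""
    merged = []
    for r in sorted(reqs, key=lambda x: x.t):
        last = merged[-1] if merged else None
        if (last is not None
                and r.t - last.t <= MAX_GAP_SEC
                and r.sector == last.sector + last.nsectors):
            last.nsectors += r.nsectors
            last.t = r.t               # measure the gap piece-to-piece
        else:
            merged.append(BlockReq(r.t, r.sector, r.nsectors))
    return merged

# Toy trace: three contiguous 128 KB pieces plus one isolated 4 KB request.
trace = [BlockReq(0.000, 1000, 256),
         BlockReq(0.001, 1256, 256),
         BlockReq(0.002, 1512, 256),
         BlockReq(0.500, 9000, 8)]

for r in coalesce(trace):
    print(f"user-level request: {r.nsectors * SECTOR_BYTES // 1024} KB")
```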

Object Store

Motivation

Primarily cost

Secondary is improved data protection, etc.

A Work In Progress

Currently testing one implementation

DSS7000 target building block

Object Store Characteristics

Increased latency (5x) for individual requests

Excellent (linear) scaling with client load

HW RAID Characteristics

Low latency at low client load

Performance degrades with increasing client load

[Figure: sequential 64 KB read throughput (MB/s) vs. client threads for the storage island (HW RAID) and an EC JBOD object store; axis scale is obfuscated]
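
A hedged sketch of the kind of client-side measurement behind such a plot: N threads each stream a file sequentially in 64 KB requests and the aggregate bandwidth is reported. The file paths, thread counts, and use of Python threads are illustrative assumptions, not the actual test harness:

```python
# Sketch: multi-threaded 64 KB sequential read benchmark.

import sys, time, threading

REQUEST_BYTES = 64 * 1024           # 64 KB sequential reads, as in the plot

def read_file(path: str, counter: list):
    """Sequentially read one file in 64 KB chunks, counting bytes read."""
    with open(path, "rb", buffering=0) as f:
        while True:
            buf = f.read(REQUEST_BYTES)
            if not buf:
                break
            counter[0] += len(buf)

def run_benchmark(paths):
    counters = [[0] for _ in paths]     # one private counter per client thread
    threads = [threading.Thread(target=read_file, args=(p, c))
               for p, c in zip(paths, counters)]
    t0 = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - t0
    total_mb = sum(c[0] for c in counters) / 1e6
    print(f"{len(paths)} clients: {total_mb / elapsed:.0f} MB/s aggregate")

if __name__ == "__main__":
    # e.g. python seqread_bench.py /mnt/si01/file0 /mnt/si01/file1 ...
    run_benchmark(sys.argv[1:])
```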

Thank you

Backup & Older

Marine Seismic Processing

Processing Flow

Signal Processing

Sorting / Binning, Filtering / Conditioning / De-Noising

Several months of calendar time, significant interactive analyst time

Velocity Modelling

Migration on coarser or localized mesh

Many iterations, adjust velocity model each time

Several months of calendar time, lots of interactive analyst time

Final Migration

Produces the subsurface image

Several weeks of calendar time

Reverse Time Migration

Forward wave equation on initial pulse (source)

Backward wave equation on receiver data

Cross correlate these two wavefields in time dimension

Yields an image in x,y,z for a single source & receiver group

Sum images over source / receiver groups

Yields the final ‘stacked’ image in x,y,z

$\frac{\partial^2 u}{\partial t^2} - c^2 \nabla^2 u = f$
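
As a schematic illustration of these steps (not CGG's production code; the grid, constant velocity, and Ricker source below are arbitrary choices), here is a NumPy sketch of one shot's forward propagation, time-reversed receiver propagation, and zero-lag cross-correlation:

```python
# Sketch: RTM mechanics for u_tt - c^2 (u_xx + u_zz) = f on a small 2D grid.

import numpy as np

NX = NZ = 101
DX = 10.0                 # grid spacing (m), assumed
C = 2000.0                # constant velocity (m/s), assumed
DT = 0.002                # time step (s); C*DT/DX = 0.4, inside the 2D CFL limit
NT = 500

def laplacian(u):
    """Second-order 5-point Laplacian with zero (Dirichlet) boundaries."""
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] +
                       u[1:-1, 2:] + u[1:-1, :-2] - 4 * u[1:-1, 1:-1]) / DX ** 2
    return lap

def ricker(t, f0=15.0, t0=0.08):
    """Ricker wavelet used as the source pulse."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def propagate(inject):
    """Leapfrog time stepping; inject(it, u) adds the source/receiver term
    in place. Returns the full wavefield history of shape (NT, NZ, NX)."""
    u_prev = np.zeros((NZ, NX))
    u = np.zeros((NZ, NX))
    history = np.zeros((NT, NZ, NX))
    for it in range(NT):
        u_next = 2.0 * u - u_prev + (C * DT) ** 2 * laplacian(u)
        inject(it, u_next)
        u_prev, u = u, u_next
        history[it] = u
    return history

# 1) Forward wave equation on the initial pulse (source); record a line of
#    receivers near the surface.
src_z, src_x = 2, NX // 2

def inject_source(it, u):
    u[src_z, src_x] += ricker(it * DT)

fwd = propagate(inject_source)
receiver_data = fwd[:, 2, :].copy()

# 2) Backward wave equation on the receiver data: inject it reversed in time.
def inject_receivers(it, u):
    u[2, :] += receiver_data[NT - 1 - it, :]

rev = propagate(inject_receivers)

# 3) Zero-lag cross-correlation of the two wavefields in the time dimension
#    yields an image in (z, x) for this source/receiver group; a survey sums
#    such images over all groups to get the stacked image. (With a constant
#    velocity model there is no reflectivity, so this only shows mechanics.)
image = np.zeros((NZ, NX))
for it in range(NT):
    image += fwd[it] * rev[NT - 1 - it]

print("single-shot image:", image.shape)
```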

Storage Profile

Data Lifetime: Mixed

80% < 4 months

70% > 1 month

Data Temperature: On Average, Cool

SI average load is 1/8 of HW capability

Individual datasets can be very hot (cacheable)

Data Access: Bimodal Request Size

50% ‘small’ - latency / seek dominated

50% ‘large’ - bandwidth / transfer dominated

[Figures: data age, disk/network bandwidth, and read request size distributions, as on the earlier slides]

CGG: O&G Service Segment

Sercel

– Started in 1956

– Produces seismic sensors

Marine/Land

– Started in 1931

– Acquires seismic data

Subsurface Imaging

– Processes & interprets data

– Very cost sensitive

– Every compute job is revenue

Acquisition: Marine Surveys Over Time

[Figure: evolution of marine survey geometries, with spreads annotated at 8 km and 1 km scales: ~1995 ‘NAZ’, ~2000 ‘MAZ’, ~2005 ‘WAZ’, ~2010 ‘FAZ’, then StagSeis / BroadSeis]

An Integrated Geoscience Company

Equipment

Full range of products and clear market leadership onshore, offshore and downhole:

Technology leadership

Large installed base

A cornerstone for CGG integrated solutions

Acquisition

Full range of seismic and other geophysical methods for acquisition:

Marine

Land

Multi-Physics

Seabed*

Geology, Geophysics & Reservoir

Subsurface Imaging

GeoConsulting

GeoSoftware

Multi-Client & New Ventures

Data Management Services

*Seabed Geosolutions joint venture 60%-owned by Fugro and 40% by CGG

BroadSeis - The Broadest Marine Bandwidth

Optimal low frequencies are achieved by reducing the zero-Hz ghost notch as much as possible for both source and receivers

The deeper the hydrophone tow depth, the better the low-frequency signal

[Diagram: variable-depth streamers below the sea surface; deeper streamers and ghost notch diversity give improved low-frequency signal-to-noise with no compromise on high frequencies]