the osg data federation - internet2… · 06-03-2019 · the osg data federation frank...

The OSG Data Federation Frank Würthwein OSG Executive Director UCSD/SDSC

Post on 23-Jun-2020

TRANSCRIPT

Page 1

The OSG Data Federation

Frank Würthwein
OSG Executive Director
UCSD/SDSC

Page 2

OSG Data Federation

• This talk provides a superficial introduction to the OSG Data Federation.
  - Functionality
  - Deployed scale
  - Science use so far
  - First attempts at quantifying use

Page 3

Functionality

• Open to any university or national lab to add their data to the federation, hosted on their own storage.
  - Decentralized control of the federated namespace and the storage hosting it.
  - Supporting public and proprietary parts of the namespace.
• Scientists see a single “read-only filesystem” across all data origins.
  - Data publication into the namespace.
  - Data in the global federation namespace is visible across the entire OSG compute infrastructure, including XSEDE resources at SDSC, TACC, and PSC.
  - Direct random access into the data from the compute nodes on OSG.
• Caches in the network backbone and at endpoints reduce access latencies.
  - GeoIP is used to automatically select the “closest” cache.
  - Caches can be configured to cache only part of the namespace, so domain-science-specific overlays are possible.
• Multiple origins can serve the same namespace.
  - Allows for redundancy through replication that is transparent to the users.
  - Allows data to be moved around within the network of origins for operational convenience.
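
The cache-selection idea in the bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the federation's actual implementation: the cache names, coordinates, and namespace prefixes are invented, the GeoIP step (mapping a client IP address to a location) is skipped, and "closest" is taken to mean the smallest great-circle distance among caches configured to serve the requested path.

```python
import math

# Hypothetical cache records. Names, approximate coordinates, and the
# namespace prefixes each cache serves are invented for illustration.
CACHES = [
    {"name": "chicago", "loc": (41.88, -87.63), "prefixes": ["/"]},
    {"name": "amsterdam", "loc": (52.37, 4.90), "prefixes": ["/ligo"]},
    {"name": "san-diego", "loc": (32.72, -117.16), "prefixes": ["/"]},
]


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def serves(cache, path):
    """True if the cache is configured for a prefix that contains `path`."""
    for prefix in cache["prefixes"]:
        p = prefix.rstrip("/")
        if p == "" or path == p or path.startswith(p + "/"):
            return True
    return False


def closest_cache(client_loc, path, caches=CACHES):
    """Name of the nearest cache serving `path`, or None if no cache does."""
    candidates = [c for c in caches if serves(c, path)]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda c: haversine_km(client_loc, c["loc"]))
    return nearest["name"]
```

With these made-up records, a client near Utrecht asking for a path under `/ligo` is sent to the Amsterdam cache, while the same client asking for a path outside `/ligo` falls back to the nearest cache that serves the whole namespace.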

Page 4

OSG Compute Resources

In aggregate, ~200,000 Intel x86 cores are used by ~400 projects across 36 fields of science.

Page 5

OSG Data Origins

[Map of data origins; site labels: SDSC, U.Chicago, FNAL, Caltech, UNL]

• FNAL: Fermilab-based HEP experiments
• U.Chicago: general OSG community
• Caltech: public LIGO data releases
• UNL: private LIGO data
• SDSC: Simons Foundation
• NCSA: DES and NASA Earth Science (planned)

Page 6

Caches in network & at endpoints

[Map of OSG data origins and OSG data caches, marked as in service or planned, on the Internet2 and CENIC networks. Site labels: Caltech, SDSC, UNL, FNAL, U Chicago, Amsterdam. Internet2/commercial cloud cross connects: Amazon Direct Connect, Google Dedicated Interconnect, Microsoft Azure ExpressRoute.]

Cache at I2 peering point with cloud providers in Chicago.

Page 7

5 top user communities

[Plots for May 2018 and September 2018]

• “Genomics”: Jeanjackwpoehlm
• Public LIGO: gwdata/01
• Private LIGO: ligo
• Astronomy: des
• HEP: minerva

We monitor accesses by namespace, cache location, and compute site.
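
As a rough illustration of monitoring along those three dimensions, the sketch below totals bytes read per namespace, per cache, or per compute site. The record layout, paths, site names, and byte counts are all hypothetical; the slides do not describe the real monitoring pipeline.

```python
from collections import Counter

# Hypothetical access records:
# (namespace, cache location, compute site, bytes read).
ACCESSES = [
    ("/ligo", "amsterdam", "nikhef", 4_000),
    ("/ligo", "amsterdam", "surfsara", 6_000),
    ("/gwdata", "chicago", "uchicago", 1_500),
    ("/des", "san-diego", "ucsd", 2_500),
    ("/ligo", "san-diego", "ucsd", 1_000),
]


def bytes_by(records, field):
    """Sum bytes read, grouped by 'namespace', 'cache', or 'site'."""
    index = {"namespace": 0, "cache": 1, "site": 2}[field]
    totals = Counter()
    for record in records:
        totals[record[index]] += record[3]
    return totals
```

Calling `bytes_by(ACCESSES, "namespace").most_common(5)` would then give a "top communities" style ranking for this toy data.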

Page 8

Last 30 days

[Plot; labels: LIGO public data, LIGO private data]

788 TB total in the last 30 days.
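
For a sense of scale, the 30-day total can be converted into an average daily volume and an average sustained rate. The arithmetic below assumes decimal units (1 TB = 10^12 bytes) and says nothing about peak rates.

```python
TOTAL_TB = 788  # total quoted on the slide
DAYS = 30

tb_per_day = TOTAL_TB / DAYS
# Assumption: decimal units, 1 TB = 1e12 bytes; 86,400 seconds per day.
gbit_per_s = TOTAL_TB * 1e12 * 8 / (DAYS * 86_400) / 1e9

print(f"{tb_per_day:.1f} TB/day, {gbit_per_s:.2f} Gbit/s on average")
# prints: 26.3 TB/day, 2.43 Gbit/s on average
```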

Page 9

Last 6 months

[Plot; labels: 180 TB/day, 90 TB/day]

Different colors are different parts of the namespace.

Page 10

A case for caching

OSG enabled LIGO to seamlessly use VIRGO resources.

55% of OSG-enabled LIGO CPU hours are in Europe. NIKHEF + SurfSara represent half of that.

The LIGO workflow reuses each file O(100) times. Total data is only a few TB, but we moved many petabytes' worth out of UNL before LIGO started using the caches in OSG.

A cache in Amsterdam is an effective way to reduce transatlantic network traffic.
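
The effect of file reuse on origin traffic can be sketched with a toy model. It assumes a single cache large enough to hold every file (no eviction) and invented file names and sizes; it is not a model of the actual LIGO workflow.

```python
def origin_traffic(file_sizes, reads_per_file, use_cache):
    """Bytes fetched from the origin when every file is read repeatedly."""
    cached = set()
    fetched = 0
    for _ in range(reads_per_file):
        for name, size in file_sizes.items():
            if use_cache and name in cached:
                continue  # cache hit: nothing crosses the WAN
            fetched += size
            if use_cache:
                cached.add(name)
    return fetched


# Hypothetical dataset: 50 files of 1 MB each, every file read 100 times.
files = {f"frame-{i}.gwf": 1_000_000 for i in range(50)}
no_cache = origin_traffic(files, 100, use_cache=False)
with_cache = origin_traffic(files, 100, use_cache=True)
```

In this idealized setting, origin traffic drops by exactly the reuse factor: each file leaves the origin once instead of 100 times.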

Page 11

Cache performance for LIGO

• Synthetic workload that behaves like PyCBC.
• Each job pulls 4 randomly selected files into worker node tmp space.
• Measure the time the transfer takes per job vs. the number of parallel jobs in the cluster.

[Plot of transfer time per job at job concurrencies of 100, 500, 1000, and 1500. Curves: with local cache; without local cache, running at UCSD and getting data from UNL.]

More work is ongoing to quantify the performance of the data federation for different applications.
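
The shape of such a synthetic test can be sketched as follows. This is a local stand-in, not the benchmark used for the plot: files are copied from a local directory rather than read from the federation, jobs are threads rather than batch jobs, and the helper names (`make_dataset`, `run_job`, `measure`) and the small concurrency values are assumptions made for illustration.

```python
import random
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

FILES_PER_JOB = 4  # each job pulls 4 randomly selected files, as on the slide


def make_dataset(directory, count=20, size=64 * 1024):
    """Create stand-in data files; a real test would read from the federation."""
    paths = []
    for i in range(count):
        path = Path(directory) / f"file-{i}.dat"
        path.write_bytes(b"\0" * size)
        paths.append(path)
    return paths


def run_job(dataset, seed):
    """Copy 4 random files into scratch space and return the elapsed seconds."""
    rng = random.Random(seed)
    chosen = rng.sample(dataset, FILES_PER_JOB)
    with tempfile.TemporaryDirectory() as scratch:
        start = time.perf_counter()
        for src in chosen:
            shutil.copy(src, scratch)
        return time.perf_counter() - start


def measure(dataset, concurrency):
    """Run `concurrency` jobs in parallel and return the per-job transfer times."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda seed: run_job(dataset, seed), range(concurrency)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as source:
        data = make_dataset(source)
        for n in (1, 4, 16):
            times = measure(data, n)
            print(n, "jobs: mean", sum(times) / len(times), "s per job")
```

Plotting the mean time per job against the number of parallel jobs, once with a nearby cache and once against a distant origin, would give a comparison of the kind the slide describes.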

Page 12

Summary & Conclusion

• OSG operates a Data Federation open to all of science.
• Supporting private and public data.
• Supporting data publication.
• Supporting random IO into files anywhere on OSG.
• Supporting caching in network and at compute clusters to improve performance:
  - Reduce IO on WAN
  - Hide access latencies for random IO
• Looking for researchers to try us out.