clouds in high energy physics - university of victoria · pdf fileclouds in high energy...

20
Clouds in High Energy Physics Randall Sobie University of Victoria Randall Sobie IPP/Victoria 1

Upload: phungbao

Post on 02-Mar-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

CloudsinHighEnergyPhysics

RandallSobie

UniversityofVictoria

RandallSobieIPP/Victoria 1

Page 2: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

Overview

•  CloudsareintegralpartofourHEPcomputinginfrastructure

–  PrimarilyInfrastructure-as-a-Service(IAAS)

•  Ouruseiswideranginganddiverse

–  CERNAgileInfrastructure

–  Tier-1computingatcentressuchasBNL,FNALandRAL

–  Tier-2computingaroundtheworld

•  ExpandinguseofHEP-clouds,privatecloudsandcommercialclouds

RandallSobieIPP/Victoria 2

Page 3: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

Motivation

RandallSobieIPP/Victoria 3

Awiderangeofreasonsforusingclouds

•  Easemanagementofexistinginfrastructure

•  Separationofapplicationandsystem

administration

•  Simplifiesallocationofresources

•  Leveragesoftwaredevelopment

•  Opportunisticcomputing

•  Non-HEPcomputingcentres

•  Commercialcloudresources

Page 4: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

Typesofcloudresources

RandallSobieIPP/Victoria 4

DedicatedVirtual

cluster

CloudcomputinginHEPistypicallyproviding5-20%oftheprocessingofcurrentprojects

“Dedicated”clouds(OwnedbyHEP)

“Opportunistic”clouds(privateandcommercial)Opportunistic

Page 5: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

Clouddeployments

RandallSobieIPP/Victoria 5

Traditionalbare-metal

Staticcloud(e.g..LTDABaBar,HLTclouds)

Standalone/privatecloud(e.g.PNNL,NorduGrid)

Bare-metalorin-housecloudwithexternalcloud(e.g..CERN,BNL)

Distributedclouds(e.g.UK,Canada,Australia,INFNClouds)

Page 6: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

RandallSobieIPP/Victoria 6

D. Giordano WLCG Workshop 9/10/2016

OpenStack Clouds at CERN In production:

q  4 clouds q  >230K cores q  >8,000

hypervisors

90% of CERN’s compute resources are now delivered on top of OpenStack

A further 42K cores to be installed in next few months

2

>90%ofCERNcomputeresourcesarevirtualizedUpto42Kcorestobeinstalledinthenextfewmonthssubjecttofunding

Page 7: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

RandallSobieIPP/Victoria 7

condor_collector condor_nego,ator

Workernodes

condor_startd

condor_rooster

Virtualworkernodes

condor_startd

ARC/CREAMCEs

condor_schedd

Centralmanager

OfflinemachineClassAds

Draining

CloudsatRAL

•  892coresu,lizingaCephstoragebackend.

•  3alterna,ngracksofCPUandStoragenodes.

•  Tier1servicesnowrunningonCloudVMs.

•  EngagingwithvariousEuropeanCloudprojects(e.gDataCloud).

S3andSwiSStorage•  StoringDockerimagesforContainer

Orchestra,onviaSwiS.

BatchWorkontheCloud•  For~1.5yearstheRALHTCondorbatchsystemhas

madeopportunis,cuseofunusedcloudresources.

•  HTCondorroosterdaemonusedtoprovisionVMs.

•  Runningjobsfromall4LHCexperiments&manynon-LHCexperiments.

Openstackserviceunderdevelopment.AvailabletoLHCVOsnextyear.

Page 8: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

RandallSobieIPP/Victoria 8

13

FNAL/CMS AWS January/February 2016

9

BNL/ATLAS AWS September 2015

Tier-1CloudburstingontoEC2

Page 9: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

RandallSobieIPP/Victoria 9

SpecialpurposecloudsBaBarLongTermDataAccess(LTDA)System

AbilitytopreservedataandanalysiscapabilityforBaBar

(stoppeddatatakingin2008)

Page 10: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

RandallSobieIPP/Victoria 10

HighleveltriggerfarmsoftheLHCExperiments(largemulti-10Kcoresystems)

Virtualmachinesarebootedduringno-beamperiods

Page 11: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

ExamplesofTier-2clouddeployments

RandallSobieIPP/Victoria 11

Page 12: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

RandallSobieIPP/Victoria 12

Research Cloud

CREAM CE

Dynamic Torque

TORQUE + Maui

TORQUE + Maui

control VMs

distribute jobs via SSH

(Currently 700 cores)

14,000 HEPSpec ~ (1400 cores)

Australia-ATLAs Tier 2

(Belle II) LCG.Melbourne.au

Australian Belle II Grid Site

Dynamic Torque

SingleCREAMCEservicesATLASTier-2(Torque)andBelleIIsite(DynamicTorque)

SWITCHengines–SwissNRENcommercialcloud

Why private cloud?

  Chosen for flexibility, efficient use of compute resources for services   Provides easy load-balancing and availability features   Provides templating features   Easy re-use of templates to test and instantiate new server instances   Non-systems staff can provision their own instances of services   Software Defined Networking is more malleable than physical

networking, encourages better networking practices, including security

Page 13: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

RandallSobieIPP/Victoria 13

UK/GridPP•  CloudsatHEPinstitutions(Oxford/Imperial).•  ECDFcloudinEdinburghhasrecentlymadeavailabletotheHEP•  UKVacuumdeployment•  Commercialcloud–DataCentredOpenstackcloud

Italy/INFN•  PrivateOpenStackCloud(Padova-Legnaro)calledCLOUDAREAPADOVANA•  ~25usergroups/project•  CMSproduction

PNNL/WashingtonPrivateOpenStackcloudforBelleIIproject(KEK)andotherlocalusers

Page 14: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

RandallSobieIPP/Victoria 14

CloudSchedulerCloudScheduler

HTCondor

Job

VM

ComputeCloud

VMImage

Repository

Starts job

Start VMs

Submit user script

CanadaDistributedcloudsystemforATLASandBelleII10-15cloudsHTCondor/CloudScheduler4000-5000cores

ATLASjobsoncloudforCA-system10clouds4300cores

Page 15: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

Jobscheduling/VMprovisioning

•  VarietyofmethodsforrunningHEPworkloadsonclouds

–  VM-DIRAC(LHCbandBelleII)

–  VAC/Vcycle(UK)

–  HTCondor/CloudScheduler(Canada)

–  HTC/GlideinWMS(FNAL),HTC/VM(PNNL),HTC/APF(BNL)

–  Dynamic-Torque(Australia)

–  CloudAreaPadovana(INFN)

–  ARC(NorduGrid)

•  Eachmethodhasitsownmeritsandoftenwasdesignedtointegrated

cloudsintoanexistinginfrastructure(e.g.local,WLCGandexperiment)

RandallSobieIPP/Victoria 15

Page 16: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

Commercialclouds

•  AmazonEC2andMicrosoftAzure

–  Short-termmulti-10Ktests

–  Long-term1K-scaleproduction

•  GCEevaluationbutnoproduction

•  OthercommercialOpenStackclouds

–  DataCentred(UK),SWITCHengines(Switzerland)

•  CERNcommercialcloudprocurement

RandallSobieIPP/Victoria 16

Page 17: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

Networkconnectivity

•  AmazonandMicrosoftcloudsareconnectedtotheresearchnetworksin

NorthAmerica(probablyGCEaswell)

•  Trans-borderortrans-oceantrafficcanbeanissue

–  BecomeanimportantdiscussiontopicintheLHCONEmeetings

•  Privateopportunisticclouds

–  trafficflowsoverresearchnetworkbutnotLHCONEnetwork

RandallSobieIPP/Victoria 17

Page 18: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

RandallSobieIPP/Victoria 18

CPUBenchmarks

Newsuiteof“fast”benchmarks

–  HEPiXBenchmarkWorkingGroup

–  Suiteavailableincludes“fastHS”(LHCb)andWhetstonebenchmarks

•  WritetoElasticSearchDB

–  RunbenchmarksinthepilotjoborduringthebootoftheVM

Datastorage

–  DatawrittentolocalstorageonnodeandthentransferredtoselectedSE

–  UKgrouphasdonesomeworkintegratingtheirobjectstorewithATLAS

–  BNLusingS3storageonEC2forT2-SE

Page 19: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

Monitoring

RandallSobieIPP/Victoria 19

CloudSystemmonitorSensu,Munin,RabbitMQ,Mongo-DB

ApplicationmonitorPanda

Application

BenchmarksandaccountingElasticSearchDB

Cloudorsitemonitor

Page 20: Clouds in High Energy Physics - University of Victoria · PDF fileClouds in High Energy Physics Randall Sobie University of Victoria ... HTCondor/CloudScheduler 4000-5000 cores ATLAS

Summary

•  CloudsinHEP

–  Growing,diverseuseofclouds

–  Typicallyintegratedintoanexistinginfrastructure

–  Seenasawaytobettermanagemulti-userresources

•  Opportunisticresearchclouds

–  Easywaytoutilizecloudsatnon-HEPresearchcomputingfacilities

–  Norequirementforon-siteapplicationspecialistsorcomplexsoftware

•  Commercialclouds

–  EC2/AzureaswellasotherOpenStackclouds

–  Trans-bordernetworkconnectivitychallenges

RandallSobieIPP/Victoria 20