Scalability / Data / Tasks
Meeting Scalability Requirements with Large Data and Complex Tasks: Adapting Existing Technologies and Best Practices in Slovenia
Jan Jona Javoršek
Jožef Stefan Institute
[email protected]
SLING – Slovenian Initiative for National Grid
http://www.ijs.si/ http://www.sling.si/
3/29
Historical
CDC Cyber 74
CONVEX C3860
Zuse Z 23
4/29
SLING
Connected centres
● Arctur* – 1024°
● Arnes – 4400°
● Atos* – 3000
● CIPKeBiP – 990
● SiGNET – 4200
● UNG – 120
● R4* – 1800°
● NSC – 1800°
8 sites
> 18,000 cores
(> 11,000 ARC-active)
> 1 PB disk
> 4 million jobs/year
HPC, GPGPU, chroot
> 80% of Slovenian capacity
Candidates
● Meteo – 2200°
● CI – 2000°
● ME – 1050°
5/29
SLING users
● Arnes NREN users
● Cluster owners*
● Projects*
● Individual researchers
● University professors
● Student groups
*not always ARC
6/29
Use Cases
● Particle Physics:
– ATLAS
– Pierre Auger
● Theoretical Physics
● Meteo/Geo Modelling
● Fluid Dynamics
● Reactor Physics Simulations
Pierre Auger Observatory
7/29
Use Cases
● Life Sciences, mostly computational (bio-)chemistry and genomics
– IJS users (biology, chemistry, knowledge technologies)
– Collaboration with EMBL
– Diagnostic genomics
– ELIXIR
8/29
Use Cases
● Knowledge technologies
– Modelling for different fields
– Genetic algorithms
– Big/Web data analysis
– Advanced computational linguistic models
– CLARIN.si
9/29
Steam explosion moment
10/29
Power distribution for Krsko NPP reactor
Parallel Monte Carlo simulation of neutron transport, F-8 department
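The slide only names the method. As a toy illustration of why Monte Carlo transport parallelises so well on grid and HPC resources, the sketch below splits particle histories into independent batches, each with its own random seed, and sums the tallies. It assumes a one-speed slab that only absorbs and isotropically scatters, with hypothetical parameters `sigma_t`, `sigma_s` and `thickness`; it is not the F-8 department's code or reactor model.

```python
import math
import random


def run_batch(args):
    """Track one batch of neutron histories through a 1-D slab (toy model).

    args = (seed, n_histories, sigma_t, sigma_s, thickness), all hypothetical.
    Returns the number of neutrons transmitted through the far face.
    """
    seed, n_histories, sigma_t, sigma_s, thickness = args
    rng = random.Random(seed)  # independent random stream per batch
    transmitted = 0
    for _ in range(n_histories):
        x, mu = 0.0, 1.0  # enter at x = 0, moving perpendicular into the slab
        while True:
            # Sample the free-flight distance from an exponential distribution.
            d = -math.log(1.0 - rng.random()) / sigma_t
            x += mu * d
            if x >= thickness:  # leaked through the far face
                transmitted += 1
                break
            if x < 0.0:  # leaked back out of the near face
                break
            if rng.random() < sigma_s / sigma_t:
                mu = 2.0 * rng.random() - 1.0  # isotropic scatter: new direction cosine
            else:
                break  # absorbed
    return transmitted


def estimate_transmission(n_batches, n_per_batch, sigma_t, sigma_s, thickness,
                          mapper=map):
    """Run independent batches via `mapper` and combine their tallies."""
    tasks = [(seed, n_per_batch, sigma_t, sigma_s, thickness)
             for seed in range(n_batches)]
    total = sum(mapper(run_batch, tasks))
    return total / (n_batches * n_per_batch)
```

Because batches share nothing, `mapper` can be swapped for a parallel map such as `pool.map` from a `multiprocessing.Pool` (or each batch submitted as a separate grid job) without changing the result; the default `map` runs the batches serially. For a purely absorbing slab (`sigma_s = 0`) the estimate should approach exp(-sigma_t * thickness).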
11/29
Innovation?
● batch system
● virtualisation
● network?
12/29
ARC and LRMS (batch system)
13/29
ARC Computing Element
14/29
ARC user accounts
15/29
Mix'n'match...
CERN Agile model, CVMFS, gLite, NorduGrid ARC, SLURM, Torque, OpenStack, KeyStone, VOMS, dCache, Puppet, OpenMP, Globus, science portals, oVirt, OpenNebula, PKI, VRC, Cinder, gFTP, Glance, Salt, Ceph, OpenCL, CUDA
16/29
Software Deployment and Virtualization
● Admin install
● Compile job
● Install job
● Shared disk
● Shared image
● Environment Modules
● Run Time Environments
● CHROOTs
● Containers
● Docker
● Shifter
17/29
Storage
● Basic support
● Short-term / local storage
● Medium-term storage
● Long-term storage
18/29
User-Facing Issues
● Batch / ARC interface / PKI / VOMS
● Software installations and use
● Submission delays, error reporting and debugging
● MPI scalability difficulties
● Understanding of job and cluster topology
● GPGPU use
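One common reason behind MPI scalability difficulties is Amdahl's law: if a fraction s of a job's runtime is serial (or spent in communication that does not overlap with computation), the speedup on N ranks is bounded by 1 / (s + (1 - s) / N), and can never exceed 1 / s. The helper below is a minimal sketch assuming a fixed serial fraction, which real MPI codes only approximate; the function names are illustrative.

```python
def amdahl_speedup(serial_fraction: float, n_procs: int) -> float:
    """Ideal speedup predicted by Amdahl's law for a fixed serial fraction."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if n_procs < 1:
        raise ValueError("n_procs must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


def parallel_efficiency(serial_fraction: float, n_procs: int) -> float:
    """Speedup divided by the number of processes (1.0 means perfect scaling)."""
    return amdahl_speedup(serial_fraction, n_procs) / n_procs


if __name__ == "__main__":
    # With just 5% serial work the speedup can never exceed 20x,
    # so adding ranks quickly stops paying off.
    for n in (1, 16, 64, 256, 1024):
        print(f"{n:5d} ranks: speedup {amdahl_speedup(0.05, n):6.2f}, "
              f"efficiency {parallel_efficiency(0.05, n):.2%}")
```

This is one reason why asking for more cores does not automatically make a job finish sooner, and why understanding the job and cluster topology matters for users.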
19/29
Groups and Projects
● Job and task management scalability
● Data management → task managers
● Storage and throughput → hardware and cluster setup
● Opportunistic resource use
● Resource optimization → innovative job models
20/29
ATLAS as an example
● ~100 distributed sites
● 250k cores in continuous use
● 200 PB of storage space
● 1M jobs/day
● 2 PB of data transferred between computing sites per day
● Sites include: WLCG Grid sites, HPCs, clouds, volunteer computing
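The ATLAS figures are easier to compare with site networking once converted to rates. Assuming decimal units (1 PB = 10^15 bytes) and traffic spread evenly over the day, 2 PB/day averages roughly 23 GB/s, or about 185 Gb/s; 1M jobs/day is about 11.6 jobs per second; and 250k fully busy cores shared by 1M daily jobs leave on average 6 core-hours per job. The helper names below are illustrative:

```python
from typing import Tuple

SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_to_rates(bytes_per_day: float) -> Tuple[float, float]:
    """Average rate as (GB/s, Gb/s), assuming decimal units and an even spread."""
    bytes_per_s = bytes_per_day / SECONDS_PER_DAY
    return bytes_per_s / 1e9, bytes_per_s * 8 / 1e9


def core_hours_per_job(cores: int, jobs_per_day: float) -> float:
    """Average core-hours available per job if every core is busy all day."""
    return cores * 24 / jobs_per_day


if __name__ == "__main__":
    gbyte_s, gbit_s = daily_volume_to_rates(2e15)  # 2 PB/day
    print(f"2 PB/day ~ {gbyte_s:.1f} GB/s ~ {gbit_s:.0f} Gb/s")
    print(f"1M jobs/day ~ {1e6 / SECONDS_PER_DAY:.1f} jobs/s")
    print(f"250k cores / 1M jobs per day ~ "
          f"{core_hours_per_job(250_000, 1e6):.0f} core-hours per job")
```

Peak rates are of course higher than these daily averages.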
21/29
aCT: ARC Control Tower
Components:
● Submitter
● Status checker
● Fetcher
● (app verification)
● Cleaner
[Diagram: aCT architecture with an external job provider; App engine, App config and App table; ARC engine, ARC config and ARC table; a DB (Oracle/MySQL); and Sites 1–3, each an ARC CE in front of a cluster]
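The aCT components listed above can be pictured as independent workers, each advancing jobs from one state to the next in a shared database table (the slide names Oracle/MySQL). The sketch below is hypothetical throughout: it uses SQLite, an in-memory `FakeCE` instead of a real ARC CE client, and made-up state names and schema; it is not aCT's actual code.

```python
import sqlite3


class FakeCE:
    """Hypothetical stand-in for an ARC CE (not the real ARC client API).

    A job reports RUNNING on its first status poll and FINISHED afterwards.
    """

    def __init__(self):
        self._next_id = 0
        self.jobs = {}

    def submit(self, description):
        self._next_id += 1
        ce_id = f"ce-{self._next_id}"
        self.jobs[ce_id] = "RUNNING"
        return ce_id

    def status(self, ce_id):
        state = self.jobs[ce_id]
        self.jobs[ce_id] = "FINISHED"  # the next poll reports completion
        return state

    def fetch(self, ce_id):
        return f"output of {ce_id}"

    def clean(self, ce_id):
        del self.jobs[ce_id]


def make_db():
    # SQLite stands in for Oracle/MySQL; the schema is made up for illustration.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE arcjobs (id INTEGER PRIMARY KEY, description TEXT,"
               " ce_id TEXT, state TEXT, output TEXT)")
    return db


def add_job(db, description):
    """What an external job provider would do: queue work in the table."""
    db.execute("INSERT INTO arcjobs (description, state) VALUES (?, 'tosubmit')",
               (description,))
    db.commit()


def _select(db, state):
    return db.execute("SELECT id, description, ce_id FROM arcjobs"
                      " WHERE state = ? ORDER BY id", (state,)).fetchall()


def submitter(db, ce):
    for job_id, description, _ in _select(db, "tosubmit"):
        db.execute("UPDATE arcjobs SET ce_id = ?, state = 'submitted' WHERE id = ?",
                   (ce.submit(description), job_id))
    db.commit()


def status_checker(db, ce):
    for job_id, _, ce_id in _select(db, "submitted"):
        if ce.status(ce_id) == "FINISHED":
            db.execute("UPDATE arcjobs SET state = 'finished' WHERE id = ?", (job_id,))
    db.commit()


def fetcher(db, ce):
    for job_id, _, ce_id in _select(db, "finished"):
        db.execute("UPDATE arcjobs SET output = ?, state = 'fetched' WHERE id = ?",
                   (ce.fetch(ce_id), job_id))
    db.commit()


def cleaner(db, ce):
    for job_id, _, ce_id in _select(db, "fetched"):
        ce.clean(ce_id)
        db.execute("UPDATE arcjobs SET state = 'done' WHERE id = ?", (job_id,))
    db.commit()


COMPONENTS = (submitter, status_checker, fetcher, cleaner)

if __name__ == "__main__":
    db, ce = make_db(), FakeCE()
    for i in range(3):
        add_job(db, f"job {i}")
    for _ in range(2):  # here the components simply take turns
        for component in COMPONENTS:
            component(db, ce)
    print(db.execute("SELECT state, COUNT(*) FROM arcjobs GROUP BY state").fetchall())
```

Because each component only reads and writes the table, they could equally run as separate processes, and the table doubles as a record of where every job is.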
22/29
Opportunistic Resource Use
● Grid clusters
● HPC clusters
● Private computers
● Public (commercial) cloud
● Microjobs
23/29
ATLAS scaling
2010
● Planned data distribution
● Jobs go to data
● Multi-hop data flows
● Poor T2 networking across regions
~20 AOD copies distributed worldwide
24/29
ATLAS scaling
2013
● Planned & dynamic data distribution
● Jobs go to data & data to free sites
● Direct data flows for most T2s
● Many T2s connected via 10 Gb/s links
4 AOD copies distributed worldwide
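The 2013 policy "jobs go to data & data to free sites" can be read as a simple brokering rule: prefer a site that already holds a replica and has free slots; otherwise pick the site with the most free slots and schedule a transfer there. The sketch below assumes exactly that rule; the `Site` structure, site and dataset names, and the tie-breaking are hypothetical, and this is not the actual ATLAS workload-management logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Site:
    name: str
    free_slots: int
    datasets: Set[str] = field(default_factory=set)


def broker(dataset: str, sites: List[Site]) -> Tuple[Optional[str], bool]:
    """Choose a site for a job reading `dataset`.

    Returns (site_name, needs_transfer); site_name is None if no site has free slots.
    """
    available = [s for s in sites if s.free_slots > 0]
    if not available:
        return None, False
    # 1) Jobs go to data: among free sites holding a replica, take the one
    #    with the most free slots.
    with_data = [s for s in available if dataset in s.datasets]
    if with_data:
        best = max(with_data, key=lambda s: s.free_slots)
        return best.name, False
    # 2) Data goes to free sites: pick the freest site and replicate the data there.
    best = max(available, key=lambda s: s.free_slots)
    return best.name, True
```

For example, with sites `A` (full, holds `aod.1`), `B` (5 free slots, holds `aod.1`) and `C` (20 free slots, no data), a job for `aod.1` goes to `B` without a transfer, while a job for `aod.2` goes to `C` and triggers one. Needing fewer pre-placed replicas is what this kind of dynamic placement makes possible.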
25/29
Social Component
● Accessibility beyond large projects
● Long-term funding
● Perception of public clouds
● Not-invented-here syndrome
● Users with no Unix experience
● Sustainability pressure
26/29
People Involved
Andrej Filipčič, JSI
Barbara Krašovec, Arnes, JSI
Dejan Lesjak, JSI
Janez Srakar, JSI
Jan Jona Javoršek, JSI
+ 4 site administrators
National Initiative: http://www.sling.si/
27/29
Thanks!
Questions?
28/29
New Computing Centre
● 200 m², slightly away from the main site
● New network installation
● Water cooling
● Not enough power on-site yet
● Housing Pikolit, NSC, parts of others
● Interesting cost-sharing issues ...
29/29
New Cluster
● Grid + HPC
● GPGPU: 16 × K80
● NorduGrid ARC + SLURM
● Considering EGI
● Users:
– IJS departments
– related research
– supported EU infrastructures
NSC Cluster in Numbers
● ~1800 cores
● ~35 TB scratch
● ~35 TB storage
● ~8 TB RAM
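As a rough sanity check on the node balance, the approximate totals can be divided out. This assumes decimal terabytes and RAM spread evenly across cores, neither of which the slide states; the GPU count relies on each Tesla K80 board carrying two GPUs.

```python
CORES = 1800       # "~1800 cores"
RAM_TB = 8         # "~8 TB RAM"
K80_BOARDS = 16    # "16 x K80"
GPUS_PER_K80 = 2   # each Tesla K80 board carries two GPUs

ram_per_core_gb = RAM_TB * 1000 / CORES  # decimal units, even spread assumed
gpu_devices = K80_BOARDS * GPUS_PER_K80

print(f"~{ram_per_core_gb:.1f} GB RAM per core, {gpu_devices} GPU devices")
```

That works out to roughly 4.4 GB of RAM per core and 32 GPU devices across the cluster.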