Introduction to HPC in Canada
Erming Pei
Research Computing Group, UAlberta
Compute Canada / WestGrid
Outline & Schedule
• 10:00 Introduction to Compute Canada (15’)
• 10:15 Introduction to WestGrid (15’)
• 10:30 Q&A 1 (5’)
• 10:35 Break (10’)
• 10:45 Introduction to HPC (40’)
• 11:25 Q&A 2 (5’)
Introduction to Compute Canada
About Compute Canada
• Compute Canada integrates 4 regional HPC consortia across the country
  – provides a shared HPC/ARC infrastructure across Canada
  – supports world-class, leading-edge research activities.
• CC aggregates petaflops of computing power and petabytes of storage capacity over Canada's high-performance networks.
• CC provides comprehensive services, including infrastructure, applications, operations, and user support, for users nationwide.
Compute Consortia
Previously, there were 7 consortia:
• ACENET
• CLUMEQ
• RQCHP
• HPCVL
• SciNet
• SHARCNET
• WestGrid
These have now been consolidated into four consortia:
• WestGrid
• Compute Ontario
• Calcul Québec
• ACENET
Existing Systems & Resources
• ~40 universities
• ~27 data centres
• ~50 systems
• ~200,000 cores, 2 Pflops, 20 PB
• ~100 research software packages
• ~200 experts in the use of ARC for research
https://www.westgrid.ca/events/responding-to-canadas-research-computing-needs
New CC Systems
• UVic, GP1 (Cloud)
• SFU, GP2 (General Purpose)
• UW, GP3 (General Purpose)
• UofT, LP (Large Parallel)
Schedule of New CC Systems

Site/Service | Description | Availability | Resource
GP1 - UVic | Large OpenStack cloud | Sept. 2016 | 3,000 cores + 40% expansion (2017)
GP2 - SFU | General-purpose cluster + cloud partition | Feb. 2017 | 18,000 cores + 40% expansion (2017); 192 GPU nodes
GP3 - Waterloo | General-purpose cluster + cloud partition | May 2017 | 19,000 cores + 40% expansion (2017); 64 GPU nodes
LP - UToronto | Large parallel cluster | Dec. 2017 | 66,000 cores
National Storage Infrastructure | HSM + object storage (all 4 sites) | Oct. 2016 | Dozens of PBs (10 PB to start)
https://www.computecanada.ca/renewing-canadas-advanced-research-computing-platform/new-systems-at-four-national-sites/
Continuing Development
• Consolidation by 2018:
  – 5-10 data centres
  – 300,000 cores, 12 Pflops, 50+ PB
• 2016-17: commissioning new systems while decommissioning old ones
CC New Organization Chart
https://staff.computecanada.ca/national_teams/chart
[Organization chart figure: Compute Canada national teams and services (cloud, monitoring, networking, storage, visualization, bioinformatics, administration, etc.); see the link above]
CC Cloud Service
• Compute Canada currently has two main cloud systems: Cloud West and Cloud East
Access CC clouds
• Cloud East: http://east.cloud.computecanada.ca
• Cloud West: http://west.cloud.computecanada.ca
• Can access with your CC account
OwnCloud
• A Dropbox-like cloud storage service
  – hosted by WestGrid
• Accessible with your WestGrid username/password
Globus Online
• High performance data transfer service• https://globus.computecanada.ca
Globus Online
• Requires MyProxy authentication (WestGrid login/password)
• Can select existing endpoints (GridFTP services at sites)
• Can create your own personal endpoint with “Globus Connect Personal”
Intro to WestGrid
About WestGrid
• WestGrid is one of the four regional HPC consortia of Compute Canada
• WestGrid itself has 15 partner institutions across British Columbia, Alberta, Saskatchewan and Manitoba.
[Map of partner institutions: UBC, SFU, UVic, UNBC, TRU, UofA, UofC, ULeth, AU, Banff Centre, USask, UofR, UofM, UofW, BU]
Overall Resources
• To date, WestGrid has more than 40,000 compute cores and 9 PB of storage space.
• About 1,000 Compute Canada users from 475 projects are currently using WestGrid systems.
Text and image source: Lindsay Sill, Intro to WestGrid 2013 (figures from 2012/13)
* HQP stands for highly qualified personnel
WestGrid Staff
• Executive Director (Lindsay Sill)
• Director of Operations (Patrick Mann)
• Collaboration Coordinator
• Visualization Coordinator
• Site Leads
• Programmers
• System Analysts
• System Administrators
WestGrid Facilities, UofA (Jasper)
• Processors: 4160 cores
  – 240 nodes with Xeon X5675 processors, 12 cores (2 x 6) and 24 GB of memory
  – 160 nodes with Xeon L5420 processors, 8 cores (2 x 4) and 16 GB of memory
• Interconnect:
  – InfiniBand QDR, 40 Gbit/s, with a 1:1 blocking factor
  – InfiniBand DDR, 20 Gbit/s, with a 2:1 blocking factor
• Storage: ~830 TB (356 TB Lustre + 280 TB storage servers + 192 TB IS10K)
• Quickstart: http://www.westgrid.ca/support/quickstart/jasper
WestGrid Facilities, UofA (Hungabee)
• Processors: 2048 cores
  – A shared-memory multiprocessor comprising an SGI UV100 login node and an SGI UV1000 computational node, with 16 TB of memory
• Interconnect: ccNUMA (cache-coherent non-uniform memory access), a combination of Intel's QuickPath and SGI's NUMAlink
• Storage: 53 TB NFS, and 356 TB Lustre shared with Jasper
• Quickstart: www.westgrid.ca/support/quickstart/hungabee
WestGrid Facilities, UBC (Orcinus)
• Processors: 9600 cores (3072 Intel Xeon E5450 quad-core/16 GB RAM + 6528 Xeon X5650 six-core/24 GB RAM)
• Storage: ~450 TB, Lustre
• Quickstart: www.westgrid.ca/support/quickstart/orcinus
WestGrid Facilities, UofC (Breezy)
• Processors: 384 cores (16-node Appro AMD cluster; quad-socket, 6-core AMD Istanbul processors (24 cores @ 2.4 GHz) per node; 256 GB RAM/node)
• Interconnect: 4X DDR InfiniBand
• Storage: ~450 TB, IBRIX
• Quickstart: http://www.westgrid.ca/support/quickstart/breezy
WestGrid Facilities, UofC (Lattice)
• Processors: 4096 cores
  – 512 nodes, each with two 4-core Intel Xeon L5520 processors (8 cores/node) and 12 GB of memory
• Interconnect:
  – InfiniBand 4X QDR (Quad Data Rate), 40 Gbit/s, 2:1 blocking
• Storage: 160 TB shared with Parallel and Breezy
• Quickstart: http://www.westgrid.ca/support/quickstart/lattice
WestGrid Facilities, UofC (Parallel)
• Processors: 7056 cores
  – 528 standard nodes, each with 12 cores (two 6-core Xeon E5649) and 24 GB of RAM
  – 60 special nodes with 3 GPGPUs each (NVIDIA Tesla M2070, 5.5 GB of memory per GPU)
• Interconnect:
  – InfiniBand 4X QDR (Quad Data Rate), 40 Gbit/s, 2:1 blocking
• Storage: 160 TB shared with Lattice and Breezy
• Quickstart: http://www.westgrid.ca/support/quickstart/parallel
WestGrid Facilities, UM (Grex)
• Processors: 3792 cores (316-node SGI Altix XE cluster; each node has two 6-core Intel Xeon X5650 2.66 GHz processors and 48-96 GB of RAM)
• Interconnect: non-blocking InfiniBand 4X QDR
• Storage: >100 TB
• Quickstart: www.westgrid.ca/support/quickstart/grex
WestGrid Facilities, UVic (Hermes/Nestor)
• Processors: 4416 cores [2112 (Hermes), 2304 (Nestor)]
  – IBM iDataPlex servers with eight 2.67-GHz Xeon X5550 cores and 24 GB of RAM
  – Dell C6100 servers with twelve 2.66-GHz Xeon X5650 cores and 24 GB of RAM
• Interconnect:
  – 84 Hermes nodes use two bonded Gigabit/s Ethernet links
  – Newer Hermes nodes: 4X QDR non-blocking, 32-40 Gb/s
• Storage: 1.2 PB, GPFS
• Quickstart: www.westgrid.ca/support/quickstart/hermes_nestor
WestGrid Facilities, SFU (Bugaboo)
• Processors: 4584 cores
  – 16 nodes with 4-core Intel Xeon E5430 processors, 16 GB/node
  – 254 nodes with 6-core Xeon X5650 processors, 24 GB/node
  – 16 nodes with quad-core Xeon X5355 processors, 16 GB/node
• Interconnect: InfiniBand using a 288-port QLogic switch
• Storage: ~700 TB
• Quickstart: www.westgrid.ca/support/quickstart/bugaboo
WestGrid Facilities, USask (Silo)
• Disk: 4.2 PB raw total, 3.15 PB usable
  – 600 x 1 TB SATA drives, RAID 6
  – 1800 x 2 TB SATA drives, RAID 6
• Tape: IBM 3584 tape library (LTO)
  – ~3 PB total; 1460 x LTO4 tapes, 920 x LTO5 tapes
• Backup system: IBM Tivoli Storage Manager (TSM)
• Quickstart: http://www.westgrid.ca/support/quickstart/silo
Site Status
https://www.westgrid.ca/support/system_status
Use CC/WestGrid
• Apply for a CC/WestGrid account
• Get a Grid certificate / proxy
• Existing resource classification
• New resource allocation
• Software
• Site status
• Technical support
CC/WestGrid Account
1. First, ask your PI to apply for a Compute Canada account if he/she doesn't already have one.
2. Then apply for your own Compute Canada account as part of your PI's project:
https://www.westgrid.ca/support/accounts/getting_account
3. Your PI approves your application.
4. Apply for a consortium account, e.g. WestGrid or ACENET.
Note: It takes a couple of days for your account to be created on all sites.
Grid Certificate
1. Log in to http://portal.westgrid.ca and click “Request a Grid Certificate”.
2. On the “My Account” webpage, you will see two buttons for downloading your Grid certificate and private key.
Grid Proxy
• A Grid proxy is used for submitting Grid jobs or transferring files across the Grid (it has a limited lifetime and limited privileges).
• Users just need to log in to any WestGrid site and run:
  – myproxy-logon
Resource Classification
Program Type | Sites
Serial | Bugaboo, Hermes, Jasper
Parallel | Bugaboo, Nestor, Orcinus, Lattice, Parallel, Jasper, Grex
SMP Parallel | Breezy, Hungabee
Large memory | Grex, Breezy, Hungabee
Visualization | Parallel
Gaussian | Grex
Matlab | Orcinus (Distributed Computing Toolbox), Jasper/Hungabee (UofA license), etc.
Storage | Silo, Bugaboo
Software
• WestGrid has both free and commercial software.
• You can use the software packages installed on WestGrid.
  – Check the software list webpage to see whether a given software release is already available on WestGrid.
• Software list webpage: https://www.westgrid.ca/support/software
WestGrid support
For any questions, you can email [email protected]
New Resource Allocation
• RAC (Resource Allocation Competition)– https://www.westgrid.ca/support/accounts/resource_allocations
• RAC = RPP + RRG
  – RPP: Research Platforms and Portals (scientific/technical review needed)
  – RRG: Resources for Research Groups (scientific/technical review needed)
• RAS: Rapid Access Service (formerly “Default Allocation”); no scientific/technical review needed
Email: [email protected] / [email protected]
New RAC Schedule
Introduction to HPC
Outline
• What is HPC
• Capability vs. Capacity
• Programming model
  – Serial/Parallel
• Architecture
  – SMP/DSM/MPP, UMA/NUMA/COMA
• Interconnect
  – PCI(E)/InfiniBand/NUMAlink
• Storage
  – RAID, Multipathing, Data Bus
  – DAS/NAS/SAN
  – Parallel File Systems
• Evolution of Computing
  – Mainframe, Cluster, Grid, Cloud, Big Data
What is HPC?
• High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation, in order to solve large problems in science, engineering, or business.
Capability vs. Capacity
• Capability computing is typically thought of as using the maximum computing power to solve a single large problem in the shortest time.
  – e.g. a real-time weather simulation and prediction application.
• Capacity computing, in contrast, is typically thought of as using many cost-effective computing resources to solve a large number of small problems or a small number of big problems.
  – e.g. huge numbers of users accessing a web service simultaneously, or
  – analyzing a huge amount of HEP data by splitting it into many small pieces and distributing them across multiple cluster nodes (see the sketch below).
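To make the capacity pattern concrete, here is a minimal sketch (the chunking scheme and the analyze() workload are invented for illustration, and a local process pool stands in for cluster nodes): the dataset is split into independent pieces that are processed in parallel and then combined.

```python
# Capacity computing in miniature: many independent small tasks, farmed out
# to a pool of workers. analyze() is a toy stand-in for a real analysis job.
from multiprocessing import Pool

def analyze(chunk_id):
    # Each chunk is processed independently of all the others.
    return sum(i * i for i in range(chunk_id * 1000, (chunk_id + 1) * 1000))

if __name__ == "__main__":
    chunks = range(100)                      # 100 independent work units
    with Pool(processes=8) as pool:          # 8 workers stand in for nodes
        results = pool.map(analyze, chunks)  # scatter chunks, gather results
    print("combined result:", sum(results))
```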
Spectrum
• Capability → Capacity
Hungabee
• Single system
• 2048 cores
• 16 TB memory
• High-speed interconnect

Breezy
• 16 fat-node cluster
• 256 GB/node

Bugaboo
• 256+ node cluster
• 16-24 GB/node

BlueGene/Q
• 4096 low-power nodes
• 65536 processor cores
Architectures
• By processor
  – SMP (Symmetric Multi-Processors)
  – DSM (Distributed Shared Memory)
  – MPP (Massively Parallel Processors)
• By memory
  – UMA (Uniform Memory Access)
  – NUMA (Non-Uniform Memory Access)
  – COMA (Cache Only Memory Access)
Evolution of Architectures
[Diagram: evolution of architectures across message passing, UMA, NUMA, and COMA]
Programming Model
• Serial
  – Instructions are executed one after another on a single CPU.
• Parallel
  – Computations are carried out concurrently on multiple processors.
  – SPMD: single program, multiple data (see the sketch below)
  – MPMD: multiple programs, multiple data
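As a minimal SPMD sketch (assuming the mpi4py package, which the talk does not mention): every rank runs the same program and uses its rank to pick its own share of the data.

```python
# SPMD sketch: the SAME program is launched N times, e.g.
#   mpiexec -n 4 python spmd.py
# Assumes the mpi4py package. Each copy ("rank") works on its own data.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this copy's ID: 0 .. size-1
size = comm.Get_size()   # how many copies were launched

# One program, different data per rank: each rank sums a strided slice.
local_sum = sum(range(rank, 1_000_000, size))

# A collective reduction combines the partial sums on rank 0.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print("sum of 0..999999 =", total)
```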
Parallel Programming Paradigms/Tools
– Data Parallel
  • HPF (High Performance Fortran)
– Task Parallel
  • OpenMP (Open Multi-Processing)
– Message Passing (see the sketch below)
  • PVM (Parallel Virtual Machine)
  • MPI (Message Passing Interface)
    – MPICH, Open MPI, etc.
– Hybrid (MPI+OpenMP, MPI+GPGPU)
– Advanced: Chapel, PGAS (Partitioned Global Address Space)
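For the message-passing paradigm, a hedged point-to-point sketch (again assuming mpi4py): rank 0 hands out tasks with send() and collects answers with recv(), a simple task-farm pattern.

```python
# Point-to-point message passing (task farm), assuming mpi4py:
# rank 0 sends each worker a task; workers compute and send results back.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    # Coordinator: one task per worker, then collect the replies.
    for worker in range(1, size):
        comm.send({"start": worker * 10, "count": 10}, dest=worker, tag=1)
    for worker in range(1, size):
        print("result from rank", worker, "=", comm.recv(source=worker, tag=2))
else:
    # Worker: receive a task, do the work, return the result.
    task = comm.recv(source=0, tag=1)
    result = sum(range(task["start"], task["start"] + task["count"]))
    comm.send(result, dest=0, tag=2)
```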
Interconnect
• PCI
• PCI Express
• InfiniBand
• HyperTransport (AMD)
• QPI/Omni-Path (Intel)
• NUMAlink (SGI)
Serial vs. Parallel
• In the early days, serial connections were reliable but quite slow, so parallel connections were developed to send multiple pieces of data simultaneously.
• Later it turned out that parallel connections have their own problems: electromagnetic interference between wires.
• So the pendulum swung back to highly-optimized serial connections.
Serial → Parallel → Serial
PCI/PCI-X
• PCI: Peripheral Component Interconnect (32-bit)
• PCI-X: PCI-eXtended (64-bit)
Image source: http://www.altera.com/products/ip/altera/t-alt-pci_soln.html
Electromagnetic interference and signal degradation are common in parallel connections, and they slow the connection down. The additional bandwidth of the PCI-X bus means it can carry more data, but it also generates even more noise.
PCI-Express
A single PCI Express lane can handle 200 MB/s; a 16X PCI-E connector can reach 6.4 GB/s (16 lanes at 200 MB/s, counting both directions).
• Instead of using parallel connections, PCI-E uses switch-controlled point-to-point serial connections.
• Every device has its own dedicated connection, so devices no longer share bandwidth as they do on a normal data bus.
Image source: http://computer.howstuffworks.com/pci-express2.htm
Infiniband
Image source: http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Fall_2007/wiki4_001_a1
InfiniBand
• The internal connections in most computers are inflexible and relatively slow.
• As I/O increases, the existing bus system becomes a bottleneck.
• Through InfiniBand switches, InfiniBand channels are created to connect hosts (HCAs) and I/O targets (TCAs).
• Instead of sending data in parallel across the backplane bus, InfiniBand specifies a serial bus.
  – The serial bus can also carry multiple channels of data at the same time by multiplexing the signal.
[Table: InfiniBand theoretical throughput in Gb/s]
Infiniband vs. PCI/PCI-Express
http://www.mellanox.com/pdf/whitepapers/PCI_3GIO_IB_WP_120.pdf
Storage
• Storage protocols
• I/O bus
  – Serial vs. Parallel
• Redundancy
  – RAID (Redundant Array of Inexpensive Disks)
  – Multipathing (redundant physical paths)
• Storage attachment approaches
  – DAS (Direct Attached Storage)
  – NAS (Network Attached Storage)
  – SAN (Storage Area Network)
Storage Protocols
• CIFS/SMB (Common Internet File System)
  – an application-layer network protocol mainly used to provide shared access to files, printers, etc. between nodes
• NFS (Network File System)
  – an application-layer network protocol that only allows access to files, over an Ethernet network
• SCSI/iSCSI (Internet Small Computer System Interface)
  – iSCSI is a mapping of the regular SCSI protocol over TCP/IP
• FC (Fibre Channel)
  – a transport protocol which mainly transports SCSI commands over Fibre Channel networks
• FCoE (Fibre Channel over Ethernet)
  – allows Fibre Channel to use 10 Gigabit Ethernet networks (or higher speeds) while preserving the Fibre Channel protocol
I/O Bus
• ATA → SATA
  – ATA (Advanced Technology Attachment)
  – SATA (Serial ATA)
• SCSI → SAS
  – SCSI (Small Computer System Interface)
  – SAS (Serial Attached SCSI)
Parallel → Serial: driven by synchronization issues, electromagnetic interference, and cost.
http://www.denali.com/wordpress/index.php/dmr/2010/02/02/ssd-interfaces-and-performance-effects
RAID (Redundant Array of Independent Disks)
• RAID 0: striping, without parity or mirroring
• RAID 1: mirroring, without parity or striping
• RAID 2: bit-level striping with dedicated parity
• RAID 3: byte-level striping with dedicated parity
• RAID 4: block-level striping with dedicated parity
• RAID 5: striping with single distributed parity
• RAID 6: block-level striping with double distributed parity
• Nested RAID: RAID 10, RAID 50, RAID 60, etc.
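The "distributed parity" in RAID 5/6 is, at its core, an XOR across the data blocks of a stripe. A toy sketch (not modeled on any real controller) of how a lost block is rebuilt:

```python
# Toy RAID 5 parity demo: the parity block is the XOR of the data blocks,
# so any ONE lost block can be rebuilt by XOR-ing parity with the survivors.
from functools import reduce

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"      # data blocks in one stripe
parity = reduce(xor_blocks, (d0, d1, d2))   # stored on a rotating disk

# Simulate losing d1, then rebuild it from the remaining blocks + parity.
rebuilt = reduce(xor_blocks, (d0, d2, parity))
assert rebuilt == d1
print("rebuilt block:", rebuilt)            # b'BBBB'
```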
Example: RAID 0, 1, 5
Example: Nested RAID 10, 50
Comparison
http://www.techwarelabs.com/10-things-to-consider-before-setting-up-raid/
Multipathing
• Multipath I/O
  – is a fault-tolerance and performance-enhancement technique
  – creates multiple logical paths between the server and the storage devices
    • via adapters, cables, switches, etc.
  – in the event that one path fails, multipathing switches to an alternate path so that applications can still access their data.
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/DM_Multipath/
DAS/NAS/SAN
http://abdullrhmanfarram.wordpress.com/2013/04/08/storage-technologies-das-nas-and-san/
• DAS
  – Storage directly attached
  – High cost of management
  – Inflexible
  – Expensive to scale
• NAS
  – Storage access through Ethernet
  – Scalable and flexible
• SAN
  – Storage access through FC/IB
  – Much better performance
  – More flexible and scalable
  – Increases data availability
Parallel File System
• Distributes data across multiple storage nodes, accessed via a high-speed network (see the striping sketch below)
• Concurrent (often coordinated) access from many clients
• Provides globally shared metadata (locations, file names, sizes, etc.)
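A sketch of the round-robin striping idea behind Lustre-style parallel file systems (the stripe size, stripe count, and function name are invented for illustration): a byte offset in a file maps deterministically to one storage target.

```python
# Round-robin striping sketch: which storage target (e.g. a Lustre OST)
# holds a given byte of a file? Stripe size/count here are illustrative.
def locate(offset, stripe_size=1 << 20, stripe_count=4):
    """Map a file byte offset to (target index, offset within that target)."""
    stripe_index = offset // stripe_size        # which stripe holds this byte
    target = stripe_index % stripe_count        # stripes rotate over targets
    local = ((stripe_index // stripe_count) * stripe_size
             + offset % stripe_size)            # position on that target
    return target, local

for off in (0, 1 << 20, 5 << 20):
    print(f"byte offset {off:>8} -> (target, local offset) = {locate(off)}")
```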
Parallel File Systems
• Lustre
• GPFS
• Panasas
• NFSv4??
Parallel File Systems
• Lustre
• GlusterFS
• OrangeFS
• GPFS
• IBRIX
• CXFS
• Panasas
• PVFS2
• pNFS (NFSv4.1)
• GoogleFS
• Ceph
Example: Lustre
Image source: http://wiki.lustre.org/manual/LustreManual18_HTML/figures/LustreArch.png
Object Storage
• Object storage appears as a collection of objects.
• An object typically includes not only the data itself, but extra information such as metadata, an OID, attributes, etc.
• It moves lower-level functionality, such as space management and security functions, into the storage device itself; the device is accessed through a standard object interface.
• Especially good for storing unstructured data such as photos, songs, etc. (see the toy sketch below)
[Diagram: block storage vs. object storage]
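A minimal sketch of the object model described above (a flat namespace of OID → data + metadata; the store and function names are invented for illustration):

```python
# Toy object store: a flat namespace mapping an object ID (OID) to data
# plus metadata, mirroring the "data + metadata + OID" description above.
import uuid

store = {}  # flat namespace: OID -> object

def put(data: bytes, **metadata) -> str:
    oid = str(uuid.uuid4())                  # globally unique object ID
    store[oid] = {"data": data, "meta": metadata}
    return oid

def get(oid: str):
    obj = store[oid]
    return obj["data"], obj["meta"]

oid = put(b"...jpeg bytes...", content_type="image/jpeg", owner="alice")
data, meta = get(oid)
print(oid, meta)
```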
Comparison of 3 storage types
File storage: NFS and SMB/CIFS; block storage: Fibre Channel/iSCSI; object storage: AWS S3
https://insights.ubuntu.com/2015/05/18/what-are-the-different-types-of-storage-block-object-and-file/
Comparison of 3 storage types
http://blog.sungardas.com/CTOLabs/2015/10/object-storage-the-alternator-of-cloud-computing/
Evolution of Computing
• Mainframe: super power
• Cluster: worker bees
• Grid: global orchestration
• Cloud: everything as a service
• Big Data: find the needle in the sea
Evolution of Computing
• Mainframe
  – Single machine
  – Shared memory
• PC/Cluster
  – Multiple nodes
  – Batch job scheduling
  – Parallel computing
• Grid
  – Multiple sites (geographically distributed)
  – Global scheduling
  – Virtualized organization
  – Transparent data access
  – Unified security infrastructure
• Cloud
  – Virtualized resources
  – Elastic computing
  – Build everything as a service!
• Big Data
  – Big volume, big variety, big velocity
  – Fast analysis/decision
Image and text source: http://www.wikipedia.org/
Mainframe
• Originally referred to the large cabinets that housed the central processing unit and main memory.
• Modern design:
  – Redundant internal engineering, resulting in high reliability and security
  – Extensive I/O facilities
  – High utilization rates
  – Uses virtualization technology to support massive throughput
[Image: Amdahl 470V/6]
Cluster
• Tightly connected computers that work together as a single system
  – Low cost
  – Scalability
  – Flexibility
• Batch job scheduling/management
• Parallel computing
Grid
• Grid computing is the coordination of massive computing resources from multiple locations to reach a common goal. The resources are:
  – loosely coupled
  – heterogeneous
  – geographically dispersed
  – dynamic
• Main features:
  – High-level scheduling/workload management
  – Unified security infrastructure
  – Global information system
  – Virtualized organization
  – Transparent data transfer interface
Example: WLCG (Worldwide LHC Computing Grid)
Cloud
• Initially
  – IaaS (Infrastructure as a Service)
  – PaaS (Platform as a Service)
  – SaaS (Software as a Service)
• Subsequently
  – HaaS (Hardware as a Service)
  – NaaS (Network as a Service)
  – DaaS (Database as a Service)
  – CaaS (Communication as a Service)
  – BPaaS (Business Process as a Service)
• Eventually
  – XaaS (Everything as a Service!)
Image source: www.telezent.com
Image source: http://blueatoll.com/blog/the-next-generation-enterprise-business-as-a-service-in-the-cloud/
Big Data
• What is Big Data?
  – refers to technologies for handling data that is too diverse, fast-changing, or massive for conventional technologies to address efficiently.
  – Today, new technologies make it possible to realize value from Big Data.
Big Data’s Four V’s
http://www.ibmbigdatahub.com/blog/how-big-data-and-cognitive-computing-are-transforming-insurance-part-2
Big Data: Core Technology
• Foundation stone
  – Google (GFS, MapReduce, BigTable)
• Free version
  – Apache (HDFS, YARN, HBase, Hive, Pig…)
Big Data: MapReduce
Image source: http://www.slideshare.net/tothc/introduction-to-hadoop-and-map-reduce
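The canonical MapReduce example is word count. A single-process Python sketch of the map → shuffle → reduce stages (real frameworks such as Hadoop distribute each stage across many nodes):

```python
# Word count in the MapReduce style: map emits (word, 1) pairs, the shuffle
# groups pairs by key, and reduce sums each group. Runs in one process here.
from collections import defaultdict

docs = ["the quick brown fox", "the lazy dog", "the fox"]

# Map: each document -> a list of (word, 1) pairs.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group the emitted values by key.
groups = defaultdict(list)
for word, one in mapped:
    groups[word].append(one)

# Reduce: sum the values for each key.
counts = {word: sum(ones) for word, ones in groups.items()}
print(counts)  # {'the': 3, ..., 'fox': 2, ...}
```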
Big Data: Evolution
• New Troika– Google (Dremel, Pregel, Caffeine)
• Free version– Apache Drill, Apache Giraph, Stanford GPS
Image source: http://blog.mikiobraun.de/2013/02/big-data-beyond-map-reduce-googles-papers.html
Example: MapReduce vs. Dremel
Query: SELECT SUM(CountWords(txtField)) / COUNT(*) FROM T1 (T1: 85 billion records, 87 TB, 3000 nodes)
Image source: http://www.cubrid.org/blog/dev-platform/meet-impala-open-source-real-time-sql-querying-on-hadoop/
Big Data Ecosystem
Summary
• Introduced Compute Canada and its consortia
• Introduced WestGrid and its member sites
• Introduced high performance computing from different angles, such as architecture, memory, interconnect, storage, and file systems
• Briefly covered the evolution of computing technologies from mainframe, cluster, grid, and cloud to the current hot topic: Big Data.
Follow-up Talks
• Sept. 15, 2016: Tips for Submitting Jobs & Moving Data (with hands-on session)
  – Masao Fujinaga
• Sept. 27, 2016: Scheduling & Job Management (with hands-on session)
  – Kamil Marcinkowski
See more details in: https://www.westgrid.ca/events
Thanks! Questions?