Irish Centre for High-End Computing
Rapid ICT prototyping in Ireland with ICHEC
Overview
Jean-Christophe “JC” Desplat
11th February 2015
Agenda
• Centre overview
• Technology walkthrough
• Training & education programme
• Data analytics
• Business engagement model
Scalability
Performance
Active industry engagement: Making technology work for you
• Experts have a wide-ranging skill set, with career backgrounds in industry and academia
• Teams combining software engineers, domain experts and accredited PRINCE2/Agile project managers
• Flexible and responsive industry engagement model
National Technology Centre
• Established in 2005
• University hosted (with national remit)
• 28 staff in Dublin & Galway
Mandate includes:
• High-Performance Computing (HPC) & Big Data / analytics
• Industry & Public Sector engagement
National e-Infrastructure
Storage
• Nexsan E60: 574 TB (formatted)
• DDN SFA12k-20: 550 TB (formatted)
• Panasas AS5200: 175 TB (formatted)
Fionn
• SGI ICE X: 7,680 E5-2660v2 cores, c. 148 Tflops, 20.5 TB RAM
• SGI Pyramid: 640 E5-2660v2 cores, 32 Xeon Phi 5110P, 32 NVIDIA K20m
• SGI UV2000: 1.7 TB RAM, 2 Xeon Phi 5110P
Near-mission critical services with emergency service failover
Industry test beds
• R&D
• Production (in procurement)
ICHEC uses novel ICT technologies to enable new data intensive applications
• Nvidia: many integrated core GPU
• Intel: multi integrated core CPU
• Xilinx: programmable logic
Storage
Compute
Cloud
Data analytics Molecular dynamics
Weather forecasting
“ICHEC joined our globally competitive Intel Parallel Computing Centre programme. ICHEC were selected because of their exceptional parallel software skills and their problem solving approach to HPC and big data challenges.”
Brian Quinn, Director, strategic programs at Intel Labs Europe
Financial and Genomics
High frequency trading E-commerce Cloud computing
“Xilinx needs a new generation of design environments that abstract away the complexity of the hardware. ICHEC really helps us with the proliferation of the design expertise as they can develop the education material that is required to train up the next generation of programmers.”
Michaela Blott, Principal Engineer, Xilinx Research
ECTS accredited graduate modules
Custom training for 11 cloud system admin positions (in year 1)
Public sector R courses
Education and outreach
ICHEC designed and delivered personalised R courses across 14 public sector organisations to address a skills demand and enable innovation.
In progress
CSO UNECE Partnership
Partnership with the Irish Central Statistics Office (CSO) & the United Nations Economic Commission for Europe (UNECE). High-level group to develop best practices in the analysis of civic data through Hadoop workflows.
• Managing the Data Analytics 'Sandbox' for the UNECE programme on Big Data in Official Statistics
• Hadoop cluster (20 nodes)
  - Hortonworks Data Platform
  - Ancillary tools, e.g. R
Engagement Model
• Feasibility project – €9K (funded by EI)
• Innovation Partnership – up to 80%
• Strategic Partnership programme – 50%
• Consultancy
  – Software development, process validation, training
  – Flexible access to resources
  – Value for money
See our industry testimonials on YouTube
Engineering ICT Training & Education
http://www.youtube.com/user/ichecireland
Exploiting novel ICT technologies with ICHEC
Dr. Michael Lysaght
11th February 2015
More cores, More Performance
Intel Xeon Phi Coprocessor NVIDIA K80
More cores, More Performance
Intel Xeon Phi Coprocessor
• 1997: The 1st Intel Teraflop Computer
  - 9298 Intel Processors
  - 72 Server Cabinets
• 2013: The Intel Xeon Phi Coprocessor
  - 1 Teraflop of Performance
  - 1 PCIe Slot
More cores, More Performance
Intel Xeon Phi Coprocessor
• 1 x Intel Xeon Phi @ 1090 MHz
• 60 cores (in-order)
• ~1 TFLOPS DP peak
• 4 hardware threads per core
• 8 GB GDDR5 memory
• 512-bit SIMD vectors (32 registers)
• Fully-coherent L1 and L2 caches
• Plugged into PCI Express bus
[Figure: Xeon Phi core (one of 60 cores), logical layout of functional components: Instruction Decode, Scalar Unit, Scalar Registers, Vector Unit (512b VPU), Vector Registers, 32k/32k L1 Cache (inst/data), 512k L2 Cache, Ring. Source: Intel, via “Xeon Phi Introduction”, slide “Architecture (1/2)”, Tim Cramer, Rechen- und Kommunikationszentrum]
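The “~1 TFLOPS DP peak” figure is consistent with the other numbers listed for this coprocessor if one assumes that each core can retire one 512-bit fused multiply-add per cycle (8 double-precision lanes, 2 floating-point operations per lane). That FMA assumption is not stated on the slide, and the function name below is hypothetical; this is illustrative arithmetic, not a benchmark.

```cpp
#include <cassert>

// Theoretical peak in GFLOPS for a many-core SIMD chip.
// Assumption (not on the slide): one fused multiply-add per SIMD lane per
// cycle, counted as 2 floating-point operations.
double peak_gflops(int cores, double clock_ghz, int simd_bits, int element_bits) {
    assert(simd_bits % element_bits == 0);
    const int lanes = simd_bits / element_bits;  // 512 / 64 = 8 DP lanes
    const int flops_per_lane_per_cycle = 2;      // multiply + add (FMA)
    return cores * clock_ghz * lanes * flops_per_lane_per_cycle;
}
```

With the slide's figures, `peak_gflops(60, 1.09, 512, 64)` comes to roughly 1046 GFLOPS, i.e. about 1 TFLOPS.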
More cores, More Performance
Intel Xeon Phi Coprocessor Accelerated software is a differentiator!
More cores, More Performance
Intel Xeon Phi Processor Accelerated software is a differentiator!
Accelerating Solutions on GPGPU
• Quantitative Finance*:
  - In-house code (NDA)
  - London-based
  - Real-time risk simulations
• Applications:
  - Profit-margin analysis
• Project: 3 week time-frame
• Requirements: 500ms response time
• Challenge: How many more simulations on GPU?
Source: Talk “Real-Time Risk Simulation The GPU Revolution In Profit Margin Analysis” (http://bit.ly/13sWgH1), May 2012
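Accelerating Solutions on GPGPU
// my_Constants, delimiter and gProb are declared elsewhere in the code (not shown on the slide).
__device__ float getProb(int timeIndex1, int indexToUse, int timeIndex2) {
    // Clamp the lookup index to the range [30, 80].
    indexToUse = max(min(indexToUse, 80), 30);
    // Only look up a probability once timeIndex1 has reached the delimiter
    // entry for timeIndex2, and only for timeIndex2 > 2.
    if (timeIndex1 >= my_Constants.get2D(delimiter, timeIndex2, 1) && timeIndex2 > 2) {
        return my_Constants.get2D(gProb,
            timeIndex1 - my_Constants.get2D(delimiter, timeIndex2, 1), indexToUse);
    } else {
        return 0.0f;
    }
}
240x speedup
Same 500ms time window: 100x to 1000x more simulations
Deployable solution delivered within 3 weeks
*Winner of HPCWire RCA Award 2012 for ‘Most Innovative Use of HPC in Finance’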
Accelerating solutions on Burst-Buffer
• Oil & Gas Project 1:
  - In-house code
  - Seismic Imaging
  - Petabytes of data
• Applications:
  - Oil & Gas
• Project: 3 week time-frame
• Requirements: Reduce I/O bottlenecks
• Challenge: Space and power savings
Accelerating solutions on Burst-Buffer
[Figure: architecture diagram with labels Compute nodes, IB FDR, IME Servers (IME1 to IME4), Object Storage Servers (OSS1, OSS2), OST1, OST2, ..., OST6, SFA7700]
[Chart: Total execution time, axis from 0.00 to 1.00, for a small case (80 GB), a medium case (950 GB) and a large case (8.4 TB). Legend: In memory, Lustre, IME Burst Buffer. Annotation: up to 3x speedup]
DDN-ICHEC Whitepaper: Experimenting on IME with an Oil & Gas imaging code (2014)
I/O requests are re-ordered through a cache layer
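As a rough illustration of what re-ordering I/O requests in a cache layer can mean, the sketch below buffers write requests, sorts them by file offset and merges contiguous ones before they would be flushed to the backing file system. This is a generic technique shown under assumptions; it does not describe how DDN IME is implemented, and `WriteRequest` and `reorder_and_coalesce` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical description of one buffered write: byte offset and length.
struct WriteRequest {
    std::uint64_t offset;
    std::uint64_t length;
};

// Sort buffered requests by offset and merge ranges that touch or overlap,
// so the backing store sees fewer, larger, sequential writes.
std::vector<WriteRequest> reorder_and_coalesce(std::vector<WriteRequest> pending) {
    std::sort(pending.begin(), pending.end(),
              [](const WriteRequest& a, const WriteRequest& b) {
                  return a.offset < b.offset;
              });
    std::vector<WriteRequest> merged;
    for (const WriteRequest& req : pending) {
        if (!merged.empty() &&
            req.offset <= merged.back().offset + merged.back().length) {
            // Extend the previous range so that it also covers this request.
            const std::uint64_t end =
                std::max(merged.back().offset + merged.back().length,
                         req.offset + req.length);
            merged.back().length = end - merged.back().offset;
        } else {
            merged.push_back(req);
        }
    }
    return merged;
}
```

Four out-of-order 4 KB writes covering offsets 0 to 12288 would, for example, leave this function as a single 12 KB request.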
Accelerating Solutions on Xeon Phi
• Materials under extreme conditions:
  - UK STFC code
  - 500k LOC
  - Extreme-scale code
• Applications:
  - Novel Materials
  - Biotech
• Project: 2-year R&D project (IPCC)
• Requirements: Extreme scalability
• Challenge: Exascale-ready – Knights Landing ready
Accelerating Solutions on Xeon Phi
[Figure: Xeon Phi Nodes at RWTH Aachen. Compute Node made of a Host System (2 x Xeon, 8 cores @ 2 GHz, each with 16 GB DDR3; labels: Shared Memory, QPI) and two MIC Systems (each a Xeon Phi, 60 cores @ 1 GHz, with 8 GB GDDR5) attached over PCI Express. Source: “Xeon Phi Introduction”, Tim Cramer, Rechen- und Kommunikationszentrum]
“Dynamic load-balancing for iterative algorithms”
Increasing heterogeneity in the Datacentre
6x speedup over 2 socket IvyBridge
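One simple way to read “dynamic load-balancing for iterative algorithms” on a heterogeneous node is to re-split the work between host and coprocessors after every iteration, in proportion to the throughput each device achieved in the previous one. The sketch below shows that idea only; the slide does not say which scheme the project used, and `rebalance` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Given the work items each device processed in the last iteration and the
// time each took, return a new split of total_items proportional to the
// measured throughput (items per second).
// Assumes every time is > 0 and at least one device processed some items.
std::vector<std::size_t> rebalance(const std::vector<std::size_t>& items_done,
                                   const std::vector<double>& seconds,
                                   std::size_t total_items) {
    std::vector<double> rate(items_done.size());
    for (std::size_t i = 0; i < rate.size(); ++i) {
        rate[i] = static_cast<double>(items_done[i]) / seconds[i];
    }
    const double total_rate = std::accumulate(rate.begin(), rate.end(), 0.0);

    std::vector<std::size_t> share(rate.size(), 0);
    std::size_t assigned = 0;
    for (std::size_t i = 0; i < rate.size(); ++i) {
        share[i] = static_cast<std::size_t>(total_items * (rate[i] / total_rate));
        assigned += share[i];
    }
    // Rounding down can leave a few items over; give them to the first device.
    if (!share.empty()) {
        share[0] += total_items - assigned;
    }
    return share;
}
```

In an iterative solver this would be called once per iteration with the latest timings, so that a slower device is handed less work on the next pass.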
Accelerating Solutions on FPGA
• Webscale apps:
  - Memcached
  - Kernels (NDA)
• Applications:
  - Analytics
  - Stream processing
• Project: 6-month R&D project
• Requirements: Open Standards
• Challenge: Acceleration & Perf/Watt
ADM-XRC-7V3 board
Accelerating Solutions on FPGA
Blended ICHEC-Xilinx Team working at Xilinx EMEA HQ
ADM-XRC-7V3 board