Using POWER at Warwick University
Dugan Witherick
8th July 2019 / University of Birmingham / Second PowerAI User Group Meeting
• Established in the 1960s.
• Creation supported by University of Birmingham Vice Chancellor.
• ~27,000 students (undergrad and postgrad)
• Ranked
• 9th in the UK (Guardian 2020 league table)
• 62nd in the world (QS World University Rankings 2020)
• within the UK top 10 for highest earnings in over 11 subjects 5 years after graduating (UK Gov 2018 LEO Dataset).
Who is Warwick University
Where is Warwick University (not in Warwick)
• One of the UK's leading research universities.
• Theme focused research e.g.:
• The Engineered World: from Molecules to Machines
• Life Sciences and Health
• Strong history of collaboration and partnerships including:
• The Monash Warwick Alliance
• National Automotive Innovation Centre (WMG, JLR, Tata)
Research at Warwick University
• World-class technologies and expertise.
• Ready access to research critical tools.
• Responsibility of Pro-Vice Chancellor (Research).
• Includes:
• Advanced Bioimaging
• Electron Microscopy
• X-ray diffraction
• And...
Where do we fit in?
Research Technology Platforms
• Located in the Department of Computer Science.
• Providing:
• Scientific desktop (based on Linux).
• Two local HPC clusters providing ~6000 cores (Tinis and Orac).
• Access to the HPC Midlands+ Tier2 (Athena) system.
• One SCRTP Director.
• Four "computing" staff.
• Two dedicated RSEs with additional Project Associate RSEs.
Scientific Computing (SCRTP)
• Centre for Scientific Computing (CSC)
• Interdisciplinary research community based around the sharing of knowledge and expertise in computer modelling and simulation.
• Representatives from Departments including Physics, Maths, WMG and Warwick Medical School
• Department of Computer Science
Close Working Relationships
• UK's national institute for data science and artificial intelligence.
• University of Warwick one of the five founding partners.
• Over twenty Turing Fellows at Warwick.
• Data science tools for high-performance computing (ATI Research Project).
The Alan Turing Institute and Warwick
• Tissue Image Analytics (TIA) Lab
• Deep Learning for Imaging Data.
• Using/developing ML algorithms and data science platforms to understand and improve air quality over London.
• Crowd blackspot intelligence for 5G rollout (COCKPIT-5G).
AI at Warwick
• 4 x 16 core Haswell with 2 x Xeon Phi 7120P (co-processor)
• 4 x 16 core Haswell with 2 x K80 dual-GPU
• CentOS 6, QDR InfiniBand
• 4 x Xeon Phi 7250F (Knights Landing)
• CentOS 7, Omni-Path
Accelerators for supporting AI workloads
• Power8 Minsky S822LC
• 2 x IBM POWER8 3.259 GHz 8-core processors
• 16 cores per node
• 256 GB DDR4 memory
• 4 x NVIDIA P100 GPGPUs (SXM2 NVLink-enabled)
• Part of the Orac HPC Cluster (Broadwell, NetApp/Spectrum Scale, Omni-Path, CentOS 7, xCAT)
OpenPower TestBed
• Version 1.5.4
• OS reinstall number one: CentOS 7.3 -> 7.5
• Not the simplest installation procedure.
• Non-relocatable dependencies!
• I think my "simplified" instructions to users may have put them off!
• User Question: "Where's theano?"
PowerAI Attempt 1
• Version 1.6.0
• OS reinstall number two: CentOS 7.5 -> 7.6
• Much simpler installation
• All in a Conda channel (thank you).
• Some actual usage!
PowerAI Attempt 2
• Demetris Marnerides
• Warwick Centre for Predictive Modelling
• Converting Low Dynamic Range (LDR) images to High Dynamic Range
• HDR displays readily available but most content still LDR.
Deep Learning for HDR Imaging
• Convolutional Neural Networks to learn mapping from LDR to HDR
• PyTorch and OpenCV
• https://github.com/dmarnerides/hdr-expandnet
• https://arxiv.org/abs/1803.02266 (Initial Research)
ExpandNet
• Kieran Kalair
• Mathematics for Real-World Systems Centre for Doctoral Training
• Analysing traffic data, particularly extreme cases:
• Accident
• Breakdown
• Random perturbation that causes a cascade of flow breakdown
Large Scale Traffic Data Analysis Problems
• Time-series models are poor predictors for very short horizons.
• Neural Networks to improve predictions using UK motorway data.
• PyTorch
Improving Predictions
Image Credit: Jaroslaw Kilian / Shutterstock.com
• Version 1.6.1.
• Sorry, Watson Machine Learning Community Edition.
• Please stop renaming your products!
• On my todo list.
PowerAI Attempt 3
• How easy is it to build HPC applications currently used on Orac and Tinis on OpenPower?
• Affects support load (manual/ad-hoc builds take time).
• Currently use EasyBuild.
• Once built, do they produce the expected output?
• How well do these builds perform?
Migrating Non-AI Work to OpenPower
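One way to approach the "expected output" question is to compare a few key numerical results from the OpenPower build against a reference run on the existing x86 cluster, allowing a small tolerance for floating-point differences between architectures and compilers. The sketch below is illustrative only: the function name `outputs_match`, the tolerance values and the sample numbers are all assumptions, not part of the workflow described in the talk.

```python
import math


def outputs_match(reference, candidate, rel_tol=1e-6, abs_tol=0.0):
    """Return True if two sequences of floats agree within a tolerance.

    The tolerances are illustrative defaults; a real check would pick
    them per application.
    """
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Hypothetical values standing in for output from an x86 run and a
# POWER8 run of the same input; these are NOT measured results.
x86_values = [-6.7733681, -5.7597368, -5.7471550]
power_values = [-6.7733681, -5.7597370, -5.7471548]

print(outputs_match(x86_values, power_values))
```

A relative tolerance is used rather than exact equality because bit-identical results are not generally expected across different CPUs, compilers and maths libraries.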
• Use EasyBuild to build GPU accelerated HPC applications (commonly used at Warwick).
• EasyBuild fosscuda 2018b toolchain:
• GCC 7.3.0, CUDA 9.2, OpenMPI 3.1.1
• Test "identical" build on Haswell/K80 for baseline performance.
Testing Non-AI Support and Workloads
• Large-scale Atomic/Molecular Massively Parallel Simulator.
• Classical Molecular Dynamics Code.
• Distributed by Sandia National Laboratories.
• Used by Warwick Physics.
LAMMPS
• “Freeze” internal benchmark:
• 50 x 50 x 50 crystal in lattice units.
• Lennard-Jones Interactions.
• 100,000 steps.
• Patch Release 18 June 2019.
• GPU Package built with DOUBLE_DOUBLE precision.
• Built entirely by EasyBuild (no noticeable issues).
LAMMPS Testing
[Bar chart: time-steps per second (higher is better) for CPU and CPU+GPU runs, comparing Power8 and K80; axis from 0 to 4.5]
LAMMPS Freeze Test
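The "time-steps per second" metric used for the LAMMPS comparison can be derived from the step count and the wall-clock time of a run. The following is a minimal sketch of that arithmetic; the helper names and the wall-clock times are hypothetical placeholders, not the measurements behind the chart. Only the 100,000-step count comes from the benchmark description.

```python
def timesteps_per_second(steps, wall_seconds):
    """Throughput of a run: simulated steps divided by wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return steps / wall_seconds


def speedup(candidate_rate, baseline_rate):
    """How many times faster the candidate is than the baseline."""
    return candidate_rate / baseline_rate


STEPS = 100_000  # step count from the "Freeze" benchmark description

# Hypothetical wall-clock times in seconds; NOT measured results.
cpu_rate = timesteps_per_second(STEPS, 50_000.0)
gpu_rate = timesteps_per_second(STEPS, 25_000.0)

print(f"CPU: {cpu_rate:.2f} steps/s")
print(f"CPU+GPU: {gpu_rate:.2f} steps/s")
print(f"Speedup: {speedup(gpu_rate, cpu_rate):.2f}x")
```

The same two helpers apply to any pair of runs, for example an EasyBuild build against a manual build on the same hardware.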
• Still using EasyBuild fosscuda 2018b toolchain.
• Manual build (toolchain loaded but LAMMPS built manually).
LAMMPS Testing Attempt 2
[Bar chart: time-steps per second (higher is better) for Manual and EasyBuild builds, comparing Power8 and K80; axis from 0 to 4.5]
LAMMPS Freeze GPU Test (EasyBuild vs Manual Build)
• Empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM).
• Developed at MRC Laboratory of Molecular Biology (Cambridge).
• Used by Warwick Life Sciences.
• Class3D standard benchmark using v. 3.0.6.
• Built entirely by EasyBuild.
RELION
[Bar chart: run time in HH:MM (lower is better) for K80 and Power8; axis from 0:00 to 3:00]
RELION Class3D Test
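For a time-to-solution metric such as the RELION run time, where lower is better, two systems can be compared by converting the HH:MM values to minutes and taking their ratio. This is a small sketch under assumed inputs; the helper names and the example times are hypothetical and do not reproduce the benchmark results.

```python
def hhmm_to_minutes(text):
    """Convert a run time written as 'H:MM' or 'HH:MM' to whole minutes."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def relative_runtime(candidate, baseline):
    """Candidate run time as a fraction of the baseline (below 1.0 is faster)."""
    return hhmm_to_minutes(candidate) / hhmm_to_minutes(baseline)


# Hypothetical run times; NOT the measured Class3D results.
print(relative_runtime("1:30", "3:00"))
```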
• Increase in "manual" or ad-hoc builds to get performance for some applications.
• Partial EasyBuild.
• Using toolchain but final build by hand.
• "Ad-hoc" builds using XL Compilers.
Non-AI Testing (conclusions so far)
Email: [email protected]
Questions?