Looking Ahead: A New PSU Research Cloud Architecture
Chuck Gilbert - Systems Architect and Systems Team Lead
Research CI Coordinating Committee Meeting July 31, 2014
● ITS implemented a traditional HPC infrastructure based upon:
  ○ Fairshare model
    ■ Priority bump in the queues
  ○ No guaranteed runtimes
    ■ Wasted research time waiting for a “turn” in the system
  ○ Segregated clusters
    ■ Limited re-configuration options
    ■ No sharing of high-speed interconnects
One model for all computing needs!
What we have
● Research Community needs are bigger!
    ■ Guaranteed response times
    ■ Self-service access – Needed for empowering the research community to consume a model that works for their needs
    ■ Root level access – Needed for enabling customized environments
    ■ Virtualization in addition to HPC resources – Flexible configurations and online maintenance of hardware
    ■ Accelerator cards (GPU and Phi)
    ■ Big Data platforms
    ■ Fast data transfer rates
● The old ITS approach revolved around segmented and fractured “clusters” without flexibility or expandability
The old model cannot keep pace with current and future computing needs!
What current research needs
● ICS-CI2 (High-Performance Research Cloud)
  ○ Fundamentally new approach to engineering, deploying, and managing research computing resources
  ○ On-Premises High-Performance Cloud allows for full customization and control of the software and hardware stacks
    ■ Flexible configurations
    ■ Guaranteed run times
    ■ Secure data storage
    ■ High-speed network bandwidth
    ■ Multiple possible Service Level Agreement (SLA) models
    ■ On-demand storage purchasing capacity
  ○ Bursting to public clouds and national labs (Hybrid Cloud Model)
    ■ Compute bursting for large-scale (10k+ core) jobs
    ■ Participation in XSEDE
  ○ Model used at CERN and other Research Computing Centers
● Stable computing platform
  ○ Tested and verified software catalogs, including operating systems
    ■ Linux, Windows
    ■ C, C++, Java, .NET, Scripting Languages, etc.
  ○ Self-service portals
  ○ Science gateways
  ○ Seamless maintenance
  ○ Enable choice of consumption of resources
What is the solution? Advanced CyberInfrastructure for Innovation (ICS-CI2)
Where is ICS-CI2 in the Big Picture?
Customized Environments
To be aligned at later date to conform to governance structures recommended by Research CI Governance Taskforce
ICS-CI2 Envisioned End User Experience (Future)
Penn State Research Cloud
Resource Request
ICS-CI2 (On-Premises Research Cloud)
Regional and National Labs
Public Cloud Resources
ICS-CI2 Overview
● Compute
  ○ ICS-CI2 compute can be “re-provisioned” as needed to accommodate multiple models
  ○ Uses GPU-enabled, large-memory, and blade servers deployed through each proposed phase
  ○ N CI-Cores are built on top of converged compute, segmented by security boundaries, networks, and firewalls
● Storage
  ○ ICS-CI2 Cloud Storage offers a choice of provisioning, backup, and retention models
  ○ ICS-CI2 Storage Automation, Metering, and Metrics allow for methodical expansion based on usage and trends
  ○ ICS-CI2 Storage scales to multiple petabytes
● Network
  ○ Direct integration into the PSU Research Network for fast access and data transfers
  ○ Limited single points of failure to minimize downtime/maintenance windows
Direct integration into research network core
ICS-CI2 Proposed Network / Infrastructure Plan
What we have already started to implement
The ITS interactive cluster Hammer was at a breaking point!
Hardware Issue
● 24 compute nodes
  ○ Slow network
  ○ Slow I/O
  ○ Inadequate memory
  ○ Old, outdated operating system
Operational Issue
● Software stack not unified
● Memory cannot support the number of users requesting resources
  ○ Processes denied running
● Hardware near end-of-life
What we have already started to implement
ICS is installing a new interactive cluster with the following enhancements!
Hardware Specifications
● 24 compute nodes
  ○ Dual 10-core processors
  ○ 256 GB of RAM
  ○ NVIDIA K4000 graphics card
  ○ 10G Ethernet
● Public 10G Ethernet access (10X increase)
● Research Network 10G Ethernet access
  ○ Available late fall
● Unified software stack with batch clusters
● Re-usable hardware platform
● Interactive processing
● 5X improvement in processing power
● 5X improvement in memory
* Hardware will be available for 2014 fall semester
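As a rough illustration of what the hardware specifications above add up to, the sketch below computes aggregate core and memory totals for the new interactive cluster. It assumes all 24 nodes share the listed configuration, which the slide does not state explicitly, and the function and constant names are hypothetical.

```python
# Back-of-the-envelope totals for the new interactive cluster.
# Assumption (not stated on the slide): all 24 nodes are identical.
NODES = 24
SOCKETS_PER_NODE = 2     # "Dual" processors
CORES_PER_SOCKET = 10    # "10-core"
RAM_GB_PER_NODE = 256


def cluster_totals(nodes, sockets_per_node, cores_per_socket, ram_gb_per_node):
    """Return (total CPU cores, total RAM in GB) for a homogeneous cluster."""
    total_cores = nodes * sockets_per_node * cores_per_socket
    total_ram_gb = nodes * ram_gb_per_node
    return total_cores, total_ram_gb


total_cores, total_ram_gb = cluster_totals(
    NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET, RAM_GB_PER_NODE
)
print(f"Total cores: {total_cores}")        # 24 * 2 * 10 = 480
print(f"Total RAM:   {total_ram_gb} GB")    # 24 * 256 = 6144
```

The old Hammer node specifications are not given in the slides, so the 5X processing and memory figures cannot be reproduced from this calculation.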
Questions?