National University of Defense Technology
Tianhe
天河
Update on HPC Efforts in NUDT
Yutong Lu
School of Computer Science
National University of Defense Technology
Outline
• Roadmap
• Hardware
• Software
• Application
Overview of TH Roadmap
[Figure: performance (1P, 10P, 100P, 1000P) plotted against year (2009, 2010, 2014, 2015, 2020); sites labeled NUDT Changsha, NSCC-CS, and NSCC-TJ Tianjin]
Roadmap
• 12th Five-Year project (-2015)
  – System
    • 100P
    • Funding
      – MOST (863)
      – Local government
  – Software (year 2011)
    • NSFC: 37 million for basic algorithms and computable models
    • MOST (863): 80 million for domain applications
• 13th Five-Year project (-2020)
  – ~1E
  – Funding?
Highlights of future TH systems
• Heterogeneous parallel architecture
• Multi-dimensional interconnection network
• Hierarchical I/O storage system
• Autonomic fault-tolerant management
• Domain-specific programming framework
• Adaptive power-aware computing
Hardware
• Multi-dimensional interconnection network
  – Support for more than 100,000 nodes
  – Significantly higher bandwidth
  – Enhanced collective communication
[Figure: interconnection hierarchy at system level, rack level, and blade level]
• Inter-chip optical connection
  – Optical interface between processors
  – Optical switch between CPUs is under research
[Figure: eight CPUs (cpu1 to cpu8) with memory and power, attached to high-speed interconnection chips; labels: parallel optical coupled transfer, high-speed electric interface, optical switch array, detector, modulator, laser]
Software
Large-scale Hybrid Tiered File System Architecture
• Scalability: achieve >1 TB/s I/O bandwidth by leveraging spatial locality
• Usability: federate multi-level storage into a unified name space
• Flexibility: reconfigure key components for application optimization
• Applicability: supercomputers and clusters with hybrid infrastructure
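
The "unified name space" bullet can be illustrated with a toy model. This is a minimal sketch and not the Tianhe file system: the class `TieredNamespace`, its methods, and the tier names are all hypothetical, and in-memory dictionaries stand in for real storage tiers. It only shows the idea that a path resolves the same way no matter which tier currently holds the data.

```python
class TieredNamespace:
    """Toy model of several storage tiers presented under one namespace.

    Hypothetical illustration only; dictionaries stand in for real tiers.
    """

    def __init__(self, tier_names):
        # Tiers are ordered fastest first, e.g. ["ssd", "disk"].
        self.tier_names = list(tier_names)
        self.tiers = {name: {} for name in self.tier_names}

    def write(self, path, data, tier=None):
        # New data lands in the fastest tier unless the caller picks one.
        target = tier if tier is not None else self.tier_names[0]
        # Drop stale copies so each path lives in exactly one tier.
        for store in self.tiers.values():
            store.pop(path, None)
        self.tiers[target][path] = data

    def locate(self, path):
        # Search fastest to slowest and return the tier holding the path.
        for name in self.tier_names:
            if path in self.tiers[name]:
                return name
        raise FileNotFoundError(path)

    def read(self, path):
        # Callers read by path only; the tier lookup is internal.
        return self.tiers[self.locate(path)][path]

    def demote(self, path):
        # Move a path one tier down, toward slower and larger storage.
        src = self.locate(path)
        idx = self.tier_names.index(src)
        if idx + 1 < len(self.tier_names):
            dst = self.tier_names[idx + 1]
            self.tiers[dst][path] = self.tiers[src].pop(path)

    def listdir(self, prefix):
        # One listing across all tiers; tier boundaries stay hidden.
        found = set()
        for store in self.tiers.values():
            found.update(p for p in store if p.startswith(prefix))
        return sorted(found)
```

After a file is demoted, `read` and `listdir` return the same results as before; only `locate` reveals that the data moved.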
Resilience computing Framework
• Capability to support post-petaflops and exascale computing
• Collaboration with the whole system software stack
• Coherent fault detection
• Coordinated fault-tolerance decisions
• Cooperation of multiple fault recovery mechanisms
• Combination of proactive and reactive strategies
• Customizable fault detection, prediction, and recovery approaches
• Support for various parallel models
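
The "combination of proactive and reactive strategies" can be sketched in a few lines. This is an illustrative toy, not the framework described on the slide: the function `run_with_resilience` and its parameters are hypothetical, faults are injected by step index, and the fault predictor is modeled as a set of step indices at which a warning fires. The proactive part takes an extra checkpoint when a warning arrives; the reactive part rolls back to the last checkpoint after a fault.

```python
def run_with_resilience(step, state, n_steps, fails_at=(), warned_at=(), interval=4):
    """Run n_steps of `step` and return (final_state, log).

    fails_at:  step indices where a fault is injected (once each).
    warned_at: step indices where a hypothetical predictor raises a warning.
    interval:  periodic checkpoint spacing, in steps.
    `state` is assumed immutable; a real system would copy it on checkpoint.
    """
    checkpoint = (0, state)          # (step index to resume at, saved state)
    pending_faults = set(fails_at)
    log = []
    i = 0
    while i < n_steps:
        # Proactive: checkpoint early when a fault is predicted.
        # Periodic: checkpoint every `interval` steps regardless.
        if i in warned_at or i % interval == 0:
            checkpoint = (i, state)
            log.append(("checkpoint", i))
        if i in pending_faults:
            # Reactive: roll back to the last checkpoint and redo the work.
            pending_faults.discard(i)
            i, state = checkpoint
            log.append(("rollback", i))
            continue
        state = step(state)
        i += 1
    return state, log
```

With a fault at step 6 and checkpoints every 4 steps, the run rolls back to step 4 and repeats two steps. If the predictor also warns at step 6, the rollback target is step 6 and no completed work is lost, which is the benefit of pairing prediction with recovery.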
Parallel programming Framework
[Figure: the parallel programming framework shown between application code and computers; labels: models, stencils, algorithms, special library, common, data dependency, data structures, communications, load balancing, fault tolerance, parallel computing; arrows labeled extract, promote, support, form, separate]
Parallel programming Framework: users
• Hide the complexity of programming (millions of cores, hybrid)
• Integrate efficient parallel fast numerical algorithms
• Provide efficient data structures and solver libraries
• Support software engineering for code reuse
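
To make "hide the complexity of programming" concrete, here is a minimal sketch of a stencil driver in which the user writes only the per-point update. It is not the NUDT framework: `run_stencil` and `average` are hypothetical names, the workers are simulated sequentially in one process, and the grid is one-dimensional with fixed boundary values. Partitioning and halo handling stay inside the driver.

```python
def run_stencil(kernel, data, n_workers, n_iters):
    """Apply a 3-point `kernel(left, center, right)` to interior points.

    The user supplies only `kernel`. Partitioning and halo handling live
    here. Workers are simulated sequentially; boundary values stay fixed.
    Assumes len(data) >= 3.
    """
    n = len(data)
    # Split the interior indices 1..n-2 into contiguous chunks.
    interior = n - 2
    bounds = [1 + (interior * w) // n_workers for w in range(n_workers + 1)]
    current = list(data)
    for _ in range(n_iters):
        new = list(current)
        for w in range(n_workers):
            lo, hi = bounds[w], bounds[w + 1]
            # Local block plus one halo cell on each side, as a worker
            # would receive from its neighbours.
            local = current[lo - 1:hi + 1]
            for j in range(1, len(local) - 1):
                new[lo + j - 1] = kernel(local[j - 1], local[j], local[j + 1])
        current = new
    return current


def average(left, center, right):
    # User code: a Jacobi-style smoothing update for one grid point.
    return (left + center + right) / 3.0
```

Because every point is computed from the previous iteration's values, the result does not depend on `n_workers`. That is the property a framework of this kind has to preserve when it distributes the work across real nodes.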
Parallel programming Framework: system software developers
• Optimization interface
• Heterogeneous issues
• Communication organization
• Load balance
• Fault tolerance
Applications
• Five priority areas
  – Climate & Environment
  – Bio-medicine
  – New energy
  – Equipment engineering
  – Animation
Summary
• Towards the next generation of TH systems
  – Heterogeneous architecture
  – New enabling technology
  – High-performance scalable interconnection
  – Balance between computing and data access
  – Feasible fault-tolerance mechanisms
  – Usable domain-specific programming framework
  – Selected priority application areas
Thank you