Copyright 2017 FUJITSU LIMITED
Post-K Development and Introducing DLU
Fujitsu’s HPC Development Timeline
© RIKEN
The K computer is still competitive in various fields, from advanced research to manufacturing.
K computer
The post-K is under development to achieve superior application performance.
Post-K Computer
HPCG No. 1 (2017)
Graph500 No. 1 (2016)
Gordon Bell Prize Finalist (2016)
DLU is a processor designed for deep learning that has the ability to handle large-scale neural networks.
Deep Learning Unit (DLU™)
Post-K Development
Japan’s Post-K Computer Development Project
Project Overview
• RIKEN and Fujitsu are currently developing the post-K computer, which aims to be the most advanced general-purpose supercomputer in the world
Goals of Japan’s Post-K Development Project
• Application performance
• Low power consumption
• User convenience
• Ability to produce ground-breaking results
Features of Post-K CPU and Interconnect
Fujitsu CPU, adopting ARM ISA and enhanced Tofu interconnect
Inheriting and enhancing the K computer’s innovative features
Functions & Architecture                    Post-K          K computer
Processor
  Base ISA + SIMD Extensions                ARMv8-A+SVE     SPARCv9+HPC-ACE
  SIMD width [bit]                          512             128
  FP16 (half precision) support             ✔               -
  FMA: Floating-point multiply and add      ✔               ✔
  Math. acceleration primitives*            ✔ Enhanced      ✔
  Inter-core barrier                        ✔               ✔
  Sector cache                              ✔ Enhanced      ✔
  Hardware “prefetch” assist                ✔ Enhanced      ✔
Interconnect
  Tofu                                      ✔ Enhanced      ✔
*Mathematical acceleration primitives include trigonometric functions, exponential functions, etc.
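As a rough illustration of what the SIMD width row means, the sketch below counts how many floating-point elements fit in one SIMD register. The register widths (512 and 128 bits) come from the comparison above and the element sizes are the standard IEEE 754 widths; the helper name lanes is hypothetical. Only double precision is computed for the K computer, because the comparison does not say how narrower types are packed there.

```python
def lanes(simd_bits, element_bits):
    """Number of elements of the given size that fit in one SIMD register."""
    if simd_bits % element_bits:
        raise ValueError("element size must divide the SIMD width")
    return simd_bits // element_bits


POST_K_SIMD_BITS = 512  # from the comparison: ARMv8-A+SVE
K_SIMD_BITS = 128       # from the comparison: SPARCv9+HPC-ACE

# IEEE 754 element sizes in bits.
ELEMENT_BITS = {"FP64": 64, "FP32": 32, "FP16": 16}

# Elements per register on the post-K CPU, for each precision.
post_k_lanes = {name: lanes(POST_K_SIMD_BITS, bits)
                for name, bits in ELEMENT_BITS.items()}

# Double-precision elements per register on the K computer.
k_fp64_lanes = lanes(K_SIMD_BITS, ELEMENT_BITS["FP64"])

print("Post-K elements per register:", post_k_lanes)
print("K computer FP64 elements per register:", k_fp64_lanes)
```

Under these assumptions, one register holds four times as many double-precision elements on the post-K CPU as on the K computer, and narrower types widen the gap further.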
Post-K CPU Supports FP16
Provides optimized precision for a wide range of applications
• Superior performance
• Reduces required bandwidth and power consumption
Target applications
• Existing numerical applications
• Brand-new applications, including deep learning
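The trade-off behind the bandwidth claim can be sketched with Python's standard struct module, which can encode a value as IEEE 754 double, single, or half precision. This is only an illustration of storage size versus rounding error on a general-purpose machine, not a model of the post-K CPU; the helper names and the sample value 0.1 are arbitrary choices made for this example.

```python
import struct

# struct format codes for IEEE 754 double, single, and half precision.
FORMATS = {"double": "<d", "single": "<f", "half": "<e"}


def bytes_per_value(fmt):
    """Storage needed for one value in the given format."""
    return struct.calcsize(fmt)


def roundtrip(value, fmt):
    """Encode a value in the given format and decode it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


value = 0.1  # arbitrary sample value, not exactly representable in binary
for name, fmt in FORMATS.items():
    stored = roundtrip(value, fmt)
    error = abs(stored - value)
    print(f"{name:6s} {bytes_per_value(fmt)} bytes  stored={stored!r}  error={error:.3e}")
```

Half precision needs a quarter of the bytes of double precision for the same number of values, at the cost of a larger rounding error, which is why it suits workloads such as deep learning that tolerate reduced precision.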
High Performance for More Applications
Double Precision
Single Precision
Half Precision
Features of Post-K System Software
System software being developed in cooperation with RIKEN
Management software designed to optimize the balance of performance and power efficiency
Improved programming environment to maximize application performance through a co-design scheme with application developers
Post-K System Hardware
Fujitsu Technical Computing Suite
Linux OS / McKernel (McKernel: lightweight kernel developed by RIKEN)
Management Software, Hierarchical File System, Programming Environment
Post-K Applications
* Linaro is an organization that works on open-source software for the ARM
ecosystem. If you want to learn more, visit their website: <https://www.linaro.org/>
Through contributing to the ARM community, Fujitsu is combining its strengths with ARM’s to advance HPC technology.
Complies with ARM standard platform specifications (e.g., SBSA, SBBR) so that software resources of the ARM ecosystem can be used easily.
Cooperates with the ARM/Linux community and Linaro* to optimize the ARM ecosystem for HPC (e.g., OpenHPC for ARM).
Co-creation with ARM Community
Introducing DLU
Features
• Architecture designed for deep learning
• Low-power consumption design
Goal: 10x Performance / Watt compared to competitors
• Scalable design with Tofu interconnect technology
Ability to handle large-scale neural networks
Utilizing technologies derived from the K computer
DLU: Processor Designed for Deep Learning
DLU (Deep Learning Unit), FY2018 -
DLU Architecture
ISA: Newly developed for deep learning
Micro-Architecture
• Simple pipeline to remove HW complexity
• On-chip network to share data between DPUs
Utilizes Fujitsu’s HPC experience, such as high density FMAs and high speed interconnect
Maximizes performance / watt
Fujitsu’s interconnect technology
Large scale DLU interconnect through off-chip network
[Figure: DLU (Deep Learning Unit) block diagram. The chip has a Host I/F, an Inter-chip I/F, HBM2, and DPU-0 through DPU-n, with each DPU containing multiple DPEs. Callouts: On-chip network; New ISA for deep learning; High density FMA.]
DPU: Deep learning Processing Unit, DPE: Deep learning Processing Element
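To make the idea of many multiply-add units working in parallel more concrete, the following is a purely conceptual sketch of splitting one dot product across several processing elements and combining their partial sums. It is not Fujitsu's design: the slides do not describe how work is distributed among DPEs, and the function names, the contiguous chunking, and the number of elements are all assumptions made for illustration.

```python
def partial_dot(xs, ws):
    """Work of one processing element: a chain of multiply-add steps."""
    acc = 0.0
    for x, w in zip(xs, ws):
        # Plain Python multiplies, then adds (two rounding steps);
        # a hardware FMA unit fuses these into one operation.
        acc = acc + x * w
    return acc


def split(seq, parts):
    """Split seq into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(seq), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks


def dot_on_elements(xs, ws, num_elements):
    """Dot product computed as per-element partial sums, then combined."""
    partials = [partial_dot(cx, cw)
                for cx, cw in zip(split(xs, num_elements),
                                  split(ws, num_elements))]
    # The final sum stands in for exchanging partial results
    # between processing elements.
    return sum(partials)


inputs = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]
print(dot_on_elements(inputs, weights, num_elements=3))
```

The point of the sketch is only that the per-element work is independent until the final combination step, which is the part that depends on a fast network between elements.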
DLU Roadmap
Multiple generations of DLUs over time, as we currently do for HPC/UNIX/Mainframe processors
[Figure: DLU roadmap*. Axis: Performance / Watt. Stages: The 1st Generation, The 2nd Generation, Future. Labels, in the order they appear: Needs Host CPU; Inter-DLU Direct Connection; Embedded Host CPU; Neuro Computing; Combinational Optimization Architecture.]
* Subject to change without notice
Copyright 2016 FUJITSU LIMITED