seismic shift: what's next in the post-moore, post-isa era?€¦ · e., tht conceptual...
TRANSCRIPT
Seismic Shift: What’s next in the Post-Moore, Post-ISA era?
Prof. Margaret MartonosiH. T. Adams ‘35 Prof. of Computer SciencePrinceton University
Not News: End of Decades of Moore’s Law scalingLess talked about: Shift to heterogeneity & parallelism has broken how software scales in performance & cost.
f Microprocessor Trend Data
. W ~ ........
' ~~ ... -- -:------ --- ---- ----- ----. -- ·-·:- . ---- · -· ---- · -·. -·:..-~. -..,·: ·. - · - · - ---·- · - ---· - . ----·
: : r"'IIL•. : : ... •~i... : --_ j_ _ --------------. -----. ----_!_ - - - - - . - _-1_ • .... ~.----_!_ - . - - - - - . - - - - - - - - - - - . - - - - -
: : ~· ~ ·-: ... ~ 11l)•1•'----~- --- ---- ------• --t1-·· ~ :- ·· --J ------ -----------~-- -------- ------ ------- --
. ~tJ ... ~ .......... 4
~----~-~--~:~; ~ ~- + • 102 ~ ~ L~i:, :::;-, ~ T ·:T ~•\ ~f
: - : ~ ... :'Y ... • ...... 1 0 1 ·- --- ----- -. ----- -·- -- -. -:- -- -·- ---. --- ---·- --- ---- --:- ---- ----- -·- --- • - -.. -. --:- .. -. -. -.. ----- -♦ . --♦ - .• -. •♦ -. ----. ------. ----. ... ■ ■ : ... : ..
... : ... ... ... :'Y ... ... *& .., ■ T: ... ,..! •••" I
- ---♦ ·--- - ---- - -- +; -----♦-------- -- ----•+:• --... 11111• -4'H ....... ----- ---- ~-- ------- --- ----- ------- · ' ' ' ' ' '
Transistors (thousands)
Single-Thread Performance (SpeclNT x 103)
Frequency (MHz)
ical Power (Watts)
Number of Logical Cores
1970 1980 1990 2000 2010 2020 Year
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, 0 . Shacham, K. Olukotun, L. Hammond, and C. Batten New plot and data collected for 2010-2015 by K. Rupp
1964: While Moore’s Law was being sketched, “Computer Architecture” was coined…
= The value of HW/SW abstraction…
G. M. Amdahl
G. A. Blaauw
F. P. Brooks, Jr.,
Architecture of the IBM System/360
Abstrad: The archiledure • of the newly announced IBM System/360 features four Innovations:
1. An approach to storage which permits and exploits very large capacities, hierarchies of speeds, read
only storage for microprogram control, flexible storage protection, and simple program relocation.
2. An input/output system offering new degrees of concurrent operation, compatible channel operation,
data rates approaching 5,000,000 charaders/second, integrated design of hardware and software, a new
low-cost, multiple-channel package sharing main-frame hardware, new provisions for device status Infor
mation, and a standard channel interface between central processing unit and input/ output devices.
3. A truly general-purpose machine organization offering new supervisory facilities, powerful logical pro-
is used here to describe the attributes of a programmer. i .. e., tht conceptual structure and
~Tbe term a-Ychitectu,-e system as setn by the functiooal behavior, as distinct
line of six models having a per-
ionale for the main features of the
mpatibility among central process
scientific, real-time, and logical in
ha Appendices.
and controls, the logica1 design, from the organization of the data flow and the physical implementation ..
Introduction
The design philosophies of the new geoeral-purpase machine organization for the IBM System/360 are discussed in this paper.t In addition to showing the architecture• of the new family of data processing systems, we point out the various engineering problems encountered in attempts to make the system design compatible, at the program bit
,r large and small models. The compatibility was 1d not only to models of any size but also to their applications-scientific, commercial, real-time, and
term art:hittct11rt ~ used here to describe tbe attributes of a & 1een by tbe prorrammer, i.e. , tbe concf'ptual structurf' and I behavior, ;u distinct from the oraanization of the data flow rols, tbe logical design, and the physical implementation. tional details concerning the architecture, enginttring design, ~ill&', and application of the I BM System/ 360 will appear in a articles in the IBM Systems Jounal.
The section that follows describes the objectives of the new system design, i.e., that it serve as a base for new technologies and applications, that it be general-purpose, efficient, and strictly program compatible in all models. The remainder of the paper is devoted to the design problems faced, the alternatives considered, and the decisions made for data format, data and instruction codes, storage assignments, and input/ output controls.
Design obiectives
The new architecture builds upon but differs from the designs that have gradually evolved since 1950. The evolution of the computer had included, besides major technological improvements, several important systems concepts and developments: 87
JBM JOURNAL "- APRIL 1964
2019:We are hereHeterogeneity “In the Large”:• 1000’s of distinct Android
devices• IoTs even more diverse
Heterogeneity “In the Small”• Dozens of ISAs on chip +
Accelerators• Memory, Data Heterogeneity• Memory Consistency Models
https://www.anandtech.com/show/9330/exynos-7420-deep-dive/2
M-Comp Slim~~S sss SMDMA 0 0 a 6• SPI TMl.l ,.; .... ;! e:l <ti
,. E ~ ~ 9:a H~12C MCT A57 A57 I I .,, s..:: ~ :::l :::l
I •• UART WTG
A53 A53 ,2.
Unknown PCM
0 A57 A57 0 A53 A53 SPOlf
~ u PWM u
~ .... 256KB L7
,ANANDlieCH ,.,
A5 ~ w IJJ ~
2MB L2 IExynos '7420 Au<lo :-- --II
Abstracted IF,loor Pl■n
BUSO i0o&72015Aoikd1ftil!l!jjsal!\f SRN,1
BUSl II
...L. t •• -r JPEG G2D - MFC
' Shader -Seal~, Scalf'r
Core N ~ .... a:) a:)
AS Scaler l,,: :.;:: r-- ,-. N N
l~P .... ... 0 BNS 3AA U"I "I .... ~
PMU Shader Shader ~ IJJ IJJ ~ Core Core ~ --- IISP ,,t- 1Cameras ... {taratily u1tla!ownJ
3• t..ter"• StrtJDf Shader Shader Shader
"'"' J• •ZC WlG Core Core Core
I :. 5#1 11 LJART •• cs, CM3
I I • • • Shader Shader 0 0 J.Cl'i'• MONie u:~ u u Uni Pro
D!SPO o_
IOlSPl o_ Core Core 3~ ~ ~ Uf'.) 2.0 - ~ - "- ~ ~ >
Ml'1 OSl,iOfl M1,1os.1,'[)P
Seismic Shift: Entering a
Post-ISA World
ISAs still useful, but little/no relevance as abstraction layer
• Apple A8 and beyond: >50% of chip area is accelerators that have no ISA.
• NVIDIA PTX vs. SASS: ISA hidden under other layers.
Questions for the Post-Moore/Post-ISA Future:• How to program these highly heterogeneous
systems? How to manage the complexity of fast-changing hardware and software?
• How to verify them?• And what technologies come next?
This Talk: A whirlwind
tour of…
Questions for the Post-Moore/Post-ISA Future:• How to program these highly heterogeneous
systems? How to manage the complexity of fast-changing hardware and software?• How to verify them?• And what technologies come next?
This Talk: A whirlwind
tour of…
Questions for the Post-Moore/Post-ISA Future:• How to program these highly heterogeneous
systems? How to manage the complexity of fast-changing hardware and software?• How to verify them?• And what technologies come next?
DECADES: A VERTICALLY-INTEGRATED APPROACH
• Enhance data locality• Optimize spatial mapping of threads • Enable in-memory computing
Language and Compiler Support
• Coarser than CGRA → VCGRTA• 3 classes of reconfigurable tiles• Reconfigurable interconnection network• Reconfigurable in-memory computing
Very Coarse-Grained Reconfigurable
Tile-Based Architecture
• Scalable full-system simulation• Multi-FPGA emulation infrastructure• 225-tile DECADES chip prototype
Multi-Tiered Demonstration Strategy
DECADES PLATFORM ARCHITECTURE
Program
Executable
Address-space
mapping
Initial Set of
In-Memory Operations
Initial Task
Mapping
DECADES Tile
Configuration
Interconnect Bridges
Configuration
Data Migration
(Across DDR nodes)
Update In-Memory
Operations
Task Migration
(across Tiles)
DECADE Tile
Reconfiguration
Bridge Ports
Re-Routing
Source Code
Compile-Time Optimizations
(Graphicionado, DeSC)
Run-Time
Self-Tuning
Data Segments
Definition
DECADES
Accelerator
Tile
DECADES
Core
Tile
DECADES
Intelligent
Storage
DECADES
Intelligent
Storage
DECADES
Intelligent
Storage
DECADES
Intelligent
Storage
DECADES
chip
DECADES
Accelerator
Tile
DECADES
Accelerator
Tile
DECADES
Accelerator
Tile
DECADES
Accelerator
Tile
DECADES
Accelerator
Tile
DECADES
Core
Tile
DECADES
Core
Tile
DECADES
Core
Tile
DECADES
Core
Tile
DECADES
Core
Tile
L1 Cache
Configurable
Interconnect
Shims
Configurable Core Pipeline
Data Supply / Compute Threads L1 Cache
Per-Tile Configurable
On-chip Memory
DECADES Monitor and Run-Time Reconfiguration Shim
DECADES Core Tile
Configurable
Interconnect
Shims
Specialized, Configurable
Data Supply / Compute
Accelerator
Per-Tile Configurable
On-chip Memory
DECADES Monitor and Run-Time Reconfiguration Shim
DECADES Accelerator Tile
Configurable
Interconnect
Shims
Near-Memory
Computation
Across-Chip Configurable
On-Chip Memory
DECADES Monitor and Run-Time Reconfiguration Shim
DECADES Intelligent Storage
Configurable Pattern-Based
Prefetcher
Run-Time
Self-Tuning
Run-Time
Self-Tuning
Run-Time
Self-Tuning
DECADES TA2 DECADES TA1
FPGAs: O↵-Chip Memory System
with In-Memory Computation
FPGAs: O↵-Chip Memory System
with In-Memory Computation
Heterogeneity meets coarse reconfigurability
DECADES Core and Accelerator tiles• Computations mapped onto
core tiles or available accelerator tiles• Each tile is wrapped in
monitor/reconfiguration shim
• Dynamic reconfiguration of Supply-Compute decoupling, power-performance tradeoffs, and interconnect
Program
Executable
Address-space
mapping
Initial Set of
In-Memory Operations
Initial Task
Mapping
DECADES Tile
Configuration
Interconnect Bridges
Configuration
Data Migration
(Across DDR nodes)
Update In-Memory
Operations
Task Migration
(across Tiles)
DECADE Tile
Reconfiguration
Bridge Ports
Re-Routing
Source Code
Compile-Time Optimizations
(Graphicionado, DeSC)
Run-Time
Self-Tuning
Data Segments
Definition
DECADES
Accelerator
Tile
DECADES
Core
Tile
DECADES
Intelligent
Storage
DECADES
Intelligent
Storage
DECADES
Intelligent
Storage
DECADES
Intelligent
Storage
DECADES
chip
DECADES
Accelerator
Tile
DECADES
Accelerator
Tile
DECADES
Accelerator
Tile
DECADES
Accelerator
Tile
DECADES
Accelerator
Tile
DECADES
Core
Tile
DECADES
Core
Tile
DECADES
Core
Tile
DECADES
Core
Tile
DECADES
Core
Tile
L1 Cache
Configurable
Interconnect
Shims
Configurable Core Pipeline
Data Supply / Compute Threads L1 Cache
Per-Tile Configurable
On-chip Memory
DECADES Monitor and Run-Time Reconfiguration Shim
DECADES Core Tile
Configurable
Interconnect
Shims
Specialized, Configurable
Data Supply / Compute
Accelerator
Per-Tile Configurable
On-chip Memory
DECADES Monitor and Run-Time Reconfiguration Shim
DECADES Accelerator Tile
Configurable
Interconnect
Shims
Near-Memory
Computation
Across-Chip Configurable
On-Chip Memory
DECADES Monitor and Run-Time Reconfiguration Shim
DECADES Intelligent Storage
Configurable Pattern-Based
Prefetcher
Run-Time
Self-Tuning
Run-Time
Self-Tuning
Run-Time
Self-Tuning
DECADES TA2 DECADES TA1
FPGAs: O↵-Chip Memory System
with In-Memory Computation
FPGAs: O↵-Chip Memory System
with In-Memory Computation
.J '--
' ,
.J '--> '
,
I I ) " " ( I I .J '--
' ,
.J "'-.
.J '--'
, '
, ) " ) '-
" ( " (
.J """ '
, ,.J
' :◄ I'
,
I 11 I ) " I I
I I
" ( I 11 I I I ,.J "'-.
' ,
.J '--,.J "'-.
' ,
' ,
+ +
DECADES Storage Specialization• Specialization #1:
Map apps onto mix of compute tiles and intelligent storage (IS) tiles• Specialization #2:
Select and configure appropriate storage features within IS• Configurable memory
banks + address and prefetching features
• Simple near-SRAM ALU
DECADES Intelligent Storage Tile
ConfigurableInterconnect
Shim
FIFO Head/Tail Pointer Array
Intelligent DMA Engine
Cache Pipeline
Pattern-Based Prefetcher
Configurable Bank Interconnect
Performance Monitoring Registers
Multi-grainAssociativeTag Array
ALU for near SRAM
processing
Configurable Memory Banks
Improving Latency-BoundApplications with Decoupled Execution
• Roots in seminal 1982 Smith DAE paper: Separately execute memory accesses (supply) from instructions that compute with them (compute)• Then: Latency tolerance “simpler” than out-
of-order execution• Now: Fits well with accelerator-oriented
design• Separate memory supply from accelerator • Orthogonal to DOALL parallelism and
bandwidth optimizations• Automatically identify and slice at compile
time
mo Y
w r e i
AQ
• • • ' da
Access Processor
gist r file
F g. 1.
ions
-·ns r c ions
AQ
AEBQ
E ecu e P oc s or
register fil
1 0 E Archi
DECADES Decoupled Supply-Compute Parallelism:Automatic Compilation + HW support
2 in-order CPUs Outperform 1 large out-of-order CPUs at 3X less power, area
Metric Improvement
Performance 10%
Area 33%
VI -c "' 0 ...J
<(
0 QI Cl
2 C QI
l:: QI c..
100
90
80
70
60
50
40
30
20
10
0 Kron Uni SW
BFS
Load Characterization
Kron Uni SW
SSSP
Kron Uni SW
PR Kron Uni SW
EWSD
Pow Exp Uni
GP
- Normal Loads = LLAMA (Terminal Load) [=:J Remaining Terminal Loads
c:::J Supply Exclusive Loads c:::::::::J Compute Exclusive Loads
DECADES Reconfigurable Parallelism• Coarse-grained reconfigurability• Workflow Sinkhorn contains a sequence of two
kernel calls:• Sparse Elementwise• Dense Matrix Multiplication
• Each benefit from different system configurations for a finite number of resources• Sparse benefits from decoupled supply/compute• Dense benefits from traditional parallelism
• Decades architecture can be reconfigured to either use tiles as suppliers, or all tiles as compute
Example: 6-tiled system configs
C C
C C C
C
All Compute
C C C
S S S
Half Suppliers
Application All Compute Half Suppliers
Sparse 5x 7x
Dense 6x 5x
Speedups
□□□ □□□
□□□ □□□
This Talk: A whirlwind
tour of…
Questions for the Post-Moore/Post-ISA Future:• How to program these highly heterogeneous
systems? How to manage the complexity of fast-changing hardware and software?• How to verify them?• And what technologies come next?
Post-ISA Hardware DesignSOCs comprised of many CPUs, GPUs, and accelerators from many vendors-> How to verify that a given block does what it is specified to do?-> How to formally specify what blocks should do?
-> Formal Interface Specifications are the new ISA!
https://www.anandtech.com/show/9330/exynos-7420-deep-dive/2
M-Comp <,lim<,<,<, ..,..,.., ',MOMA 0 ~ "' 6• SPI TMJ .., "'
:: ,, u . ~ in al ~ = -:: ~ 9• ►tS1.Zl M(I
A57 A57 I I "' "' s .~ ~ ) :::)
I •• UART W'G
A53 A53 ,i, PlM
0 Unknown A57 A57 0 A53 A53 SPDIF
~ u PWM
u ~ - 256KBL2
ANANDTECH ..,
A5 ~ UJ w ~
2MB L2 Exynos 7420 "~"'° > - -II
Abstracted F,loor Pl■n
BUSO Go6(2015 Andrtl ftulllUSaftU SR,1/,I
BUSl II
....L. I II T .... JP[G GW - MR ...... ShadPr -xaler xaler
Core N N ~
a:) a:)
',cal er :,,,:: :.;:: A5 ,... .....
N N ISP .... ...
0 BNS 3AA U"I .ti "' ~ PMS Shader Shader ~ w w > Core Core > - IISP + Cameras - (largely unknown}
3• l•ter"• 5,eri,or Shader Shader Shader rwM 3, ;:c WT::; Core Core Core .~ ;:,_ ~Pl 11 .;ART 41 (SI
lMJ
:x:
I I I
I I Shader Shader 0 0 ,.-:-\. MCNl.ti u.. ~ u u Uni Pro
OISPO Q_
DtSPl Q_ Core Core ~:;; ~ ~ UfS 2.0 "'- Q. ~-c Q.
~ ~ > > \wlllPI ;)-51,,DP , .. P1 :,-.S.1;:)P
The Check Suite: An Ecosystem of Tools For Verifying Memory Orderings and their Security Implications
High-Level Languages (HLL)
Compiler
Architecture (ISA)
Microarchitecture
OS
RTL (e.g. Verilog)
PipeCheck [Micro ‘14] [IEEE MICRO Top Picks]
TriCheck [ASPLOS ‘17] [IEEE MICRO Top Picks]
CCICheck [Micro ‘15] [Nominated for Best Paper Award]
COATCheck [ASPLOS ‘16] [IEEE MICRO Top Picks]
RTLCheck [Micro ‘17] [IEEE MICRO Top Picks Honorable Mention]
Our Approach• Axiomatic specifications -> Happens-before graphs• Check Happens-Before Graphs via Efficient SMT solvers • Cyclic => A->B->C->A… Can’t happen• Acyclic => Scenario is observable
A
CB
CheckMate[Micro ‘18][IEEE Micro Top Picks]
PipeProof[Micro ‘18][Best Paper Nominee.IEEE Micro Top PicksHonorable Mention]
For more info: check.cs.Princeton.edu
c;
Check: Formal, Axiomatic Models and Interfaces
Coherence Protocol (SWMR, DVI, etc.)
Lds.
L2WB
Mem.
SB
L1Exec.
Dec.
Fetch
WB
Mem.
SB
L1Exec.
Dec.
FetchAxiom "PO_Fetch":forall microops "i1",forall microops "i2",SameCore i1 i2 /\ ProgramOrder i1 i2 =>
AddEdge ((i1, Fetch), (i2, Fetch), "PO").
Axiom "Execute_stage_is_in_order":forall microops "i1",forall microops "i2",SameCore i1 i2 /\EdgeExists ((i1, Fetch), (i2, Fetch)) =>AddEdge ((i1, Execute), (i2, Execute), "PPO").
Microarchitecture Specification in μSpec DSL
Microarchitectural happens-before (µhb) graphs
Exhausti
ve co
nsideratio
n of
all possi
ble execu
tions
TriCheck Framework: Verifying Memory Event Ordering from Languages to Hardware
HLL Mem Model Sim
ISAMem Model
uArchMem Model
Obs. Not obs
Permit ok Overstrict
Forbid Bug ok
High-level LangLitmus tests
HLL->ISA Compiler Mappings
ISA-levelLitmus tests Observable/
Unobservable
Permitted/Forbidden
Compare Outcomes
.... ~
1 r
'
,. 1 r
'
,Ir 1 r
'
.... ~
RISC-V Case Study• Apply TriCheck to 7 legal RISC-V implementations: • All abide by current RISC-V spec• Vary in preserved program order and store atomicity
• Results:• Impossible to compile C11 for RISC-V as specified. • Insufficiently strong fence instructions, load-load
reordering, …• Out of 1,701 tested C11 programs:• RISC-V-Base-compliant design allows 144 buggy outcomes• RISC-V-Base+A-compliant design allows 221 buggy
outcomes
Takeaway: Draft RISC-V spec could not serve as a legal C11 compiler target
Next Steps: RISC-V Memory Model Working Group formed to address these issues. Ratified a new and formally specified RISC-V memory model that supports C11, Linux, etc.
From Memory Consistency Models to Security: What we would like…
Formal interface and specification of given system implementation
1. Specify system to study
E.g. Subtle event sequences during program’s execution
2. Specify attack pattern
Either output synthesized attacks. Or determine that none are possible3. Synthesis
The CheckMate Tool:Automated Attack Discovery & Synthesis
Axiomatic specifications similar to Check tools
1. Specify system to study
Event sequences as graph snippets2. Specify attack pattern
Relational Model Finding (RMF) approaches3. Synthesis
• What we did: Developed a tool to do this, based on the uHB graphs from previous sections.
• Results: Automatically synthesized Spectre and Meltdown, as well as two new distinct exploits and many variants.
[Trippel, Lustig, Martonosi. https://arxiv.org/abs/1802.03802][Trippel, Lustig, Martonosi. MICRO 2018. October, 2018] http://check.cs.princeton.edu/papers/ctrippel_MICRO51.pdf
Microarchitecture-Aware Security Verification
Execute
Store Buffer
L1 ViCL Create
μrf
μrf
Microarchitecture
μhb Pattern
Load being sourced from the
store buffer
Enum
erate
all
possi
ble ex
ecut
ion
grap
hs w
ith pa
ttern
#cores = 1#threads = 1#instr ≤ 2
ExecutionConstraints
Fetch
Execute
Commit
Store Buffer
L1 ViCL Create
L1 ViCL Expire
Main Memory
Complete
W [x]à1 R [x]àr0
μrf
Core 0
μrf
μhb Graph
CheckMate
Axiom "PO_Fetch":forall microops "i1",forall microops "i2",SameCore i1 i2 /\ ProgramOrder i1 i2 =>
AddEdge ((i1, Fetch), (i2, Fetch), "PO").
Axiom "Execute_stage_is_in_order":forall microops "i1",forall microops "i2",SameCore i1 i2 /\EdgeExists ((i1, Fetch), (i2, Fetch)) =>
AddEdge ((i1, Execute), (i2, Execute), "PPO").
+
+
Overall Results: What exploits get synthesized?And how long does it take?
24
The Check Suite: Bug Finds and Other Key Takeaways
So far, tools have found bugs in:• Widely-used Research simulator• Cache coherence paper• In-design commercial processors• RISC-V ISA specification• Compiler mapping proofs• IBM XL C++ compiler (fixed in v13.1.5)• C++ 11 mem model
+ Spectre Prime, MeltdownPrime, and other Vulnerabilities automatically synthesized
Key Takeaways• Need formal, well-
specified interfaces• From well-specified
interfaces-> Interaction models and analysis
• From well-specified interfaces -> Reliability and performance metrics
This Talk: A whirlwind
tour of…
Questions for the Post-Moore/Post-ISA Future:• How to program these highly heterogeneous
systems? How to manage the complexity of fast-changing hardware and software?• How to verify them?• And what technologies come next?
QC Algorithms to Machines Gap: Next Ten Years = NISQ Era
Grovers Algorithm (Database search)
Shor’s Factoring Alg. (Crypto)
Gap!
Quantum Sim, Q Chem, QAOA
#Qubits
1
10
100
10000
100000
1000000
1000
Year1995 2005 20252015
• Noisy Intermediate-Scale Quantum (NISQ)• Preskill, Jan 2018• 10-1000 qubits
• Too small for known algorithms with exponential speedup
• Too small for ECC
• Large enough to support interesting experiments!
NISQ
1' J, ,,
QC Algorithms to Machines Gap: Opportunity
QC programming and design tools that shrink the gap can move the feasibility point years sooner!• Reduce algorithm
qubit requirements• Improve effectiveness
of hardware qubits
Grovers Algorithm (Database search)
Shor’s Factoring Alg. (Crypto)
Gap!
Quantum Sim, Q Chem, QAOA
#Qubits
1
10
100
10000
100000
1000000
1000
Year1995 2005 20252015
MoreWork
Needed!
NISQ
-I / " J
• 1' I J. _I }~~-~ - -
,..
~
\ I y
An Architect’s Comparative Study of NISQ Machines
Architectural choices and design questions• Gate sets, Connectivity, Noise properties of qubits
In-depth exploration• Real-systems runs across 7 NISQ systems (3 vendors)• Enabled by our multi-vendor compiler toolflow: fair and accurate
comparisons avoiding inefficiencies from vendor compilers
Design insights from our study• Gate set choices• Connectivity choices• NISQ execution stack
[Murali et al. International Symposium on Computer Architecture. June, 2019. https://arxiv.org/abs/1905.11349 ]
Large Variations in Machine Characteristics
• Optimized compilation for qubit communication is not sufficient!• Large spatial and
temporal variations in gate error rates!
0.35 .......-------------------~
0.30
2 0.25 ro s....
0 0.20 s.... s.... QJ
QJ 0.15 .µ ro
(.!J 0.10
0.05
0 5 10 15 Day
• CNOT 5,4
◄ CNOT 7,10
CNOT 3,14
20 25
Large Variations in Machine CharacteristicsPart 2
• Large spatial and temporal variations in Qubit Coherence Times• How long they hold their state
120 • QO
◄ Q4 ► Q9
100
-V') ::::, - 80 Q)
E +J 60 N r-
40
20
0 5 10 15 20 25 Day
Machine Qubit Topology
IBM Q5 Tenerife
IBM Q14 Melbourne
IBM Q16 Rüschlikon
RigettiAgave
RigettiAspen1
RigettiAspen3
UMD Trapped Ion (UMDTI)
Name Qubits Gates
BV4 4 12
BV6 6 12
BV8 8 18
HS2 2 16
HS4 4 28
HS6 6 42
Toffoli 3 18
Fredkin 3 19
Or 3 17
Peres 3 16
QFT 2 13
Adder 4 23
ProgramsMachines
TriQCompiler
Putting it all together…
LO
l'.t:: "" 0. ""
- m Q5 IBM . 14 18 •Q}fi - r e 1i ~ - Ri ettikipe 3 - UIMDTI
B'V6 B'V HS2 6 Toffo"lm Or Peres Benchmarks
l.lU::33 r t for 12 n hn ar , u . · o , ·. l!J~~::;;~ dr ti. · l,.r c d appl cation-m eb.i ropol 01. kb., Beu.ehn11ar: . . tlh
" . R [ , 1 tb z ro h igb . bar e r p nd bt C: d l!J utpqt . f' , conipar on · i it.ended to " .liler'. tarid the wq,,act of ,arcliitecJural · ,.,, choice rJch as .rlfle •
,be,u:lunar, _perJorimu.a:c;,,e and i not in.te.nde ,to _pie a · innin ,. te li,iolo , . rpemJor or imple,men:tatio.,i. I ndirv ual. bench11UlT _perJornia,ice ,Hun,ber nui-,, ci , o er nme ......... ,ti .e , ea ur ments,. _pr,e 11t a llllfJ hot of'the_p 1'/o.r,,nanc of ti, e · tenis i 1en performe the , . peril. nts.
OS
High-Level Languages
Compiler
Architecture
VLSI Circuits
Semiconductor transistors
Algorithms Algorithms
Qubit implementations
High-level QC Languages.Compilers. Debugging.
Optimization.Error Correcting Codes
Orchestrate classical gate control,
Orchestrate qubit motion and manipulation.
~1950’s Classical Computing
Vacuum Tubes, Relay Circuits
Assembly Language
Algorithms
Quantum Systems Today: An Analogy
Quantum Toolflows
Modular hardware blocks: Gates, registers
Today’s Classical Computing
OS
High-Level Languages
Compiler
Architecture
VLSI Circuits
Semiconductor transistors
Algorithms Algorithms
implementations
Post-ISA Systems: Automated full-Stack compilation via APIs and formally-spec’d interfaces
Post-ISA Toolflows
Modular hardware blocks: Gates, registers
High-Level Languages
Classical Layering
App-specific Approaches
App-specific Approaches
Examples:QC Toolflows
TensorFlow and TPUs
…
Potential for a Seismic Shift in Computer Systems Design
Thanks to:Students and Co-AuthorsFunding: NSF, DARPA, SRC CFAR, SRC ADA.
For more info:http://check.cs.Princeton.eduhttp://mrmgroup.cs.Princeton.edu
Quantum Resources:CRA CCC Workshop Report “Next Steps in QC: Computer Science’s Role”https://arxiv.org/abs/1903.10541
National Academies Report “Quantum Computing Progress and Prospects”https://www.nap.edu/catalog/25196/quantum-computing-progress-and-prospects
"Any opinions, findings, conclusions or recommendations
expressed in this material are those of the author(s) and do not
necessarily reflect the views of the Networking and Information
Technology Research and Development Program."
The Networking and Information Technology Research and Development
(NITRD) Program
Mailing Address: NCO/NITRD, 2415 Eisenhower Avenue, Alexandria, VA 22314
Physical Address: 490 L'Enfant Plaza SW, Suite 8001, Washington, DC 20024, USA Tel: 202-459-9674,
Fax: 202-459-9673, Email: [email protected], Website: https://www.nitrd.gov
(it,NITRD -