© 2018 NETRONOME SYSTEMS, INC. 1
December 3 - 6, 2018
Santa Clara Convention Center
CA, USA
REVOLUTIONIZING THE COMPUTING LANDSCAPE AND BEYOND.
https://tmt.knect365.com/risc-v-summit @risc_v
Steven Zagorianakos
VP Silicon Development
Netronome
MASSIVELY PARALLEL RISC-V PROCESSING WITH TRANSACTIONAL MEMORY
Introduction
• Discuss Transactional Memories
• Walk Through an Example Implementation, Utilizing Transactional Memories and
RISC-V Harts
• Full Chip, Island, Cluster and Groups of RISC-V Harts
• RISC-V Feature Set for RFPC
• Summary
“Transactional Memory”
[Figure labels: Instruction-Driven Switch Fabric; "But still running in arbitrary C code of any size ..."]
• Transactional Memory Hierarchy
▶ Memory
▶ Closely coupled
▶ Threaded processing engines
▶ And hardwired transaction types
▶ Atomics
▶ CRC
▶ Crypto
• Many, Many CPU Cores
• Require
▶ Many Cores
▶ Efficient Command Dispatch / Fetch / Result / Synchronization
• (Not interrupt based, for example…)!
▶ WFE
▶ Currently Planned as Custom-1
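The dispatch / result / synchronization requirement above can be illustrated with a toy software model. This is a minimal sketch, not the RFPC design: the class names, the `post`/`step`/`wait_for` methods, the operation names and the signal numbering are all hypothetical, and `wait_for` only stands in for a wait-for-event style instruction. It shows a thread posting a command without stalling, the memory engine executing the whole operation atomically next to the data, and completion being reported by a signal rather than an interrupt.

```python
from collections import deque


class TransactionalMemory:
    """Toy memory engine (hypothetical): runs each transaction atomically at the memory."""

    def __init__(self, size_words):
        self.words = [0] * size_words
        self.queue = deque()  # posted commands awaiting execution

    def post(self, op, addr, value, thread, signal):
        # Posting only enqueues the command; the issuing thread is not stalled.
        self.queue.append((op, addr, value, thread, signal))

    def step(self):
        """Execute one queued command and signal the thread that issued it."""
        if not self.queue:
            return False
        op, addr, value, thread, signal = self.queue.popleft()
        old = self.words[addr]
        if op == "add":
            self.words[addr] = (old + value) & 0xFFFFFFFF
        elif op == "write":
            self.words[addr] = value & 0xFFFFFFFF
        elif op != "read":
            raise ValueError(f"unknown op {op!r}")
        thread.results[signal] = old  # previous value returned as the result
        thread.signals.add(signal)    # completion signal, no interrupt involved
        return True


class Thread:
    """Toy hardware thread (hypothetical) with a set of pending signals."""

    def __init__(self, name):
        self.name = name
        self.signals = set()
        self.results = {}

    def wait_for(self, signal, memory):
        # Stand-in for a wait-for-event instruction: real hardware would idle
        # the hart; this model simply runs the engine until the signal arrives.
        while signal not in self.signals:
            if not memory.step():
                raise RuntimeError("signal can never arrive")
        self.signals.discard(signal)
        return self.results.pop(signal)


def demo():
    mem = TransactionalMemory(16)
    threads = [Thread(f"t{i}") for i in range(4)]
    for t in threads:                      # four threads post an atomic add of 5
        mem.post("add", 3, 5, t, signal=1)
    olds = [t.wait_for(1, mem) for t in threads]
    return mem.words[3], olds
```

In this model `demo()` returns `(20, [0, 5, 10, 15])`: every add lands exactly once and each thread sees a distinct previous value, without any thread holding a lock.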
A Practical Implementation
[Figure: chip floorplan. Labels: seven RFPC Islands (~100 Cores each), four SRAM Memory Islands, DRAM-Backed Memory Island, DRAM Cache, Host Interface Island, Config Island, Expansion Island, Network Interface Island, Host, Host Memory.]
• The chip or chiplet is made up of islands, which are connected through the instruction-driven switch fabric
• This allows implementations to scale from small to large
• The memory hierarchy provides equal access to all types of memories
• The config, host interface, and network interface islands feed data into the system
• Basic flow of data in a SmartNIC
RFPC Island
[Figure: RFPC Island block diagram. Blocks: three RFPC Clusters (Many RFPC Cores each), three Slice Caches, Local Scratch Memory, Config/Island Bridge, Tile Link to Island Bus Agent. Interconnect: Tile Link, Island Bus, Global Bus. Traffic labels: Transactional Memory Ops; Datapath: Posted Coprocessor and Memory Transactions; Caching Data/Instructions, C Memory Structures, etc.; Island Bus Remote-Cache Coherency Ops.]
RFPC Cluster
[Figure: RFPC Cluster block diagram. Blocks: four RFPC Groups (~10 Cores each), two Island Bus interfaces, Tile Link Interface, Manages Binding, Local Prefetch/Write Buffer. Interconnect: Load Store, Tile Link, Island Bus. Traffic labels: Transactional Memory Ops; Datapath: Posted Coprocessor and Memory Transactions; Caching Data/Instructions, C Memory Structures, etc.; Remote-Cache Coherency Ops.]
RFPC Group
[Figure: RFPC Group block diagram. Labels: RFPC Core (Several Cores Per RFPC Group), RISC-V Pipeline, Coproc (Multiply +), Signals / Timers, Internal Cmd/Atomic/Prefetch/Write Buffer, Local Shared Memory (Code, High-Speed Data Structures, Thread-Local Data), Prefetch/Write Buffer, Instruction Fetch. Traffic labels: Transactional Memory Ops; Remote-Cache Coherency Ops.]
RISC-V Feature Set for RFPC
• RFPC Cores are RV32IMC cores with custom-0/1 instructions
▶ RV32IMC keeps performance high with a low silicon gate count; only User, Machine and Debug modes are supported, but this still provides some memory protection and both user-level and machine-level interrupts
▶ Custom-0 instructions permit dynamic binding of 48+-bit host addresses and bulk DDR addresses to 32-bit RISC-V addresses
▶ Custom-1 instructions permit transactional memory and signaling operations
• RFPC Cores collected into RFPC Groups
▶ Sharing local memory, which is directly accessed (not a cache)
▶ Simple address translation permits core-local data and stack without changing code and register initialization values
• RFPC Groups collected into RFPC Clusters
▶ Transaction initiation and signal handling (for transaction acceptance/completion) are also handled in the island bus interfaces
▶ Island bus access through a shared memory, and local transactional (atomic pipeline) memory shared within the cluster only
▶ Non-transactional access to the cache slices
• RFPC Clusters collected together
▶ RISC-V Debug module shared amongst 40 cores; permits JTAG-based debugging of every core
▶ The slices of cache combine as ‘L2’ cache
▶ Provides windowing to 48-bit PCIe and 40-bit MU address spaces
• RFPC is size and performance optimized
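The idea of binding wide host addresses into a 32-bit address space can be sketched as a small table of windows. This is only an illustration under stated assumptions: the slides do not give the window size, the number of windows, or where they sit in the 32-bit map, so `WINDOW_BITS`, `NUM_WINDOWS`, `WINDOW_BASE` and the `WindowTable` class are all hypothetical, and the real binding is done by custom-0 instructions rather than method calls.

```python
WINDOW_BITS = 24               # assumption: each window covers 16 MiB
WINDOW_SIZE = 1 << WINDOW_BITS
NUM_WINDOWS = 8                # assumption: eight bindable windows
WINDOW_BASE = 0x8000_0000      # assumption: windows start here in 32-bit space


class WindowTable:
    """Hypothetical model of binding 48-bit addresses to 32-bit windows."""

    def __init__(self):
        self.bases = [None] * NUM_WINDOWS   # wide base address per window

    def bind(self, index, wide_base):
        """Bind window `index` to a 48-bit base; return its 32-bit address."""
        if not 0 <= index < NUM_WINDOWS:
            raise IndexError("no such window")
        if wide_base % WINDOW_SIZE:
            raise ValueError("base must be window-aligned")
        if wide_base >> 48:
            raise ValueError("base does not fit in 48 bits")
        self.bases[index] = wide_base
        return WINDOW_BASE + index * WINDOW_SIZE

    def translate(self, addr32):
        """Map a 32-bit address inside a bound window to its 48-bit address."""
        offset = addr32 - WINDOW_BASE
        if not 0 <= offset < NUM_WINDOWS * WINDOW_SIZE:
            raise ValueError("address is outside the windowed region")
        index, within = divmod(offset, WINDOW_SIZE)
        base = self.bases[index]
        if base is None:
            raise LookupError("window is not bound")
        return base + within
```

For example, binding window 2 to `0x123456000000` yields the 32-bit address `0x82000000`, and an ordinary 32-bit load at `0x82000010` would then refer to the wide address `0x123456000010`. Code keeps using plain 32-bit pointers; only the binding changes.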
Summary
• RISC-V harts are well suited as the processors for implementing a thousand-CPU SmartNIC.
• RISC-V solutions can be tailored to the needs of embedded applications through a suitable choice of instruction set features, privileged modes and debug methodology.
• We covered, at a high level, the organization of memories and RISC-V harts that provides efficient processing with high-latency memory transactions.
• We looked at the instruction set customizations that handle RISC-V hart interaction with the memory systems and other harts.
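The customizations mentioned above occupy the custom-0 and custom-1 major opcodes that the RISC-V base ISA reserves for vendor extensions (`0001011` and `0101011` in bits 6:0). The sketch below packs an R-type instruction word into one of those opcodes. The field layout is the standard R-type format; the `funct3`/`funct7` values and register choices in the example are placeholders, since the slides do not define the RFPC encodings.

```python
CUSTOM_0 = 0b0001011   # major opcode reserved for custom extensions
CUSTOM_1 = 0b0101011   # major opcode reserved for custom extensions


def encode_r_type(opcode, rd, funct3, rs1, rs2, funct7):
    """Pack a standard 32-bit R-type instruction word."""
    fields = (
        ("opcode", opcode, 7), ("rd", rd, 5), ("funct3", funct3, 3),
        ("rs1", rs1, 5), ("rs2", rs2, 5), ("funct7", funct7, 7),
    )
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (
        (funct7 << 25) | (rs2 << 20) | (rs1 << 15)
        | (funct3 << 12) | (rd << 7) | opcode
    )


def major_opcode(word):
    """Return bits 6:0 of an instruction word."""
    return word & 0x7F


# Placeholder operation: funct3=0 and funct7=0 are not RFPC encodings.
word = encode_r_type(CUSTOM_1, rd=10, funct3=0, rs1=11, rs2=12, funct7=0)
```

Here `word` is `0x00C5852B`, and `major_opcode(word)` recovers `CUSTOM_1`. A standard RV32IMC decoder treats such words as belonging to the custom space, which is what lets the extensions coexist with an unmodified base ISA toolchain.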
ODSA Workgroup
Implementing open specifications contributed by participating
companies, any vendor’s silicon die can become a building
block that can be utilized in a chiplet-based SoC design
Working together to standardize processors, accelerators,
and memory and I/O peripherals using optimal process nodes
Companies wishing to learn more, participate and become an integral part
of the ODSA Workgroup can inquire further at [email protected] or visit us
in booth #407!
THANK YOU