Page 1:

© 2018 NETRONOME SYSTEMS, INC.

December 3 - 6, 2018

Santa Clara Convention Center

CA, USA

REVOLUTIONIZING THE COMPUTING LANDSCAPE AND BEYOND.

https://tmt.knect365.com/risc-v-summit @risc_v

Page 2:

Steven Zagorianakos

VP Silicon Development

Netronome

MASSIVELY PARALLEL RISC-V PROCESSING WITH TRANSACTIONAL MEMORY

Page 3:

Introduction

• Discuss Transactional Memories

• Walk Through an Example Implementation, Utilizing Transactional Memories and RISC-V Harts

• Full Chip, Island, Cluster and Groups of RISC-V Harts

• RISC-V Feature Set for RFPC

• Summary

Page 4:

“Transactional Memory”

But still running in arbitrary C code of any size ...

Instruction-Driven Switch Fabric

• Transactional Memory Hierarchy
  ▶ Memory
  ▶ Closely coupled
  ▶ Threaded processing engines
  ▶ And hardwired transaction types
    ▶ Atomics
    ▶ CRC
    ▶ Crypto
• Many, Many CPU Cores
• Require
  ▶ Many Cores
  ▶ Efficient Command Dispatch / Fetch / Result / Synchronization
    • (Not interrupt based, for example…)!
  ▶ WFE
    ▶ Currently Planned as Custom-1
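As a rough illustration of what placing WFE in the custom-1 space means at the encoding level, the sketch below packs an R-type instruction word using the custom-1 major opcode from the RISC-V base opcode map. The slides do not give the instruction format or function codes, so the R-type layout choice, `WFE_FUNCT3`, `WFE_FUNCT7`, and the operand meanings are all hypothetical placeholders.

```python
# Major opcodes reserved for custom extensions in the RISC-V base opcode map.
CUSTOM_0 = 0b0001011  # 0x0B
CUSTOM_1 = 0b0101011  # 0x2B


def encode_r_type(opcode, rd, funct3, rs1, rs2, funct7):
    """Pack the standard RISC-V R-type fields into a 32-bit instruction word."""
    fields = (
        ("opcode", opcode, 7), ("rd", rd, 5), ("funct3", funct3, 3),
        ("rs1", rs1, 5), ("rs2", rs2, 5), ("funct7", funct7, 7),
    )
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (
        (funct7 << 25) | (rs2 << 20) | (rs1 << 15)
        | (funct3 << 12) | (rd << 7) | opcode
    )


# HYPOTHETICAL function codes: the talk only says WFE is planned as custom-1.
WFE_FUNCT3 = 0b000
WFE_FUNCT7 = 0b0000000


def encode_wfe(rd, rs1):
    """Encode a hypothetical 'wait for event' with a signal mask in rs1."""
    return encode_r_type(CUSTOM_1, rd, WFE_FUNCT3, rs1, 0, WFE_FUNCT7)


word = encode_wfe(rd=10, rs1=11)  # a0 <- fired signals, a1 = mask (assumed roles)
print(f"{word:#010x}")
```

Only the low seven bits (the custom-1 opcode) reflect something stated in the talk; the remaining fields would be fixed by the actual RFPC instruction definition.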

Page 5:

A Practical Implementation

Figure (full-chip block diagram), labels: RFPC Island (~100 Cores) ×7; SRAM Memory Island ×4; DRAM-Backed Memory Island; DRAM Cache; Host Interface Island; Config Island; Expansion Island; Network Interface Island; Host Memory; Host

• The chip or chiplet is made up of islands, which are connected through the instruction-driven switch fabric
• This allows implementations to scale from small to large
• The memory hierarchy provides equal access to all types of memories
• The config, host interface, and network interface islands feed data into the system
• Basic flow of data in a SmartNIC

Page 6:

RFPC Island

Figure (RFPC Island block diagram), labels:
• Blocks: RFPC Cluster (Many RFPC Cores) ×3; Slice Cache ×3; Local Scratch Memory; Config/Island Bridge; Tile Link to Island Bus Agent
• Buses: Global Bus; Island Bus; Tile Link ×3
• Island Bus traffic: Transactional Memory Ops; Datapath: Posted Coprocessor and Memory Transactions; Remote-Cache Coherency Ops
• Tile Link traffic: Caching Data/Instructions, C Memory Structures, etc.

Page 7:

RFPC Cluster

Figure (RFPC Cluster block diagram), labels:
• Blocks: RFPC Group (~10 Cores) ×4; Tile Link Interface (Manages Binding); Local Prefetch/Write Buffer; Island Bus interface ×2; Load Store ×2
• Island Bus traffic: Transactional Memory Ops; Datapath: Posted Coprocessor and Memory Transactions; Remote-Cache Coherency Ops
• Tile Link traffic: Caching Data/Instructions, C Memory Structures, etc.

Page 8:

RFPC Group

Figure (RFPC Group block diagram), labels:
• RFPC Core (Several Cores Per RFPC Group): RISC-V Pipeline; Coproc (Multiply +); Signals / Timers; Instruction Fetch
• Internal Cmd / Atomic / Prefetch / Write Buffer
• Prefetch/Write Buffer
• Local Shared Memory: Code, High-Speed Data Structures; Thread-Local Data
• Traffic: Transactional Memory Ops; Remote-Cache Coherency Ops

Page 9:

RISC-V Feature Set for RFPC

• RFPC Cores are RV32IMC cores with custom-0/1 instructions
  ▶ RV32IMC keeps the performance high with low silicon gate count; support for User, Machine and Debug modes only, but provides some memory protection and both user-level and machine-level interrupts
  ▶ Custom-0 instructions permit dynamic binding of 48+-bit host addresses and bulk DDR addresses to 32-bit RISC-V addresses
  ▶ Custom-1 instructions permit transactional memory and signaling operations
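To make the idea of binding a wide address to a 32-bit RISC-V address concrete, here is a minimal software model of a window table. It is only a sketch: the slides do not describe the binding mechanism, so `BindingTable`, `bind`, `translate`, and the example addresses are hypothetical names and values, not the RFPC custom-0 semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    local_base: int   # 32-bit RISC-V address where the window starts
    size: int         # window size in bytes
    target_base: int  # wide (for example 48-bit host) address it maps to


class BindingTable:
    """Hypothetical model of dynamic wide-address binding."""

    def __init__(self):
        self._windows = []

    def bind(self, local_base, size, target_base, target_bits=48):
        """Map [local_base, local_base + size) onto a wide target range."""
        if size <= 0 or local_base < 0 or local_base + size > 1 << 32:
            raise ValueError("window must fit in the 32-bit address space")
        if target_base < 0 or target_base + size > 1 << target_bits:
            raise ValueError("target range exceeds the wide address space")
        for w in self._windows:
            if local_base < w.local_base + w.size and w.local_base < local_base + size:
                raise ValueError("window overlaps an existing binding")
        self._windows.append(Window(local_base, size, target_base))

    def translate(self, addr):
        """Return the wide address for a bound 32-bit address."""
        for w in self._windows:
            if w.local_base <= addr < w.local_base + w.size:
                return w.target_base + (addr - w.local_base)
        raise LookupError(f"address {addr:#x} is not bound")


table = BindingTable()
# Example values are made up: a 256 MiB window onto a 48-bit host address.
table.bind(local_base=0x8000_0000, size=0x1000_0000, target_base=0x1234_0000_0000)
host_addr = table.translate(0x8000_0040)
print(f"{host_addr:#x}")
```

The point of the model is that code keeps using ordinary 32-bit loads and stores inside the window, while the binding decides which wide address they reach.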

• RFPC Cores collected into RFPC Groups
  ▶ Sharing local memory, which is directly accessed (not cached)
  ▶ Simple address translation permits core-local data and stack without changing code and register initialization values
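One way such a translation could work is sketched below: every core uses the same core-local address range, and the translation offsets it by the core's index into the group's shared local memory, so identical code and initial register values land in different physical regions. The base addresses, the region size, and the packing order are assumptions made for illustration; the slides do not specify them.

```python
# All constants below are hypothetical; the slides give no memory map.
CORE_LOCAL_BASE = 0x0001_0000   # address every core uses for its private data/stack
CORE_LOCAL_SIZE = 0x1000        # 4 KiB of private space per core
SHARED_MEM_BASE = 0x0010_0000   # where per-core regions are packed in group memory


def translate(core_id, addr, cores_per_group=10):
    """Redirect core-local addresses into that core's slice of shared memory."""
    if not 0 <= core_id < cores_per_group:
        raise ValueError("core_id out of range for this group")
    if CORE_LOCAL_BASE <= addr < CORE_LOCAL_BASE + CORE_LOCAL_SIZE:
        offset = addr - CORE_LOCAL_BASE
        return SHARED_MEM_BASE + core_id * CORE_LOCAL_SIZE + offset
    return addr  # addresses outside the core-local range pass through unchanged


# The same stack address maps to a different physical location on each core.
stack_slot = CORE_LOCAL_BASE + 0x10
for core_id in (0, 3):
    print(core_id, f"{translate(core_id, stack_slot):#x}")
```

Because the mapping depends only on the core index, the linker script and startup code can be shared by every core in the group.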

• RFPC Groups collected into RFPC Clusters
  ▶ Transaction initiation and signal handling (for transaction acceptance/completion) are also handled in the island bus interfaces
  ▶ Island bus access through a shared memory, and local transactional (atomic pipeline) memory shared within the cluster only
  ▶ Non-transactional access to the cache slices
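The interaction between posted transactions and completion signals can be sketched with a small software model. This is an illustration of the general pattern only: `Core`, `TransactionalMemory`, `post_atomic_add`, and `wait_for_event` are hypothetical names, and the real hardware runs the memory engine concurrently with the cores rather than inside the wait loop.

```python
from collections import deque


class Core:
    """A hart with a bitmask of pending completion signals."""

    def __init__(self, name):
        self.name = name
        self.signals = 0


class TransactionalMemory:
    """Memory that executes queued atomic operations on behalf of cores."""

    def __init__(self, size_words):
        self.words = [0] * size_words
        self._pending = deque()

    def post_atomic_add(self, core, index, value, signal):
        """Queue an atomic add and return at once (a posted transaction)."""
        if not 0 <= index < len(self.words):
            raise IndexError("word index out of range")
        self._pending.append((core, index, value, signal))

    def step(self):
        """Execute one queued transaction and signal the issuing core."""
        if not self._pending:
            return False
        core, index, value, signal = self._pending.popleft()
        self.words[index] = (self.words[index] + value) & 0xFFFFFFFF
        core.signals |= 1 << signal
        return True


def wait_for_event(core, mask, memory):
    """Stand-in for WFE: block until a signal in mask fires, then clear it."""
    while not (core.signals & mask):
        # Model the stall by letting the memory engine make progress.
        if not memory.step():
            raise RuntimeError("no pending transaction can raise the awaited signal")
    fired = core.signals & mask
    core.signals &= ~fired
    return fired


memory = TransactionalMemory(size_words=16)
core_a, core_b = Core("a"), Core("b")
memory.post_atomic_add(core_a, index=3, value=5, signal=0)
memory.post_atomic_add(core_b, index=3, value=7, signal=1)
fired_b = wait_for_event(core_b, mask=1 << 1, memory=memory)
fired_a = wait_for_event(core_a, mask=1 << 0, memory=memory)
print(memory.words[3], fired_a, fired_b)
```

Both cores update the same counter without locks or interrupts: the memory serializes the adds, and each core learns of completion only through its own signal.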

• RFPC Clusters collected together
  ▶ RISC-V Debug module shared amongst 40 cores; permits JTAG-based debugging of every core
  ▶ The slices of cache combine as ‘L2’ cache
  ▶ Provides windowing to 48-bit PCIe and 40-bit MU address spaces
• RFPC is size and performance optimized

Page 10:

Summary

• RISC-V harts are well suited to be the processors in a thousand-CPU SmartNIC.
• RISC-V solutions can be tailored to the needs of embedded applications through a suitable choice of instruction set features, privileged modes and debug methodology.
• We covered, at a high level, the organization of memories and RISC-V harts that provides efficient processing with high-latency memory transactions.
• We looked at the instruction set customizations that handle RISC-V hart interaction with the memory systems and other harts.

Page 11:

ODSA Workgroup

By implementing open specifications contributed by participating companies, any vendor’s silicon die can become a building block that can be utilized in a chiplet-based SoC design

Working together to standardize processors, accelerators, and memory and I/O peripherals using optimal process nodes

Companies wishing to learn more, participate and become an integral part of the ODSA Workgroup can inquire further at [email protected] or visit us in booth #407!

Page 12:

THANK YOU