integrating gen-z with risc-v

16
Integrating Gen-Z with RISC-V Mohan Parthasarathy July 19 th 2018

Upload: others

Post on 30-Dec-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Integrating Gen-Z with RISC-V

Integrating Gen-Z with RISC-V

Mohan Parthasarathy

July 19th 2018

Page 2: Integrating Gen-Z with RISC-V

The Motivation for a Gen-Z FabricCompute-Memory Balance is Degrading

222

0

1

2

3

4

5

6

7

8

2012 2013 2014 2015 2016 2017 2018 2019

Normalized Properties of Typical Server Processors

Cores Pins DDR Channels PCIe Lanes

>4000 pins8 DDR Channels64+ PCIe Lanes

64+ Cores

0

1

2

3

4

5

6

7

2012 2013 2014 2015 2016 2017 2018 2019

Memory & I/O Bandwidth Capacity per Core (GB/s)

DRAM Bandwidth/Core PCIe Bandwidth/Core

6 DDR Channels gave a bump in 2017, but core count growth offsets 8 DDR channels

Processor memory and I/O technologies … … are being stretched to their limits

x

x

x

x

x

x

x

x

Page 3: Integrating Gen-Z with RISC-V

The Motivation for a Gen-Z FabricMemory and Storage are Converging

3

With memory/storage convergence, memory semantic operations

become predominant (volatile & non-

volatile)TimeCurrent Time

Byte-addressable persistent memory delivered using DRAM + Flash or SCM

Page 4: Integrating Gen-Z with RISC-V

The Solution : Scalable Memory Fabric – Gen-Z!Gen-Z speaks the language of compute• High Performance

▪ Very high bandwidth (16 GT/s to 112 GT/s signaling), low latency

▪ Delivers 32 GB/s to 400+ GB/s per memory module

• Reliable▪ Flattens memory / storage hierarchy w/integrated resiliency,

multipath, aggregation, etc.

▪ No stranded resources or single-point-of-failures

• Secure▪ Provides strong hardware-enforced isolation and security

• Flexible▪ Multiple topologies, component types, etc.

▪ Supports legacy and new high-capacity form factors. Multiple media types can be physically co-located.

▪ Scales from co-packaged to single motherboard to rack-scale

• Compatible▪ Use existing physical layers, unmodified OS support

• Futuristic▪ Breaks processor-memory interlock enabling innovative solutions.

▪ Built from the “ground up” to support persistent memory semantics

Page 5: Integrating Gen-Z with RISC-V

So What’s an Example of How Gen-Z Helps Us?

WWAS 2018 | HPE & Partner Confidential 5

SoC

Me

mo

ry

Co

ntr

olle

r

DRAM

DRAM

DRAM

DRAM

Memory Bus

Memory Module

Gen

-Z L

ogi

c

Gen-Z MemorySemantic Fabric

Me

dia

Co

ntr

olle

rG

en-Z

Lo

gic

Traditional Memory Bus

• Media specific logic integrated into SoC

• Tight coupling of SoC and memory technology evolution

• Limits the types of memory that can be supported

Gen-Z Memory Interconnect

• Media specific logic integrated into memory module

• Independent SoC and memory technology evolution

• Accelerates innovation, enables variety of media support

Traditional

• SoC & memory are in generational lock-step

• Limited memory types

Gen-Z

• SoC & memory evolve freely

• Type & generation independent

Page 6: Integrating Gen-Z with RISC-V

Gen-Z + DDR : High Bandwidth

6

Gen-Z Signaling Rate Gen-Z 8 DDR 6400 Channels Aggregate Memory

Application Bandwidth

25 GT/s 64 Tx / Rx Lanes 320 GB/s 400 GB/s 720 GB/s

25 GT/s 128 Tx / Rx Lanes 640 GB/s 400 GB/s 1.04 TB/s

32 GT/s 64 Tx / Rx Lanes 400 GB/s 400 GB/s 800 GB/s

32 GT/s 128 Tx / Rx Lanes 800 GB/s 400 GB/s 1.2 TB/s

56 GT/s 64 Tx / Rx Lanes 700 GB/s 400 GB/s 1.1 TB/s

56 GT/s 128 Tx / Rx Lanes 1.4 TB/s 400 GB/s 1.8 TB/s

112 GT/s 64 Tx / Rx Lanes 1.4 TB/s 400 GB/s 1.8 TB/s

112 GT/s 128 Tx / Rx Lanes 2.8 TB/s 400 GB/s 3.2 TB/s

Page 7: Integrating Gen-Z with RISC-V

Key Gen-Z attributes for Scale-out Computing

7

Network addressing16-bit subnet IDs +12-bit component IDs + [ 64-bit memory address ]

A theoretical maximum of 228 (~268M) components

Memory-semantic datagram packets independent of fabric scale

No performance degradation to communicate across subnets

Does not require multiple component IDs to support multipath

Flexible destination and packet relay tables to support nearly any routing topology

Advanced OperationsMultiple buffer put / get variations

Collectives + Collective Acceleration

Signaled writes / Write MSG (send) with Receive Tags and Embedded Read

Virtual Channels

32 VCs

Remove cyclic resource dependencies for routing deadlock avoidance.

Reduce head-of-line blocking and / or cross path blocking.

Segregate traffic classes for performance isolation.

VC remapping to support components with different number of VCs

Packet Injection / Relay

Robust congestion management with automatic packet injection rate

Common source node adaptive / dispersive packet injection and switch adaptive / dispersive packet relay

Traffic Classes

Set of VCs for user-defined purposes

Performance within a TC is not affected by other TCs, e.g., TCs separate:

– Latency Sensitive (e.g., SHMEM)

– Bandwidth Sensitive (e.g., check point)

– Noise Sensitive (e.g., collectives)

– High-priority Applications

Multi-plane Support

All planes can be co-packaged within a single switch

A single cable can be used to connect to all planes

A single interface can be drive all planes

Adaptive / dispersive routing enables load balancing, resiliency, etc.

Low-latency FEC

2 ns per link hop

Page 8: Integrating Gen-Z with RISC-V

Gen-Z : A RISC-V should takeWhat RISC-V should do to take advantage of Gen-Z

8

– Should support Gen-Z in addition to DDR natively to differentiate from processors supporting only DDR natively.

– Should support 52 bit physical addressing.

– Cache Management Challenges (Support for Memory consistency models across fabric)

– Should support a minimum of 64 Gen-Z lanes of the PCIe Phy at 32 GT/s.

– Should support a minimum of 1K outstanding transactions on the coherency interface to support inherent SCM media latencies.

– Memory Mapping using ZMMU for Gen-Z

– LPD Support for PCIe compatibility

– Implement Far Atomics

– Take advantage of Gen-Z Security Features

– Protection/Translation architecture for a secure fabric

Page 9: Integrating Gen-Z with RISC-V

ZMMU Implementation : Page-Grid based ZMMU

9

Page Grid Table

PTE Table

Responder: Gen-Z Packet Address || Requester: Processor Physical Address

Page Grid 0Starting AddressPage Size—4 KiB

Number of Pages Base PTE Index

Page Grid NStarting AddressPage Size—1 GiB

Number of Pages Base PTE Index

PTE

PTE

PTE

PTE

PTE

PTE

PTE

PTE

PTE

PTE

PTE

PTE

PTE

PTE

PTE

PTE

.

.

.

4KiB Page

4KiB Page

4KiB Page

4KiB Page

4KiB Page

4KiB Page

4KiB Page

4KiB Page

1 GiB Page

1 GiB Page

1 GiB Page

1 GiB Page

1 GiB Page

1 GiB Page

1 GiB Page

1 GiB Page

• Gen-Z supports Page-table based ZMMU and Page-grid based ZMMU• A Page Table-based ZMMU is analogous to a processor page table MMU.• A page-table based ZMMU should be integrated into processor MMU or page table

caching should be implemented for performance reasons.• Recommend RISC-V to use page grid not page table technology as it is simpler and

easier to integrate compared to trying to integrate into a SMMU

Page 10: Integrating Gen-Z with RISC-V

Gen-Z : Trust and Security

10

– Multiple Access Control Techniques including:

– Access Keys (component group level access)

– Access Request and Access Response (fine-grain component-level access)

– R-Keys (page-level access control)

– R-Key Domains (Requester R-Key filtering)

– Switch Packet Filtering (control plane, leaf component)

– Peer Component Authorization (whitelist)

– Peer Nonce to detect rogue component insertion while in low-power state

– Packet authentication with Anti-replay Tags

– Hashed Message Authentication Code (HMAC)

– Uses transaction integrity key (TIK)– Only devices sharing TIK can communicate

– Packets dropped due to authentication violations

– Configurable interface / component containment

– Violations reported to management

– Currently, developing new component authentication, page-level encryption, and data authentication

SecurityEvent

Co

mp

AC

om

pB

Comp. A & C configuredwith “green” TIK

Comp. B configured with “purple” TIK

Authentication failed,packet dropped

Gen-ZSwitch

• RISC-V could add capabilities support to augment Gen-Z security features.

• Recommend RISC-V add a trust zone like capability to build on what Gen-Z enables to provide more page level security (Who configures the ZMMU – do you trust the node?)

Page 11: Integrating Gen-Z with RISC-V

Gen-Z : I/O with PCIe compatibility

11

Gen-Z Logical PCI Devices (LPDs)

▪ Gen-Z devices can be discovered/configured▪ Via standard PCI / PCIe system software

▪ LPDs can fully exploit Gen-Z Architecture• Low-latency switching

• Gen-Z 30 ns vs. PCIe 130-150 ns translates to 200-240 ns savings per read operation

• Memory-speed CPU-to-device communication

• Security and fine-grain hardware-enforced isolation (any-to-any communication without compromise)

• Supports all x86 / ARM / Power architecture Atomics

• Simplified single and multi-host I/O virtualization and sharing capabilities

• Multipath—aggregation / resiliency / robust topologies

• PCIe 2.5-32 GT/s PHY and 25-112 GT/s 802.3 Electrical

• Legacy plus New Gen-Z Scalable Connector and Scalable Form Factors

• CPU-based data movers to enable new software paradigms

• Scale-up and scale-out connectivity and performance

• Simplified software—any mix coherent and non-coherent operations

• And much more…

NVMe#2

Processor

Gen-Z Hardware Interface

PECAM Logic

Gen-Z

Switch Gen-Z SSD(NVMz) #2

Gen-Z GPU

#1Gen-Z SSD(NVMz) #1

Gen-Z GPU

#2

PCI Structure

PCI Structure

PCI Structure

Nati

ve

G

en

-Z V

iew

Lo

gic

al

PC

IV

iew

NVMe #2

PCI GPU

#1NVMe #1

LPD enabled Gen-Z devices appear

as PCIe devices to system software

Device without

LPD support

• RISC-V should support a PECAM through the Requestor ZMMU for LPD support

Page 12: Integrating Gen-Z with RISC-V

Gen-Z Transforms Performance

12

In-memory analytics

15xfaster

New algorithms Completely rethinkModify existing

frameworks

Similarity search

20xfaster

Financial models

8,000xfaster

Large-scale

graph inference

100xfaster

Page 13: Integrating Gen-Z with RISC-V

Gen-Z : Broad Industry and Component Support

13

Components

Intellectual Property

Connectors

Subsystems

Systems

Software

Consortium

Members

Alpha Data

AMD

Amphenol

ARM

Avery Design

Broadcom

Cadence

Cavium

Cisco

Cray

Dell EMC

ETRI

(Research)

Everspin

Foxconn

Interconnect

Google

HPR

Hirose Electric

Huawei

IBM

IDT

IntelliProp

Jess-Link

Keysight

Lenovo

Lots

LUXSHARE-ICT

Mellanox

Micron

Microsemi

Mobiveil

Molex

NetApp

Nokia

Oak Ridge National

Labs

PLDA

Qualcomm

Red Hat

Samsung

Seagate

Senko

Simula Research

SK hynix

Smart Modular

Spin Transfer

Technologies

TE Connectivity

Teledyne

LeCroy

Toshiba Memory

Tyco Electronics

UNH

VMware

WDC

Xilinx

YADRO

Yonsei

University

Page 14: Integrating Gen-Z with RISC-V

Gen-Z Consortium Milestones

14

– Significant milestones over the past year– Multi-vendor Proof-of-Concept Demonstrated (FMS’17 / SC’17)

– New demonstrations at HPE Discover / FMS’18 / SC’18

– Multiple draft and final specifications publicly available (core architecture, multiple mechanicals, PHY, scalable connector including new high-power and cabling, etc.)

– 40+ tutorials publicly available, YouTube channel, etc.

– Expanded membership (including academic & government agencies)

– Key Upcoming Objectives– Expand Gen-Z security to support component authentication and

page-level data encryption / authenticated

– Deliver design guides covering: – DRAM / SCM, LPD, Storage, eNIC, and high-speed messaging

– Complete ZSFF and PECFF mechanical form factor specifications– PECFF (September) and ZSFF (4Q2018)

– Release Gen-Z PHY specification with support:– PCIe 16 GT/s and 802.3 electrical 25 GT/s (September)

– 802.3 electrical 56 GT/s PAM 4 (4Q2018)

– PCIe 32 GT/s and 802.3 electrical 112 GT/s PAM 4 (2Q2019)

– Develop compliance testing

Page 15: Integrating Gen-Z with RISC-V

RISC-V : Building on Gen-Z Basics

Me

dia

Con

tro

ller

Ge

n-Z

Lo

gic

SoC

Ge

n-Z

Lo

gic

Media Module

Gen-Z Fabric

SCM

SCM

SCM

SCM

Me

dia

Con

tro

ller

Ge

n-Z

Lo

gic

SoC

Ge

n-Z

Lo

gic

Media Module

Gen-Z Fabric GPU

or

FPGA

Me

dia

Con

tro

ller

SoCMedia Module

SCM

SCM

SCM

SCM

Media Module

GPU

or

FPGA

Gen

-Z S

witch

Me

dia

Con

tro

ller

DRAM

DRAM

DRAM

DRAM

Media Module

SoC

CPU

CPU

Me

dia

Con

tro

ller

Media

Media

Media

Media

Media Module

Figure 1 – Storage Class Memory

Figure 2 – GPU or FPGA

Figure 3 – Multiple resources enabled by Universal Interconnect

• Supports DRAM, Flash, Memristor, PCRAM, MRAM, 3D-Xpoint…Universal Interconnect

• Decouples CPU/memory design

• Enables independent innovation

Page 16: Integrating Gen-Z with RISC-V

Thank youContact information

16