
University of Crete

School of Sciences & Engineering

Computer Science Department

Master Thesis

by

Michael Papamichael

Network Interface Architecture and Prototyping

for Chip and Cluster Multiprocessors

Supervisor

Prof. Manolis G.H. Katevenis

Heraklion, Crete, July, 2007


Presentation Outline

NI Queue Manager

Introduction

Key Concepts

Architecture & Implementation

NI Design for CMPs

NI Design Goals

NI Design Issues

Proposed NI Design

Conclusions & Future Work


NI Queue Manager - Introduction

FPGA-based Prototyping Platform

PCI-X RDMA-capable NIC in cluster environment

Buffered crossbar switch

Goals

Confirm buffered crossbar behavior

Interprocessor communication research


NI Queue Manager - Key Concepts

Head-Of-Line (HOL) Blocking

[Figure: Input Queues, Switching Fabric, and Outputs; labels "HOL Blocking" and "Idle!"]

HOL Blocking reduces switch throughput

NI Queue Manager - Key Concepts

Virtual Output Queues (VOQs)

[Figure: Input Queues organized as VOQs, Switching Fabric, and Outputs]

VOQs eliminate HOL Blocking
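The difference between a single FIFO per input and VOQs can be illustrated with a minimal sketch. The 2x2 traffic pattern, the fixed-priority arbitration, and the function names `drain_fifo` and `drain_voq` are assumptions chosen for illustration; they are not taken from the thesis.

```python
from collections import deque

def drain_fifo(queues):
    """Time slots needed when each input keeps a single FIFO queue.

    Each entry in a queue is the destination output of one packet.
    Only the head of each queue may be served, so a head packet that
    loses arbitration blocks everything behind it (HOL blocking).
    """
    queues = [deque(q) for q in queues]
    slots = 0
    while any(queues):
        busy = set()                      # outputs already taken in this slot
        for q in queues:                  # fixed-priority arbitration (assumption)
            if q and q[0] not in busy:
                busy.add(q.popleft())
        slots += 1
    return slots

def drain_voq(queues, num_outputs):
    """Time slots needed when each input keeps one queue per output (VOQs)."""
    voqs = [[deque() for _ in range(num_outputs)] for _ in queues]
    for i, q in enumerate(queues):
        for dest in q:
            voqs[i][dest].append(dest)
    slots = 0
    while any(v for inp in voqs for v in inp):
        busy = set()
        for inp in voqs:
            for out in range(num_outputs):
                if inp[out] and out not in busy:
                    inp[out].popleft()
                    busy.add(out)
                    break                 # one packet per input per slot
        slots += 1
    return slots

traffic = [[0, 1], [0, 1]]                # both inputs hold: output 0, then output 1
fifo_slots = drain_fifo(traffic)
voq_slots = drain_voq(traffic, 2)
```

With this traffic the FIFO variant leaves one output idle in the first slot, so it needs one slot more than the VOQ variant.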

NI Queue Manager - Key Concepts

Traffic Segmentation Schemes

Example: packets of 300, 160, 80, and 40 bytes (580 bytes in total)

Variable-size Multipacket segments: 3 segments, S = 256 B (256 + 256 + 68 = 580 bytes)

Fixed-size Multipacket segments: 3 segments, S = 256 B (256 + 256 + 256 = 768 bytes)

Fixed-size Unipacket segments: 11 segments, S = 64 B (11 x 64 = 704 bytes)

Variable-size Unipacket segments: 5 segments, S = 256 B (256 + 44 + 160 + 80 + 40 = 580 bytes)

Traffic segmented to optimize switching

Variable-Size MultiPacket (VSMP) Segmentation well suited to buffered crossbar
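The byte counts of the four segmentation schemes can be reproduced with a short sketch. The function name `segment` and its parameters are hypothetical; the sketch only counts segment sizes and ignores headers and any padding rules the real hardware may apply.

```python
def segment(packets, seg_size, multipacket, variable):
    """Return the list of segment sizes (in bytes) for one scheme.

    multipacket: segments may span packet boundaries (packets are
                 treated as one contiguous byte stream).
    variable:    the last segment of a stream may be shorter than
                 seg_size; otherwise every segment occupies seg_size.
    """
    streams = [sum(packets)] if multipacket else list(packets)
    segments = []
    for length in streams:
        while length > 0:
            payload = min(seg_size, length)
            segments.append(payload if variable else seg_size)
            length -= payload
    return segments

packets = [300, 160, 80, 40]
vsmp = segment(packets, 256, multipacket=True, variable=True)
fsmp = segment(packets, 256, multipacket=True, variable=False)
fsup = segment(packets, 64, multipacket=False, variable=False)
vsup = segment(packets, 256, multipacket=False, variable=True)
```

For the example traffic this yields 3 segments and 580 bytes for VSMP, against 768, 704, and 580 bytes (in 3, 11, and 5 segments) for the other schemes.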


NI Queue Manager – Architecture & Implementation

Overview

Virtual Output Queues (VOQs)

VOQ migration to external memory

Hardware-managed linked lists

VSMP Segmentation

Scheduling

Flow Control

3 major versions implemented

NI Queue Manager – Architecture & Implementation

Architecture

[Figure: NI Queue Manager block diagram. Blocks: Packet Sorter, On-Chip VOQs, Linked List Manager, Scheduler, Flow Control, Packet Processor, Memory Controller, External Memory (Off-chip VOQs). Interfaces: Host (PCI-X), Network (RocketIO)]


NI Queue Manager – Architecture & Implementation

Packet Sorter

Sorts packets according to:

destination

other criteria (e.g. priority)

Notifies Scheduler about incoming traffic

Light-weight packet processing

e.g. enforce maximum packet size

2 versions implemented

with packet segmentation

without packet segmentation


NI Queue Manager – Architecture & Implementation

On-Chip VOQs

Accumulates traffic in VOQs

VOQs implemented as:

Circular buffers in single statically partitioned on-chip memory

Xilinx FIFOs

2 versions implemented

VOQs in BRAM

VOQs in Xilinx FIFOs
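The idea of circular buffers in a single statically partitioned memory can be sketched in software as follows. The class name `PartitionedVOQs`, the word-granularity interface, and the equal-size partitions are assumptions for illustration; the actual BRAM organization is not described in the slides.

```python
class PartitionedVOQs:
    """Several circular buffers carved out of one shared memory array."""

    def __init__(self, num_voqs, words_per_voq):
        self.size = words_per_voq
        self.mem = [None] * (num_voqs * words_per_voq)  # single shared memory
        self.head = [0] * num_voqs     # read offset inside each partition
        self.count = [0] * num_voqs    # words currently stored per VOQ

    def push(self, voq, word):
        """Append a word to a VOQ; return False if its partition is full."""
        if self.count[voq] == self.size:
            return False
        base = voq * self.size                          # static partition start
        tail = (self.head[voq] + self.count[voq]) % self.size
        self.mem[base + tail] = word
        self.count[voq] += 1
        return True

    def pop(self, voq):
        """Remove and return the oldest word of a VOQ, or None if empty."""
        if self.count[voq] == 0:
            return None
        base = voq * self.size
        word = self.mem[base + self.head[voq]]
        self.head[voq] = (self.head[voq] + 1) % self.size
        self.count[voq] -= 1
        return word
```

Because the partitions are static, a full VOQ cannot borrow space from an empty neighbor, which is the limitation that dynamic allocation would remove.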


Linked List Manager

Performs Segment Transfers

variable-size segments

fixed-size segments

Manages Linked Lists

Head, Tail pointers in on-chip memory

Next Block pointers in DRAM (along with data)

Optimization Techniques

Free Block Preallocation

Free-List Bypass

FSM-based Implementation


Linked List Manager

Segment Transfers

1. From On-Chip VOQs to Packet Processor

2. From On-Chip to Off-Chip VOQs

3. From Off-Chip VOQs to Packet Processor

[Figure: transfer paths 1 to 3 between the Packet Sorter, the On-Chip VOQs, the Off-Chip VOQs, and the Packet Processor. Labels: variable-size segments, fixed-size blocks, max. segment size = 1 block]


Linked List Manager

Linked List Management

Large VOQs migrate to DRAM

Traffic stored in linked-lists of fixed-size blocks

Dynamic allocation of external memory

Block size needs to be:

Large to benefit from DRAM burst length

Small to minimize size of On-Chip VOQs

2 Basic Operations

Enqueue

Dequeue

Free blocks stored in Free-Block List

Linked List Manager

Basic Linked List Operations

Enqueue

Get free block from Free-Block list

Write data in the new block

Update Next-Block pointer of the last block

Update VOQ Tail pointer

Dequeue

Read data from the first block

Read Next-Block from first block

Update VOQ Head pointer

Put free block in Free-Block list
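The enqueue and dequeue steps listed above can be sketched in software. This is a minimal model under stated assumptions: the class name `LinkedListQueues`, the `NIL` marker, and the use of Python lists in place of DRAM and on-chip pointer memories are all hypothetical, and the optimizations named earlier (Free Block Preallocation, Free-List Bypass) are not modeled.

```python
class LinkedListQueues:
    """Per-VOQ linked lists of fixed-size blocks sharing one free list."""

    NIL = -1  # hypothetical "no block" marker

    def __init__(self, num_voqs, num_blocks):
        self.data = [None] * num_blocks                  # block payloads ("DRAM")
        self.next = [i + 1 for i in range(num_blocks)]   # Next-Block pointers
        self.next[-1] = self.NIL
        self.free_head = 0                               # Free-Block list head
        self.head = [self.NIL] * num_voqs                # VOQ Head pointers
        self.tail = [self.NIL] * num_voqs                # VOQ Tail pointers

    def enqueue(self, voq, payload):
        blk = self.free_head
        if blk == self.NIL:
            return False                      # no free block available
        self.free_head = self.next[blk]       # get free block from the free list
        self.data[blk] = payload              # write data in the new block
        self.next[blk] = self.NIL
        if self.tail[voq] == self.NIL:
            self.head[voq] = blk              # queue was empty
        else:
            self.next[self.tail[voq]] = blk   # update Next-Block of the last block
        self.tail[voq] = blk                  # update VOQ Tail pointer
        return True

    def dequeue(self, voq):
        blk = self.head[voq]
        if blk == self.NIL:
            return None
        payload = self.data[blk]              # read data from the first block
        self.head[voq] = self.next[blk]       # read Next-Block, update VOQ Head
        if self.head[voq] == self.NIL:
            self.tail[voq] = self.NIL
        self.next[blk] = self.free_head       # put the block back in the free list
        self.free_head = blk
        return payload
```

All VOQs draw from the same pool of blocks, so memory is allocated only to queues that currently hold traffic.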

Linked List Manager

Enqueue/Dequeue Example

[Figure: VOQ Pointers table (Head, Tail), Free List (Next Free Block), and DRAM blocks. Animation steps: Enqueue into VOQ 5, Enqueue into VOQ 5, Dequeue from VOQ 5]

Linked List Manager

Finite State Machine (FSM)

[Figure: FSM state diagram. States: Idle, DEQ1, DEQ2, PUSHFB2, POPFB2, ENQ1, ENQ2, SRAM2XBAR1, SRAM2XBAR2. State groups: DEQUEUE, PUSH FREE BLOCK, POP FREE BLOCK, ENQUEUE, SRAM2XBAR (On-Chip VOQs to Packet Processor)]


Scheduler

Keeps track of each VOQ

On-chip occupancy

Off-chip occupancy

Employs Flow Control (network & local)

Number of sent data words

Number of credits

Implements Scheduling

Builds VOQ eligibility masks

Enforces scheduling policy

Instructs Linked List Manager
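Flow control based on a count of sent data words and a count of credits could be modeled roughly as below. The slides do not give the exact rule, so the comparison `sent + words <= credits`, the class name `CreditCounter`, and its methods are assumptions for illustration only.

```python
class CreditCounter:
    """Tracks sent data words against credits granted by the receiver."""

    def __init__(self, initial_credits):
        self.credits = initial_credits   # total words the receiver has granted
        self.sent = 0                    # total words sent so far

    def can_send(self, words):
        """A transfer is allowed only if it stays within the granted credits."""
        return self.sent + words <= self.credits

    def send(self, words):
        if not self.can_send(words):
            raise ValueError("not enough credits")
        self.sent += words

    def grant(self, words):
        """Called when the receiver frees buffer space and returns credits."""
        self.credits += words

link = CreditCounter(64)
link.send(40)
blocked = not link.can_send(32)   # 40 + 32 exceeds the 64 granted words
link.grant(16)
allowed = link.can_send(32)       # 40 + 32 fits within 80 granted words
```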


Scheduler

Scheduling Issues

Determining Eligibility

One eligibility mask for each kind of transfer

Eager approach

Lazy approach

Scheduling Policy

Round-Robin

Weighted Round-Robin

Deficit Round-Robin

Strict Priority

Starvation?
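An eligibility mask combined with a round-robin policy can be sketched as follows. The eligibility condition used here (non-empty VOQ with flow-control permission) and the function names `eligibility_mask` and `round_robin` are assumptions; the slides do not define how the eager and lazy approaches compute the mask, and those are not modeled.

```python
def eligibility_mask(occupancy, has_credit):
    """Build a bit mask with bit i set when VOQ i may be served."""
    mask = 0
    for voq, (occ, ok) in enumerate(zip(occupancy, has_credit)):
        if occ > 0 and ok:          # assumed condition: has data and has credits
            mask |= 1 << voq
    return mask

def round_robin(mask, num_voqs, last):
    """Pick the next eligible VOQ after `last`, wrapping around."""
    for offset in range(1, num_voqs + 1):
        voq = (last + offset) % num_voqs
        if mask & (1 << voq):
            return voq
    return None                     # nothing eligible

occupancy = [0, 3, 0, 1, 2, 0, 0, 0]
has_credit = [True, True, True, False, True, True, True, True]
mask = eligibility_mask(occupancy, has_credit)
first = round_robin(mask, 8, last=1)
second = round_robin(mask, 8, last=first)
```

Plain round-robin never skips an eligible VOQ for more than one full rotation, which is one way to address the starvation question raised for strict priority.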


Packet Processor

Processes Network Traffic

Receives variable-size segments

Creates autonomous network packets

Performs 3 Basic Operations

Insert header

Modify header

Delete header

Implemented as 3-stage pipeline

Greatly depends on packet nature

RDMA packets well suited


Packet Processor

Example of packet processing

[Figure: traffic passing through the Packet Processor. Segments 1 to 5 are turned into Packets 1 to 5 using Insert, Modify, and Delete header operations. Legend: Packet Header, Packet Body, End of Packet]

NI Queue Manager

Implementation

3 major versions

Full

“No external memory”

“No VSMP segmentation”

Variations of individual modules

Packet Processor with/without segmentation

On-Chip VOQs with BRAM/Xilinx FIFOs

Linked List Manager with/without external memory support

NI Queue Manager

FPGA Hardware Cost Results (8 VOQs)

Module                            LUTs        Slices       Flip Flops   BRAMs      Gate Count
Packet Sorter with segmentation   179 (1%)    135 (1%)     80 (1%)      0 (0%)     1909
Packet Sorter no segmentation     42 (1%)     25 (1%)      12 (1%)      0 (0%)     392
On-Chip VOQs BRAM                 320 (1%)    236 (1%)     170 (1%)     31 (16%)   2035342
On-Chip VOQs Xilinx FIFOs         904 (2%)    1408 (7%)    1648 (4%)    32 (16%)   2119015
Scheduler                         2240 (5%)   1256 (6%)    428 (1%)     1 (1%)     86226
Linked List Mgr with ext mem      680 (1%)    365 (1%)     425 (1%)     0 (0%)     7069
Linked List Mgr no ext mem        33 (1%)     23 (1%)      18 (1%)      0 (0%)     426
Packet Processor                  521 (1%)    511 (2%)     617 (1%)     0 (0%)     9983

Queue Mgr Version                 LUTs        Slices       Flip Flops   BRAMs      Gate Count
Full                              2962 (7%)   2106 (10%)   1571 (3%)    34 (17%)   2263151
No External Memory                2713 (6%)   1900 (9%)    1467 (3%)    34 (17%)   2260321
No VSMPS                          3430 (8%)   2639 (13%)   2286 (5%)    37 (19%)   2471047

NI Queue Manager

Network Performance Results*

[Figure: Throughput measurements using unbalanced traffic patterns.]

[Figure: Average Delay vs. Input Load under uniform traffic. Max load is 96%.]

*The experiments were conducted by Vassilis Papaefstathiou, member of the CARV team

Verified previous theoretical and simulation results about the behavior and performance of buffered crossbar.


IPC: NI Design Issues

NI Design Goals

High Performance

Low Latency

High Bandwidth

Scalability

Reliability

Protection & Security

Communication Overhead Minimization


NI Design Issues

NI Placement

1. I/O Bus

2. Memory Bus

3. Cache

4. CPU Registers

Increasing Processor Proximity (from 1 to 4): Higher Performance, Proprietary Interfaces, Resource Sharing, Less Buffer Space

Decreasing Processor Proximity (from 4 to 1): Lower Performance, Standard Interfaces, More Buffer Space

NI Design Issues

NI Data Transfer Mechanisms

Programmed IO (PIO)

Uncached stores, loads or I/O instructions

Low bandwidth

Cached stores, loads

data transferred in cache blocks

requires coherence

Occupies Processor to copy data

Using Direct Memory Access (DMA)

Decouples Processor

Requires Virtual to Physical Address Translation

NI Design Issues

NI Virtualization & Protection

Traditional approach

System Call to Access NI

Very Costly (high latency)

User-level NIs (ULNIs)

Bypass the OS

Mapped to Virtual Memory

Uses OS paging mechanism for protection

[Figure: two stacks of User-level Software, Operating System, and NI Hardware]


Proposed NI Design

Design Goals / Desired Features

Targeted to future Chip Multiprocessors

thousands to millions of nodes

Tightly coupled with the processor

Low Complexity/Cost

Compared to processor and local memory

Resources (e.g. memory)

Shared with Processor

Dynamically allocated (only to active connections)

Low Overhead transfer initiation

Powerful Communication Primitives

Bulky Data Transfers

Control & Synchronization Traffic

Proposed NI Design

Target Hardware/System

Xilinx XUP board

CPU: PowerPC

OS: Linux

On-chip BRAM

Scratchpad

Cache?

External DRAM

10 Gbps Network

RocketIO


Proposed NI Design

Target Hardware/System

[Figure: Xilinx XUP FPGA block diagram. Blocks: PowerPC @ 266 MHz, PLB, OCM (?), NI, On-chip Memory (low latency, high throughput, 8 dual-port BRAM banks; shared among CPU and NI), DRAM @ 133 MHz, Network (10 Gbps links). Other clock labels: @133 MHz, @166 MHz]

Network Interface simple and small compared to CPU and its local memory

Proposed NI Design

Communication Primitives

Remote DMA

Bulky Data Transfers

Producer-Consumer

Facilitates Zero-Copy Protocols

Requires Virtual-to-Physical Translation

Message Queues

Low Latency, Minimal Overhead communication

Small Data Transfers

One-to-One, One-to-Many, Many-to-One

Powerful synchronization primitive

Proposed NI Design

Message Queues

Queues

One-to-One: e.g. Small Data Transfers

One-to-Many: e.g. Job Dispatching

Many-to-One: e.g. Synchronization


Proposed NI Design

Connections

Mechanism for

Sending Messages

Initiating RDMA operations

A Connection consists of:

Queue Pair (Incoming & Outgoing)

Information needed for RDMA

Various Queue, Flow Control metadata

Connection Types

One-to-One (lower overhead, higher security)

Many-to-Many (better resource utilization)

Proposed NI Design

Connection Types

One-to-One

Both the Incoming and the Outgoing Queue are One-to-One

Many-to-Many

Incoming Queue is Many-to-One and/or Outgoing Queue is One-to-Many

Proposed NI Design

Connection Types - Example

One-to-One: Local Communication

Many-to-Many: Global Communication

Proposed NI Design

Connection Table (CT)

Connections reside in the Connection Table in the form of Connection Table Entries (CTEs)

[Figure: Node Memory containing the Scratchpad and the Connection Table (CT); each Connection Table Entry (CTE) is indexed by a ConnectionID (CID)]

Proposed NI Design

Connection Table Entry (CTE)

Each CTE stores

Destination Info (for one-to-one)

Protection Info (for many-to-many)

Incoming/Outgoing Queue Info

Incoming/Outgoing RDMA Info

Flow Control Info

Other Info

e.g. Notification Type

Proposed NI Design

Connection Table Entry for Xilinx XUP

Each CTE is indexed by CID and consists of eight 32-bit words, B1 (32) to B8 (32). Numbers in parentheses are field widths in bits.

System-level Outgoing Queue Info: Base (6), Start (8), End (8), Head (10)

Outgoing RDMA Info

System-level Incoming Queue Info: Base (6), Start (8), End (8), Tail (10)

Incoming RDMA Info

User-level Outgoing Queue Info: Outgoing Tail (10)

User-level Incoming Queue Info: Incoming Head (10)

System-level Connection Info: Remote Node ID (24), Remote CID (16), PGID (16)
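The system-level queue info fields Base (6), Start (8), End (8), and Head (10) add up to exactly 32 bits, so they fit one CTE word. The sketch below packs and unpacks such a word. The field order inside the word (Base in the most significant bits) and the names `FIELDS`, `pack`, and `unpack` are assumptions; the transcript gives only the widths.

```python
# Field widths from the slide; the bit order within the word is an assumption.
FIELDS = (("base", 6), ("start", 8), ("end", 8), ("head", 10))

def pack(values):
    """Pack the four fields into one 32-bit word, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack(word):
    """Split a 32-bit word back into its four fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values

entry = {"base": 5, "start": 16, "end": 200, "head": 1023}
word = pack(entry)
roundtrip = unpack(word)
```

The incoming queue word has the same widths with Tail in place of Head, so the same helpers would apply with one field renamed.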

Proposed NI Design

Software Interface

Send Message or Initiate RDMA

Read Head/Tail of Outgoing Queue

Post Descriptor

Write Tail of Outgoing Queue

(Poll Head of Outgoing Queue)

Receive Message or RDMA Notification

Poll Tail of Incoming Queue (or get Interrupt)

Read Message (or Descriptor)

Write Head of Incoming Queue
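The send and receive sequences above can be sketched with a ring queue whose Head and Tail pointers are read and written by software. This is a software-only model: the `Queue` class, the one-empty-slot full test, and the use of a single queue as both outgoing and incoming side (a stand-in for the NI and the network) are assumptions for illustration.

```python
class Queue:
    """Ring queue with a Head pointer (consumer) and a Tail pointer (producer)."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0
        self.tail = 0

    def is_full(self):
        # One slot is kept empty so that full and empty can be told apart.
        return (self.tail + 1) % len(self.slots) == self.head

    def is_empty(self):
        return self.head == self.tail

def send(outgoing, descriptor):
    """Post a message or RDMA descriptor to the Outgoing Queue."""
    if outgoing.is_full():                       # read Head/Tail of Outgoing Queue
        return False
    outgoing.slots[outgoing.tail] = descriptor   # post descriptor
    outgoing.tail = (outgoing.tail + 1) % len(outgoing.slots)  # write Tail
    return True

def receive(incoming):
    """Fetch the next message or notification from the Incoming Queue."""
    if incoming.is_empty():                      # poll Tail of Incoming Queue
        return None
    message = incoming.slots[incoming.head]      # read message (or descriptor)
    incoming.head = (incoming.head + 1) % len(incoming.slots)  # write Head
    return message

loopback = Queue(4)
results = [send(loopback, m) for m in ("m1", "m2", "m3", "m4")]
received = [receive(loopback) for _ in range(4)]
```

In this model no system call is involved in either path: each operation is a few loads and stores to the queue and its pointers.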


Proposed NI Design

Software Interface - Descriptors


Proposed NI Design

Protection – Intranode Protection

[Figure: User-Level processes (Process 13, Process 27, Process 42) above the Kernel and the NI Hardware]

Isolation of malicious processes. Not allowed to:

Read or write data of connections of other processes

Corrupt connections of other processes

Proposed NI Design

Protection – Internode Protection

Isolation of compromised node. Not allowed to:

Compromise other nodes

Corrupt connections of other processes

[Figure: interconnected Nodes A, B, C, D, and E]

Proposed NI Design

Protection in the Proposed NI

Creation of 3 Protection Zones

[Figure: Connection Table entries (CIDs) spread across Bank 1 to Bank 8, grouped into three protection zones]

High protection: e.g. banks written only by special NI hardware or run-time system

Moderate protection: e.g. banks written only by system-level software (e.g. kernel driver)

Low protection: e.g. banks written by everyone including user-level software

Proposed NI Design

Controlling Access to the CT

Distinguish User-level from Kernel-level

Use of Shadow Address Spaces

Double the required Address Space

[Figure: Virtual Address Space (regions mapped to a system-level process and to a user-level process; example addresses 0xDEADBEEF, 0x123456EF) and Physical Address Space (Normal Physical Address Space and Shadow Physical Address Space; example addresses 0xFFFF0ABC, 0xFFFF1ABC) pointing to a CTE in the Connection Table]
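One way to read the shadow address space idea is that the same Connection Table location is reachable through two physical address windows, and the hardware tells them apart by one address bit. The sketch below assumes that bit is bit 12, inferred only from the two example addresses, which differ by 0x1000; the name `decode` is hypothetical, and which window is mapped to user-level or system-level software is not recoverable from the transcript, so the sketch does not decide it.

```python
SHADOW_BIT = 1 << 12   # assumption: inferred from 0xFFFF0ABC vs 0xFFFF1ABC

def decode(phys_addr):
    """Return (offset inside the window, True if the shadow window was used)."""
    offset = phys_addr & (SHADOW_BIT - 1)
    via_shadow = bool(phys_addr & SHADOW_BIT)
    return offset, via_shadow

normal = decode(0xFFFF0ABC)
shadow = decode(0xFFFF1ABC)
same_location = normal[0] == shadow[0]
```

Doubling the address space in this way lets the page tables, rather than extra hardware state, decide which privilege level an access carries.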

Proposed NI Design

Controlling Access to the CT

Distinguish different User-level processes

Fine-grain protection

[Figure: virtual pages of Processes 1 to 4 in the Virtual Address Space map to per-process physical pages of the Connection Table in the Physical Address Space; each page holds 32 CTEs]


Conclusions & Future Work

Conclusions

NI Queue Manager

Feasibility and Effectiveness of VSMPS

Confirmed buffered crossbar results

Novel Packet Processing Mechanism

Eliminates need for reassembly

Proposed NI Design

Lightweight, well suited to future CMPs

Powerful communication primitives

Message Queues, Remote DMA

Notion of Connections/Queues

Versatile Protection & Security Mechanisms

Conclusions & Future Work

Future Work

NI Queue Manager

Scalability

Dynamic memory allocation for On-Chip VOQs

Proposed NI Design

Process migration

Flow control

Efficient cache coherence support

Thank You!
