

Page 1: AND THEN THERE WERE NONE - IBM · AND THEN THERE WERE NONE A Stall-Free Real-Time Garbage Collector for Reconfigurable Hardware David F. Bacon Perry Cheng Sunil Shukla IBM Research

AND THEN THERE WERE NONE

A Stall-Free Real-Time Garbage Collector for Reconfigurable Hardware

David F. Bacon Perry Cheng Sunil Shukla

IBM Research

Pages 2-4:

IMPLEMENTING A PROGRAMMING LANGUAGE

[Diagram, built up over three slides. Labels on the first build: Program, Source Code, Interpreter, Instruction Set Processor, Circuit. The second build adds: Compiler, Machine Code, Instruction Set Interpreter, Circuit. The third build adds: Hardware Compiler, Circuit Layout, Circuit.]

Pages 5-7:

PROGRAMMING RECONFIGURABLE HARDWARE (FPGAS)

• Programmed at very low level of abstraction

• same as designing custom circuits (ASICs)

• Verilog, VHDL prevail: bits and bit arrays are main abstraction

[Overlay labels added on the second and third builds: HIGH LEVEL LANGUAGE, GARBAGE COLLECTION]

Page 8:

SYSTEM = APPLICATION + COLLECTOR

[Diagram labels: HAND-WRITTEN HDL; COLLECTOR & MEMORY]

Page 9:

RECONFIGURABLE HARDWARE BACKGROUND

Page 10:

CONFIGURABLE LOGIC

UP TO 300K SLICES = 2.4M FLIP-FLOPS

Page 11:

PROGRAMMABLE ROUTING NETWORK

SOURCE: WIKIMEDIA (CC) 2007

Pages 12-17:

BLOCK-RAM MEMORIES (BRAMS)

[Figure, built up over six slides: dual-ported block RAMs. Each port (A and B) has R/W, Address, Data In, and Data Out signals. Annotations: 36 KBIT; configurable as 36K X 1, 18K X 2, …, 1K X 36; usable as RAM or FIFO. Later builds repeat the port labels for additional BRAMs.]

Page 18:

WHAT WE BUILT

Pages 19-20:

COLLECTOR IN HARDWARE FOR HARDWARE

• Complete garbage collector

• NOT hardware-assist instructions (e.g., Azul, Lisp Machine)

• For on-chip FPGA memory

• NOT for large, general-purpose CPU DRAM

• With fixed object geometry (2 pointers + data)

• NOT for arbitrarily sized/shaped objects

• Snapshot-at-the-Beginning Algorithm [Yuasa 1990]

Pages 21-23:

[Block diagram, built up over three slides: a Memory Subsystem with Allocator, Sweep Engine, Mark Engine, and Memory blocks; the final build adds a Snapshot Engine with ROOT and GC signals. Interface signals: Pointer to Write, Pointer Value, Addr to Read/Write, Addr Alloc’d, Alloc.]

Pages 24-30:

MALLOCATOR (INCL. 1 MEMORY “COLUMN”)

[Figure, animated over seven slides: a dual-ported (A/B) Pointer Memory and a Free Stack with a Stack Top register. Signals: Alloc, Addr Alloc’d, Address Allocated, Addr to Free, Address to Clear, Addr to Read/Write, Pointer to Write, Pointer Value. Example values shown during the animation: 0, 1, 2, 5.]
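The free-stack allocation scheme suggested by the MALLOCATOR figure can be sketched in software. This is a minimal illustrative model, not the authors' hardware design: the class name `FreeStackAllocator`, the use of address 0 as null, and the single pointer field per object are all assumptions made for the sketch.

```python
class FreeStackAllocator:
    """Toy model: fixed-size objects, free addresses kept on a stack."""

    NULL = 0  # assumption: address 0 is reserved as the null pointer

    def __init__(self, num_objects):
        # One pointer-memory "column": a single pointer field per address.
        self.pointer_memory = [self.NULL] * (num_objects + 1)
        # Free stack initially holds every usable address; 1 is on top.
        self.free_stack = list(range(num_objects, 0, -1))

    def alloc(self):
        """Pop the top free address; return NULL if the heap is exhausted."""
        if not self.free_stack:
            return self.NULL
        addr = self.free_stack.pop()
        self.pointer_memory[addr] = self.NULL  # fresh objects start cleared
        return addr

    def free(self, addr):
        """Clear the object's pointer field and push its address back."""
        self.pointer_memory[addr] = self.NULL
        self.free_stack.append(addr)

    def write(self, addr, pointer):
        self.pointer_memory[addr] = pointer

    def read(self, addr):
        return self.pointer_memory[addr]


heap = FreeStackAllocator(4)
a = heap.alloc()
b = heap.alloc()
heap.write(a, b)   # object a now points to object b
heap.free(a)       # a's address returns to the top of the free stack
```

Because the free list is a stack, allocation and freeing each touch only the stack top, which is one reason this structure maps naturally onto a single memory access per operation.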

Pages 31-33:

WRITING A (POINTER) VALUE

[Figure, animated over three slides: the Pointer Memory and Free Stack datapath, with the same signal labels as on the MALLOCATOR slides. Example values shown during the write: 0, 7, 5.]

Page 34:

THE TRACE ENGINE

3 OPERATIONS

(a) Get a root pointer and mark it

(b) Dequeue a pointer from mark queue and mark it

(c) Perform write barrier and mark overwritten pointer
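The three operations above can be sketched as a small software model of snapshot-at-the-beginning marking with a Yuasa-style write barrier. This is an interpretation under stated assumptions, not the authors' circuit: the names `TraceEngine`, `add_root`, `trace_step`, and `write_pointer` are hypothetical, objects have exactly two pointer fields (the fixed geometry named earlier in the talk), and address 0 stands for null.

```python
from collections import deque

NULL = 0  # assumption: address 0 is the null pointer


class TraceEngine:
    def __init__(self, heap_size):
        # Fixed object geometry: two pointer fields per object.
        self.pointers = [[NULL, NULL] for _ in range(heap_size)]
        self.mark_map = [False] * heap_size
        self.mark_queue = deque()
        self.tracing = False

    def _mark(self, ptr):
        """Mark ptr if unmarked, then queue its pointer fields for tracing."""
        if ptr == NULL or self.mark_map[ptr]:
            return
        self.mark_map[ptr] = True
        for child in self.pointers[ptr]:
            if child != NULL:
                self.mark_queue.append(child)

    def add_root(self, ptr):
        """(a) Get a root pointer and mark it."""
        self._mark(ptr)

    def trace_step(self):
        """(b) Dequeue one pointer and mark it; False when the queue is empty."""
        if not self.mark_queue:
            return False
        self._mark(self.mark_queue.popleft())
        return True

    def write_pointer(self, obj, field, new_ptr):
        """(c) Write barrier: mark the overwritten pointer while tracing."""
        old = self.pointers[obj][field]
        if self.tracing:
            self._mark(old)
        self.pointers[obj][field] = new_ptr


# Build 1 -> 2 -> 3, start a collection, then cut the 2 -> 3 edge mid-trace.
engine = TraceEngine(8)
engine.write_pointer(1, 0, 2)
engine.write_pointer(2, 0, 3)
engine.tracing = True
engine.add_root(1)
engine.write_pointer(2, 0, NULL)   # barrier marks object 3 before it is lost
while engine.trace_step():
    pass
```

In this run object 3 stays marked even though the mutator unlinked it before the tracer reached object 2, which is the snapshot property the barrier exists to preserve.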

Pages 35-48:

[Figure, animated over fourteen slides, stepping through the three trace-engine operations: (a) on pages 35-40, (b) on pages 41-44, (c) on pages 45-48. Components: dual-ported (A/B) Pointer Memory, Mark Map, Mark Queue, Barrier Reg, and two MUXes. Signals: Root to Add, Pointer to Trace, Pointer to Write, Pointer Value, Addr to Read/Write, Addr to Clear. Example pointer values shown: 3, 5, 7.]

Page 49:

RESULTS

Page 50:

EVALUATE 3 SYSTEMS

(a) Malloc: Allocator, Memory

(b) Stop-the-World GC: Allocator, Sweep Engine, Mark Engine, Memory

(c) Real-Time GC: Allocator, Sweep Engine, Mark Engine, Memory, Snapshot Engine

Pages 51-53:

EVALUATE SYSTEMS IN 3 CONTEXTS

(a) Collector in isolation (no application)

(b) Collector with Binary Tree benchmark

(c) Collector with Double-ended Queue benchmark

[Diagram labels: COLLECTOR & MEMORY; BINARY TREE (HAND-WRITTEN HDL); DEQUEUE (HAND-WRITTEN HDL)]

Page 54:

LOGIC (SLICE) USAGE - NO APPLICATION

Xilinx Virtex-5 LX330T: 51,840 Slices

• Tiny fraction of chip

• STW almost as complex as RTGC

Page 55:

SYNTHESIZED CLOCK FREQUENCY - NO APPLICATION

• Frequency goes down with design complexity

• Malloc is faster, but advantage narrows

Pages 56-57:

EXECUTION TIME - DEQUEUE

• RTGC uniformly faster than STW

• Malloc is faster, but not by that much (almost even for Binary Tree)

Page 58:

CONCLUSIONS

• First complete garbage collector in hardware

• First garbage collector that NEVER pauses mutator

• Greatly expands expressiveness of hardware programs

• RTGC is faster, smaller, and cooler than STW

• RTGC in hardware is MUCH SIMPLER than in software

• Is something wrong with our processor designs?

Pages 59-60:

Questions?

Suggestions:

• You only have 2 microbenchmarks. Isn’t that bogus?

• Isn’t a fixed object layout totally bogus?

• Can determinism be preserved with a more complex heap?

• Could this technique be applied to general-purpose systems?

• I don’t believe you never stall. Do you have a proof?

• Don’t you lose performance by reserving one of the ports?

• What unique hardware features made stall-freedom possible?

Page 61:

BACKUP

Pages 62-65:

ROLE OF THE GARBAGE COLLECTOR

[Diagram, built up over four slides: COLLECTOR & MEMORY; then APPLICATION added; then the label HAND-WRITTEN HDL added; then the label LIME TASK added.]

Pages 66-67:

PIPELINES IN THE LIME LANGUAGE

worker1(…) { … } worker2(…) { … } worker3(…) { … }

var pipeline = task worker1 => task worker2 => task worker3;

[Diagram annotations: port-to-stream connection (twice); compound filter; stream types char, int, char, int[[5]]]

Pages 68-69:

GARBAGE COLLECTING LIME TASKS

worker1(…) { … } worker2(…) { … } worker3(…) { … }

var pipeline = task worker1 => task worker2 => task worker3;

[Diagram annotations: stream types char, int, char, int[[5]]]

Page 70:

REGISTER MODULE

[Figure: Mutator Register (example value 101) with W_EN, DATA_IN, and DATA_OUT ports.]

Pages 71-73:

REGISTER MODULE + SNAPSHOT COMPONENT

[Figure, animated over three slides: the Mutator Register (W_EN, DATA_IN, DATA_OUT) paired with a Shadow Register (W_EN, DATA_IN, DATA_OUT), with GC and ROOT_OUT signals. The Mutator Register shows 101 and the Shadow Register shows 000; later builds show the value 101 twice more.]
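One way to read the register-plus-shadow figure is that the GC signal copies the mutator register into a shadow register on a clock edge, so root scanning works from the copy while the mutator keeps writing. The sketch below models that reading only; the exact wiring is not recoverable from the transcript, and the class `SnapshotRegister`, its `clock` method, and the `gc_snapshot` argument are hypothetical names.

```python
class SnapshotRegister:
    """Toy model of a mutator register with a GC shadow copy."""

    def __init__(self, value=0):
        self.mutator = value  # register the application reads and writes
        self.shadow = 0       # copy the collector reads as a root

    def clock(self, w_en=False, data_in=0, gc_snapshot=False):
        """Model one clock edge.

        Both registers sample their inputs at the same edge, so the shadow
        captures the value the mutator register held before this edge, even
        if the mutator writes a new value in the same cycle.
        """
        old = self.mutator
        if gc_snapshot:
            self.shadow = old
        if w_en:
            self.mutator = data_in
        return old

    @property
    def root_out(self):
        return self.shadow


reg = SnapshotRegister(0b101)
reg.clock(w_en=True, data_in=0b111, gc_snapshot=True)
# reg.root_out is still 0b101 while reg.mutator has moved on to 0b111
```

The point of the sketch is that taking the snapshot costs the mutator nothing: its write proceeds in the same cycle as the copy.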

Page 74:

[Figure: mutator stack with snapshot support. Labels: dual-ported (A/B) Mutator Stack, Stack Top, Scan Pointer, Shadow Register, Mutator Register, MUX. Signals: Push/Pop, GC, Push Value, Pop Value, Root to Add, Write Reg, Read Reg.]

Page 75:

[Figure: trace engine datapath, as on pages 35-48. Components: dual-ported (A/B) Pointer Memory, Mark Map, Mark Queue, Barrier Reg, two MUXes. Signals: Root to Add, Pointer to Trace, Pointer to Write, Pointer Value, Addr to Read/Write, Addr to Clear.]

Page 76:

[Figure: sweep engine and allocator datapath. Components: dual-ported (A/B) Free Stack with Stack Top, Sweep Pointer, Mark Map, Used Map, a "=10?" comparator, MUX. Signals: Alloc, GC, Address Allocated, Addr Alloc’d, Address to Free, Addr to Clear.]
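The sweep figure shows a Used Map, a Mark Map, and a "=10?" comparator. A plausible reading, assumed here rather than confirmed by the transcript, is that an address is freed when its (used, marked) bits equal (1, 0). The function below sketches that reading; `sweep` and its arguments are hypothetical names, address 0 is treated as null, and clearing the mark bits during the sweep is a further assumption.

```python
def sweep(used_map, mark_map, free_stack):
    """Free every address that is in use but was not marked.

    used_map and mark_map are lists of booleans indexed by address;
    free_stack is the allocator's stack of free addresses.
    Returns the list of addresses freed by this pass.
    """
    freed = []
    for addr in range(1, len(used_map)):  # the sweep pointer; 0 is null
        if used_map[addr] and not mark_map[addr]:  # the "=10?" test
            used_map[addr] = False
            free_stack.append(addr)
            freed.append(addr)
        mark_map[addr] = False  # assumption: marks reset for the next cycle
    return freed


used = [False, True, True, True, False]
marked = [False, True, False, True, False]
free_stack = [4]
freed = sweep(used, marked, free_stack)  # frees only address 2
```

Addresses that are unused are skipped rather than pushed again, so the free stack never receives duplicates.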

Page 77:

[Figure: detailed mutator stack with snapshot support. Components: dual-ported Mutator Stack memory (port A: DATA_IN_A, W_EN_A, ADDR_IN_A, DATA_OUT_A; port B: ADDR_IN_B, DATA_OUT_B, W_EN_B shown with the constant 0), Top of Stack and Scan Index registers (each shown with value 101, W_EN, and DATA_IN), +/-1 and -1 adjusters, two MUXes, State Machine. Signals: GC, PUSH, DATA_IN (PUSH), W_EN, DATA_OUT (POP), ROOT_OUT.]

Page 78:

ENABLERS FOR STALL-FREEDOM

• Dual-ported Memory

• Read-before-Write Memory and Registers

• Simple, ubiquitous synchronization (clock edge)

• Forward reasoning about remote states (clock cycles)

• Determinism
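The first two enablers can be illustrated with a cycle-level software model. In a read-before-write memory, a port that writes an address in a given cycle also sees the old contents of that address on its output, which is the value a snapshot write barrier needs. The sketch below is illustrative only: `DualPortBRAM` and its `cycle` method are hypothetical names, and write conflicts between the two ports are not modeled faithfully (port B simply wins).

```python
class DualPortBRAM:
    """Toy cycle-level model of a dual-ported, read-before-write memory."""

    def __init__(self, size):
        self.cells = [0] * size

    def cycle(self, port_a, port_b):
        """Advance one clock cycle.

        Each port is a tuple (addr, write_enable, data_in).
        Returns (data_out_a, data_out_b): the contents of each addressed
        cell as they were before this cycle's writes took effect.
        """
        outs = tuple(self.cells[addr] for addr, _, _ in (port_a, port_b))
        for addr, w_en, data_in in (port_a, port_b):
            if w_en:
                self.cells[addr] = data_in
        return outs


mem = DualPortBRAM(8)
mem.cycle((3, True, 5), (0, False, 0))           # cell 3 now holds 5
old_a, seen_b = mem.cycle((3, True, 7), (3, False, 0))
# old_a == 5: the write of 7 returned the overwritten pointer in the same cycle
```

With two ports, one access per cycle can be left to the mutator while the other serves the collector, and the overwritten value arrives without an extra read cycle.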

Pages 79-80:

EXECUTION TIME IN CYCLES - DEQUEUE

• STW burns cycles while stopping the world

• Malloc pays (a little) for explicit free operation

• Malloc can run in a smaller heap (but not as bad as software)

Pages 81-85:

FIELD PROGRAMMABLE GATE ARRAYS

[Figure, built up over five slides; the only recoverable label is IOB.]

Pages 86-87:

THE LIQUID METAL PROGRAMMING LANGUAGE

[Diagram, built up over two slides: Lime source feeds the Lime Compiler. Backends and their outputs: CPU Backend (bytecode), GPU Backend (binary), Node Backend (binary), Verilog Backend (bitfile). Targets shown: CPU, GPU, PowerEN, FPGA.]

Pages 88-91:

EXECUTION, COMMUNICATION, AND REPLACEMENT

[Figure, built up over four slides; the only recoverable label is LVM.]

Page 92:

STATEFUL TASKS

double total; long count;

double avg(double x) { total += x; return total/++count; }

var averager = task Averager().avg;

[Diagram annotations: instance variables (local state); primitive filter; input type double, output type double]
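For readers unfamiliar with Lime, the stateful filter above can be approximated in Python. This is only an analogy: `run_task` is a hypothetical helper standing in for Lime's `task` operator and stream connection, and it does not reproduce Lime's actual semantics or its compilation to hardware.

```python
class Averager:
    """Running average with local state, mirroring the slide's fields."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def avg(self, x):
        self.total += x
        self.count += 1
        return self.total / self.count


def run_task(method, stream):
    """Apply a stateful method to each item of a stream, in order."""
    for item in stream:
        yield method(item)


averager = run_task(Averager().avg, [2.0, 4.0, 6.0])
results = list(averager)  # one running average per input item
```

The state lives in the `Averager` instance rather than in the stream, which matches the slide's description of instance variables as the task's local state.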

Page 93:

VIRTUALIZATION OF DATA MOVEMENT

=>

Pages 94-95:

INTERPRETATION VERSUS COMPILATION

[Diagram labels: PROGRAM; getField, invokeVirtual; MOV, BLR; INTERPRETER; INSTRUCTION SET PROCESSOR]

Page 96:

GARBAGE COLLECTION

• Frees programmer from managing memory

• Simpler interfaces, easier debugging, memory safety

• Invented 1960 for IBM 704 with 18K

• Current large FPGAs have memory commensurate with a VAX 11/780

• Recent results:

• We built a garbage collector for data in on-chip BRAMs

• Able to handle a memory op each cycle without ever stalling

• Cost in slices and energy is ~0; cost in frequency and BRAM is small

• Algorithmically simpler than SW GC, yet achieves vastly better results

• Potential game-changer in scope of synthesizable code