Eliminating the Hardware/Software Divide Satnam Singh, Microsoft Research Cambridge, UK


Post on 30-Dec-2015

Page 1: Eliminating the Hardware/Software Divide Satnam Singh, Microsoft Research Cambridge, UK

Eliminating the Hardware/Software Divide

Satnam Singh, Microsoft Research Cambridge, UK

Page 2

!IRQ, NMI

Page 14

locks, monitors, condition variables, spin locks, priority inversion

Page 20

multiple independent multi-ported memories

fine-grain parallelism and pipelining

hard and soft embedded processors

Page 21

LUTs are just higher order functions

[Diagram: lut1 (input i, output o), lut2 (inputs i0, i1, output o), lut3 (inputs i0, i1, i2, output o), lut4 (inputs i0, i1, i2, i3, output o)]

inv = lut1 not

and2 = lut2 (&&)

mux = lut3 (λ s d0 d1 . if s then d1 else d0)
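The three definitions on this slide can be tried out in plain Haskell. The following is a minimal simulation-only sketch, assuming a LUT is modelled as nothing more than the Boolean function it was configured with; the tuple-based input types and the `truthTable2` helper are illustrative assumptions, not the types of the hardware description library used in the talk.

```haskell
module Main where

-- Simulation-only model: a k-input LUT simply applies the function
-- it was configured with to its inputs.
lut1 :: (Bool -> Bool) -> Bool -> Bool
lut1 f = f

lut2 :: (Bool -> Bool -> Bool) -> (Bool, Bool) -> Bool
lut2 f (i0, i1) = f i0 i1

lut3 :: (Bool -> Bool -> Bool -> Bool) -> (Bool, Bool, Bool) -> Bool
lut3 f (i0, i1, i2) = f i0 i1 i2

-- The slide's three circuits.
inv :: Bool -> Bool
inv = lut1 not

and2 :: (Bool, Bool) -> Bool
and2 = lut2 (&&)

mux :: (Bool, Bool, Bool) -> Bool
mux = lut3 (\s d0 d1 -> if s then d1 else d0)

-- Hypothetical helper: the four configuration bits that a 2-input
-- function corresponds to, listed with i0 varying fastest.
truthTable2 :: (Bool -> Bool -> Bool) -> [Bool]
truthTable2 f = [f i0 i1 | i1 <- [False, True], i0 <- [False, True]]

main :: IO ()
main = do
  print (inv True)
  print (and2 (True, True))
  print (mux (True, False, True))
  print (truthTable2 (&&))
```

Passing an ordinary function such as `not` or `(&&)` to `lut1` or `lut2` is what makes the LUT a higher order function: the argument plays the role of the truth table.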

Page 22

XC6VLX760: 758,784 logic cells, 864 DSP blocks, 1,440 dual-ported 18Kb RAMs

32-bit integer adder (32/474,240) > 700MHz

332x1440

14,820 sim-adds: 1,037,400,000,000 additions/second
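The adder count on this slide can be reproduced by hand. The working below is a sketch that assumes 474,240 is the number of LUTs available, that one 32-bit adder occupies 32 of them, and that every adder is clocked at exactly 700 MHz.

```latex
\[
\frac{474{,}240}{32} = 14{,}820 \ \text{adders},
\qquad
14{,}820 \times 700 \times 10^{6}\ \text{s}^{-1}
  = 1.0374 \times 10^{13}\ \text{additions/second}.
\]
```

Under those assumptions the product is ten times the printed total of 1,037,400,000,000 (about 1.04 × 10^12), which shares the same leading digits. The transcript gives no way to tell which figure was intended, so one of the inputs or the total may have lost a digit.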

Page 24

XD2000i: FPGA in-socket accelerator for Intel FSB

XD2000F: FPGA in-socket accelerator for AMD socket F

XD1000: FPGA co-processor module for socket 940

Page 26

Case Study – Spam Filtering (Alessandro Forin, MSR Redmond)

• Benchmark
  – ~50,000 regular expressions from the Forefront Team (snapshot from their Exchange server in Aug '09)

• Performance
  – Up to 6000x faster than standard Intel processors
  – Capable of processing at line rate of gigabit Ethernet

• Power Requirement
  – 7–10 watts rather than 200++ watts

Page 27

              Software Version    FPGA Version
Throughput    ~1 Message/Sec      ~6000 Messages/Sec
Power         200++ Watts         <10 Watts

Both versions: “E-mail Server” with Reg Ex Processing
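The throughput and power figures combine into a rough energy cost per message. The working below is a back-of-envelope sketch that takes 200 W at 1 message/second for the software version and 10 W at 6000 messages/second for the FPGA version; since the slides give these only as bounds ("200++", "<10"), the result is an order-of-magnitude estimate, not a measured value.

```latex
\[
E_{\text{software}} \approx \frac{200\ \text{W}}{1\ \text{msg/s}} = 200\ \text{J/msg},
\qquad
E_{\text{FPGA}} \approx \frac{10\ \text{W}}{6000\ \text{msg/s}} \approx 1.7 \times 10^{-3}\ \text{J/msg},
\qquad
\frac{E_{\text{software}}}{E_{\text{FPGA}}} \approx 1.2 \times 10^{5}.
\]
```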

Page 30

René Müller (ETH)

FPGAs + SQL [VLDB]

Page 31

CPU FPGA

Page 34

Speed Grade       -1        -2        -3
With Layout       270MHz    320MHz    362MHz
Without Layout    210MHz    260MHz    280MHz

541 seconds

1896 seconds

Page 35

opportunity

scientific computing, data mining, search, image processing, financial analytics

challenge

Page 36

The Accidental Semi-colon

;

Page 37

public static int[] SequentialFIRFunction(int[] weights, int[] input)
{
    int[] window = new int[size];
    int[] result = new int[input.Length];
    // Clear the window of x values to all zero.
    for (int w = 0; w < size; w++)
        window[w] = 0;
    // For each sample...
    for (int i = 0; i < input.Length; i++)
    {
        // Shift in the new x value
        for (int j = size - 1; j > 0; j--)
            window[j] = window[j - 1];
        window[0] = input[i];
        // Compute the result value
        int sum = 0;
        for (int z = 0; z < size; z++)
            sum += weights[z] * window[z];
        result[i] = sum;
    }
    return result;
}

Page 38

PLDI 1998

Page 39

PLDI 2003

Page 40

PLDI 2010

[Chart: Series1 and Series2 plotted over categories 1 to 5, vertical axis 0 to 80]

Page 41

POPL 1998

Page 42

POPL 2002

Page 43

POPL 2010

Page 45

ray of light

Signal, Liquid Metal, PRET-C, Bluespec, Feldspar, Accelerator, RapidMind / Ct, Streams-C, Esterel, SHIM

Page 46

universal language?

embedded high level software

FPGA

GPU

DSP

machine learning

grand unification theory

polyglots

Gannet

DSLs

Page 47

Our High Level Synthesis Projects

Kiwi: concurrent C# programs for control-oriented applications [David Greaves, Univ. Cambridge]

shape analysis: synthesis of dynamic data structures (C) [MPI and CMU]

Accelerator/FPGA: synthesis of data parallel programs in C++/C#/F# [MSR Redmond]

HLINQ eDSLs [Gavin Bierman]

+ compilation of self-recursive Haskell functions to FPGA circuits!

Page 48

Redmond Accelerator Team

Barry Bond

Kerry Hammil

Lubomir Litchev

<anonymous other person>

Page 49

Effort vs. Reward

low effort

low reward

high effort

high reward

medium effort

medium reward

CUDA, OpenCL, HLSL, DirectCompute, Accelerator

Page 50

Accelerator

Page 51

Application.EXE

Accelerator.DLL

Windows on Intel/AMD processor

DX9

GPU (ATI, Nvidia, …)

Page 52

open System
open Microsoft.ParallelArrays

let main(args) =
    let x = new FloatParallelArray (Array.map float32 [|1; 2; 3; 4; 5 |])
    let y = new FloatParallelArray (Array.map float32 [|6; 7; 8; 9; 10 |])
    let z = x + y
    use dx9Target = new DX9Target()
    let zv = dx9Target.ToArray1D(z)
    printf "%A\n" zv
    0

Page 53

open System
open Microsoft.ParallelArrays

let main(args) =
    let x = new FloatParallelArray (Array.map float32 [|1; 2; 3; 4; 5 |])
    let y = new FloatParallelArray (Array.map float32 [|6; 7; 8; 9; 10 |])
    let z = x + y
    use sse3Target = new X64MulticoreTarget()
    let zv = sse3Target.ToArray1D(z)
    printf "%A\n" zv
    0

Page 54

open System
open Microsoft.ParallelArrays

let main(args) =
    let x = new FloatParallelArray (Array.map float32 [|1; 2; 3; 4; 5 |])
    let y = new FloatParallelArray (Array.map float32 [|6; 7; 8; 9; 10 |])
    let z = x + y
    use fpgaTarget = new FPGAMulticoreTarget()
    fpgaTarget.ToArray1D(z)
    0

Page 57

[Diagram: evaluating x + y on a GPU target]

CPU address space: the F# arrays [1; 2; 3; 4; 5] and [6; 7; 8; 9; 10] are each encapsulated in a FloatParallelArray (a data-parallel array), named x and y.

GPU address space: x and y are placed in GPU memory, and the expression x + y becomes GPU code (100010101101011010).

Result: the F# array [7; 9; 11; 13; 15].
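The flow on this slide, where `x + y` is turned into code for a target and only then evaluated, can be imitated with a small embedded DSL. The following is a toy sketch under stated assumptions: `PArr`, `Target`, `cpuTarget` and `toList1D` are made-up names for illustration and are not the Accelerator API, and the "target" here just interprets the expression tree on the CPU instead of generating GPU code.

```haskell
module Main where

-- Toy stand-in for a data-parallel array expression (hypothetical type).
data PArr
  = Lit [Float]      -- data captured from an ordinary list
  | Add PArr PArr    -- element-wise addition, not yet evaluated
  | Mul PArr PArr    -- element-wise multiplication, not yet evaluated
  deriving Show

-- Overloading (+) and (*) builds an expression tree instead of a result.
instance Num PArr where
  (+) = Add
  (*) = Mul
  fromInteger n = Lit (repeat (fromInteger n))  -- scalar broadcast
  negate a = Mul (Lit (repeat (-1))) a
  abs    = error "abs: not modelled in this sketch"
  signum = error "signum: not modelled in this sketch"

-- A "target" is anything that can turn the tree into concrete values.
newtype Target = Target { toList1D :: PArr -> [Float] }

-- Reference target: walk the tree on the CPU.
cpuTarget :: Target
cpuTarget = Target eval
  where
    eval (Lit xs)  = xs
    eval (Add a b) = zipWith (+) (eval a) (eval b)
    eval (Mul a b) = zipWith (*) (eval a) (eval b)

main :: IO ()
main = do
  let x = Lit [1, 2, 3, 4, 5]
      y = Lit [6, 7, 8, 9, 10]
      z = x + y                   -- builds Add (Lit ...) (Lit ...) only
  print (toList1D cpuTarget z)    -- prints [7.0,9.0,11.0,13.0,15.0]
```

Because `z` is only a description of the computation, swapping `cpuTarget` for another value of type `Target` would change where the work happens without touching the line that defines `z`, which is the property the DX9, x64 and FPGA variants of the F# program rely on.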

Page 58

using System;
using Microsoft.ParallelArrays;

namespace AddArraysPointwise
{
    class AddArraysPointwiseDX9
    {
        static void Main(string[] args)
        {
            var x = new FloatParallelArray (new[] {1.0F, 2, 3, 4, 5});
            var y = new FloatParallelArray (new[] {6.0F, 7, 8, 9, 10});
            var dx9Target = new DX9Target();
            var z = x + y;
            foreach (var i in dx9Target.ToArray1D (z))
                Console.Write(i + " ");
            Console.WriteLine();
        }
    }
}

Page 59

module Main
where

import Accelerator

x = fpa [1.0, 2.0, 3.0, 4.0, 5.0]
y = fpa [6.0, 7.0, 8.0, 9.0, 10.0]

z = x + y

main
  = do dx9Target <- c_DX9Target_Create
       r <- acceleratorCompute dx9Target z
       putStrLn (show r)

Page 65

[Diagram: expression graph in which the input array pa passes through Shift (0,0) and Shift (0,1), the shifted arrays are multiplied (*) by k[0] and k[1], and the products are added (+)]

let rec convolve (shifts : int -> int []) (kernel : float32 []) i (a : FloatParallelArray) =
    let e = kernel.[i] * ParallelArrays.Shift(a, shifts i)
    if i = 0 then
        e
    else
        e + convolve shifts kernel (i-1) a

Page 70

public static int[] SequentialFIRFunction(int[] weights, int[] input)
{
    int[] window = new int[size];
    int[] result = new int[input.Length];
    // Clear the window of x values to all zero.
    for (int w = 0; w < size; w++)
        window[w] = 0;
    // For each sample...
    for (int i = 0; i < input.Length; i++)
    {
        // Shift in the new x value
        for (int j = size - 1; j > 0; j--)
            window[j] = window[j - 1];
        window[0] = input[i];
        // Compute the result value
        int sum = 0;
        for (int z = 0; z < size; z++)
            sum += weights[z] * window[z];
        result[i] = sum;
    }
    return result;
}

Page 73

shift (x,  0) = [7, 2, 5, 9, 3, 8, 6, 4] = x
shift (x, -1) = [7, 7, 2, 5, 9, 3, 8, 6]
shift (x, -2) = [7, 7, 7, 2, 5, 9, 3, 8]

Page 74

y = [y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7]]
  = a[0] * [x[0],  x[1],  x[2],  x[3],  x[4], x[5], x[6], x[7]]
  + a[1] * [x[-1], x[0],  x[1],  x[2],  x[3], x[4], x[5], x[6]]
  + a[2] * [x[-2], x[-1], x[0],  x[1],  x[2], x[3], x[4], x[5]]
  + a[3] * [x[-3], x[-2], x[-1], x[0],  x[1], x[2], x[3], x[4]]
  + a[4] * [x[-4], x[-3], x[-2], x[-1], x[0], x[1], x[2], x[3]]

y = a[0] * shift (x, 0) + a[1] * shift (x, -1) + a[2] * shift (x, -2) + a[3] * shift (x, -3) + a[4] * shift (x, -4)
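The whole-array formulation can be checked on ordinary lists. The following is a minimal sketch, assuming `shift` clamps out-of-range indices to the nearest edge (which is what the example values shift (x, -1) = [7, 7, 2, ...] suggest); `firByShifts` is a hypothetical name, and lists stand in for data-parallel arrays.

```haskell
module Main where

-- shift xs k: element i of the result is xs at index (i + k),
-- with out-of-range indices clamped to the nearest edge.
shift :: [a] -> Int -> [a]
shift xs k = [ xs !! clamp (i + k) | i <- [0 .. n - 1] ]
  where
    n       = length xs
    clamp j = max 0 (min (n - 1) j)

-- y = a[0] * shift (x, 0) + a[1] * shift (x, -1) + ...
-- Each term scales a whole shifted copy of the input; the terms are
-- then summed element-wise.
firByShifts :: [Int] -> [Int] -> [Int]
firByShifts coeffs xs =
  foldr (zipWith (+)) (map (const 0) xs)
        [ map (a *) (shift xs (-i)) | (i, a) <- zip [0 ..] coeffs ]

main :: IO ()
main = do
  let x = [7, 2, 5, 9, 3, 8, 6, 4]
  print (shift x (-1))
  print (shift x (-2))
  print (firByShifts [1, 1] x)
```

With clamping, the first few outputs differ from SequentialFIRFunction, whose window starts out filled with zeros rather than with copies of the first sample.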

Page 75

using Microsoft.ParallelArrays;
using A = Microsoft.ParallelArrays.ParallelArrays;

namespace AcceleratorSamples
{
    public class Convolver
    {
        public static float[] Convolver1D(Target computeTarget, float[] a, float[] x)
        {
            var xpar = new FloatParallelArray(x);
            var n = x.Length;
            var ypar = new FloatParallelArray(0.0f, new [] { n });
            for (int i = 0; i < a.Length; i++)
                ypar += a[i] * A.Shift(xpar, -i);
            float[] result = computeTarget.ToArray1D(ypar);
            return result;
        }
    }
}

Page 76

using System;
using Microsoft.ParallelArrays;

namespace AcceleratorSamples
{
    public class Convolver2D
    {
        static FloatParallelArray convolve(Func<int, int[]> shifts, float[] kernel, int i, FloatParallelArray a)
        {
            FloatParallelArray e = kernel[i] * ParallelArrays.Shift(a, shifts(i));
            if (i == 0)
                return e;
            else
                return e + convolve(shifts, kernel, i - 1, a);
        }

        static FloatParallelArray convolveXY(float[] kernel, FloatParallelArray input)
        {
            FloatParallelArray convolveX = convolve(i => new [] { -i, 0 }, kernel, kernel.Length - 1, input);
            return convolve(i => new [] { 0, -i }, kernel, kernel.Length - 1, convolveX);
        }

        static void Main(string[] args)
        {
            const int inputSize = 10;
            var random = new Random(42);
            var inputData = new float[inputSize, inputSize];
            for (int row = 0; row < inputSize; row++)
                for (int col = 0; col < inputSize; col++)
                    inputData[row, col] = (float)random.NextDouble() * random.Next(1, 100);
            var testKernel = new float[]{2, 5, 7, 4, 3};
            var dx9Target = new DX9Target();
            var inputArray = new FloatParallelArray(inputData);
            var result = dx9Target.ToArray2D(convolveXY (testKernel, inputArray));
            for (var row = 0; row < inputSize; row++)
            {
                for (var col = 0; col < inputSize; col++)
                    Console.Write("{0} ", result[row, col]);
                Console.WriteLine();
            }
        }
    }
}

Page 77

using System;
using System.Linq;
using Microsoft.ParallelArrays;

namespace AcceleratorSamples
{
    static class Convolver2D
    {
        static FloatParallelArray convolve(this FloatParallelArray a, Func<int, int[]> shifts, float[] kernel)
        {
            return kernel
                .Select((k, i) => k * ParallelArrays.Shift(a, shifts(i)))
                .Aggregate((a1, a2) => a1 + a2);
        }

        static FloatParallelArray convolveXY(this FloatParallelArray input, float[] kernel)
        {
            return input
                .convolve(i => new[] { -i, 0 }, kernel)
                .convolve(i => new[] { 0, -i }, kernel);
        }

        static void Main(string[] args)
        {
            const int inputSize = 10;
            var random = new Random(42);
            var inputData = new float[inputSize, inputSize];
            for (int row = 0; row < inputSize; row++)
                for (int col = 0; col < inputSize; col++)
                    inputData[row, col] = (float)random.NextDouble() * random.Next(1, 100);
            var testKernel = new[] { 2F, 5, 7, 4, 3 };
            var dx9Target = new DX9Target();
            var inputArray = new FloatParallelArray(inputData);
            var result = dx9Target.ToArray2D(inputArray.convolveXY(testKernel));
            for (var row = 0; row < inputSize; row++)
            {
                for (int col = 0; col < inputSize; col++)
                    Console.Write("{0} ", result[row, col]);
                Console.WriteLine();
            }
        }
    }
}

Page 78

open System
open Microsoft.ParallelArrays

[<EntryPoint>]
let main(args) =
    // Declare a filter kernel for the convolution
    let testKernel = Array.map float32 [| 2; 5; 7; 4; 3 |]
    // Specify the size of each dimension of the input array
    let inputSize = 10
    // Create a pseudo-random number generator
    let random = Random (42)
    // Declare a pseudo-input data array
    let testData = Array2D.init inputSize inputSize (fun i j -> float32 (random.NextDouble() * float (random.Next(1, 100))))
    // Create an Accelerator float parallel array for the F# input array
    use testArray = new FloatParallelArray(testData)
    // Declare a function to convolve in the X or Y direction
    let rec convolve (shifts : int -> int []) (kernel : float32 []) i (a : FloatParallelArray) =
        let e = kernel.[i] * ParallelArrays.Shift(a, shifts i)
        if i = 0 then
            e
        else
            e + convolve shifts kernel (i-1) a
    // Declare a 2D convolver
    let convolveXY kernel input =
        // First convolve in the X direction and then in the Y direction
        let convolveX = convolve (fun i -> [| -i; 0 |]) kernel (kernel.Length - 1) input
        let convolveY = convolve (fun i -> [| 0; -i |]) kernel (kernel.Length - 1) convolveX
        convolveY
    // Create a DX9 target and use it to convolve the test input
    use dx9Target = new DX9Target()
    let convolveDX9 = dx9Target.ToArray2D (convolveXY testKernel testArray)
    printfn "DX9: -> \r\n%A" convolveDX9
    0

Page 79

[Chart: x64 multicore target benchmark for 2D convolver (24 core server Xeon E7540). Horizontal axis: kernel size (0 to 45). Vertical axis: speedup over one core (0 to 25). Series: 6 core speedup, 12 core speedup, 18 core speedup, 24 core speedup.]

Page 81

Convolver

Page 83

8.249ns max delay
3 x DSP48Es
63 slice registers
24 slice LUTs

Page 89

DRAM

Page 90

Power of Computation vs. Communication

technology node             130nm CMOS (2006)    45nm CMOS (2008)
transfer 32b across-chip    20 computations      57 computations
transfer 32b off-chip       260 computations     1300 computations

numbers derived from work by W. Dally, Stanford

Page 93

Virtex-6 XC6VLX240T-1
317MHz
Area 6%
108 DSP48E1 (out of 768)
2.7W (at 25C)
1,110 mega-samples per second
cf. CUDA version on 470GTX at 552 mega-samples per second (single precision)

Page 94

Kiwi Thesis

thread 1, thread 2, thread 3

Page 101

Kiwi

structural               imperative (C)    parallel imperative
gate-level VHDL/Verilog  C-to-gates        Kiwi

[Diagram: a gate-level circuit (AND gate, SR flip-flop with SET and CLR), sequential statements (;) for jpeg.c, and a parallel program made of thread 1, thread 2 and thread 3]

Page 105

[Diagram: Kiwi design flow]

Kiwi Library (Kiwi.cs) and the circuit model (JPEG.cs) go into Visual Studio for multi-thread simulation, debugging and verification.

Kiwi Synthesis produces the circuit implementation (JPEG.v).

Page 106

[Diagram: a parallel C# program is split into its threads (Thread 1, Thread 2, Thread 3, ...); each thread goes through C to gates to produce a circuit, and the circuits are combined into Verilog for the system]

Page 107

Ports and Clocks

public static class I2C
{
    [OutputBitPort("scl")]
    static bool scl;

    [InputBitPort("sda_in")]
    static bool sda_in;

    [OutputBitPort("sda_out")]
    static bool sda_out;

    [OutputBitPort("rw")]
    static bool rw;

circuit ports identified by custom attribute

Page 108

public static int max2(int a, int b)
{
    int result;
    if (a > b)
        result = a;
    else
        result = b;
    return result;
}

.method public hidebysig static int32 max2(int32 a, int32 b) cil managed
{
  // Code size 12 (0xc)
  .maxstack 2
  .locals init ([0] int32 result)
  IL_0000: ldarg.0
  IL_0001: ldarg.1
  IL_0002: ble.s IL_0008

  IL_0004: ldarg.0
  IL_0005: stloc.0
  IL_0006: br.s IL_000a

  IL_0008: ldarg.1
  IL_0009: stloc.0
  IL_000a: ldloc.0
  IL_000b: ret
}

[Animation: evaluating max2(3, 7), showing the stack and local memory as 3 and 7 are pushed and 7 is stored in the local]
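The way the stack and the local change while max2(3, 7) runs can be followed with a tiny interpreter. The following is a toy sketch covering only the opcodes that appear in this listing; the `Instr` type and `run` function are illustrative assumptions, and branch targets are written as instruction indices instead of byte offsets.

```haskell
module Main where

-- The handful of CIL opcodes used by max2 (a toy subset).
data Instr
  = LdArg Int   -- push argument n
  | LdLoc Int   -- push local n
  | StLoc Int   -- pop into local n
  | Ble Int     -- pop b, pop a; jump to the given index if a <= b
  | Br Int      -- unconditional jump
  | Ret         -- return the value on top of the stack

-- max2, with instruction indices in place of IL_xxxx byte offsets.
max2Code :: [Instr]
max2Code =
  [ LdArg 0, LdArg 1, Ble 6   -- if a <= b, jump to "result = b"
  , LdArg 0, StLoc 0, Br 8    -- result = a
  , LdArg 1, StLoc 0          -- result = b
  , LdLoc 0, Ret              -- return result
  ]

-- Run a program on the given arguments with one zero-initialised local.
run :: [Instr] -> [Int] -> Int
run code args = go 0 [] [0]
  where
    go pc stack locals = case (code !! pc, stack) of
      (LdArg n, _)        -> go (pc + 1) (args !! n : stack) locals
      (LdLoc n, _)        -> go (pc + 1) (locals !! n : stack) locals
      (StLoc n, v : rest) -> go (pc + 1) rest (setAt n v locals)
      (Ble t, b : a : rest)
        | a <= b          -> go t rest locals
        | otherwise       -> go (pc + 1) rest locals
      (Br t, _)           -> go t stack locals
      (Ret, v : _)        -> v
      _                   -> error "stack underflow"
    setAt n v xs = take n xs ++ [v] ++ drop (n + 1) xs

main :: IO ()
main = do
  print (run max2Code [3, 7])
  print (run max2Code [9, 4])
```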

Page 109

Writing to a Channel

public class Channel<T>
{
    T datum;
    bool empty = true;
    public void Write(T v)
    {
        lock (this)
        {
            while (!empty)
                Monitor.Wait(this);
            datum = v;
            empty = false;
            Monitor.PulseAll(this);
        }
    }

Page 110

Reading from a Channel

    public T Read()
    {
        T r;
        lock (this)
        {
            while (empty)
                Monitor.Wait(this);
            empty = true;
            r = datum;
            Monitor.PulseAll(this);
        }
        return r;
    }
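The same one-place blocking behaviour can be sketched in Haskell, the deck's other language, using an `MVar` from the base library: `putMVar` blocks while the slot is full and `takeMVar` blocks while it is empty. This is an analogy to the monitor-based C# class above, not the Kiwi implementation; `Channel`, `newChannel`, `write` and `readChannel` are illustrative names.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM)

-- A one-place channel: at most one value is buffered at a time.
newtype Channel a = Channel (MVar a)

newChannel :: IO (Channel a)
newChannel = Channel <$> newEmptyMVar

-- Blocks while the slot is full, like Write waiting while !empty.
write :: Channel a -> a -> IO ()
write (Channel slot) = putMVar slot

-- Blocks while the slot is empty, like Read waiting while empty.
readChannel :: Channel a -> IO a
readChannel (Channel slot) = takeMVar slot

main :: IO ()
main = do
  ch <- newChannel
  _  <- forkIO (mapM_ (write ch) [1 .. 5 :: Int])  -- producer thread
  xs <- replicateM 5 (readChannel ch)              -- consumer
  print xs
```

With a single producer and a single consumer the values arrive in the order they were written, so the producer can never run more than one value ahead of the consumer.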

Page 112

systems level concurrency constructs: threads, events, monitors, condition variables

rendezvous, join patterns, transactional memory

data parallelism

user applications

domain specific languages

Page 114

Filter Example

thread, one-place channel

Page 115

Transposed Filter

Page 116

static void Tap(int i, byte w, Kiwi.Channel<byte> xIn, Kiwi.Channel<int> yIn, Kiwi.Channel<int> yout)
{
    byte x;
    int y;
    while (true)
    {
        y = yIn.Read();
        x = xIn.Read();
        yout.Write(x * w + y);
    }
}

Page 117

Inter-thread Communication and Synchronization

// Create the channels to link together the taps
for (int c = 0; c < size; c++)
{
    Xchannels[c] = new Kiwi.Channel<byte>();
    Ychannels[c] = new Kiwi.Channel<int>();
    Ychannels[c].Write(0); // Pre-populate y-channel registers with zeros
}

Page 118

// Connect up the taps for a transposed filter
for (int i = 0; i < size; i++)
{
    int j = i; // Quiz: why do we need the local j?
    Thread tapThread = new Thread(delegate() { Tap(j, weights[j], Xchannels[j], Ychannels[j], Ychannels[j+1]); });
    tapThread.Start();
}
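What the chain of taps computes can be checked with a clock-by-clock model that needs no threads. The following is a sketch of a textbook transposed FIR filter, assuming every tap sees the same input sample on each tick and the registers between taps start at zero (as the pre-populated y-channels suggest); `tick`, `transposedFir` and `directFir` are illustrative names and the model is not the Kiwi thread code itself.

```haskell
module Main where

import Data.List (mapAccumL)

-- One clock tick of a transposed FIR filter.
-- regs holds the partial sums sitting between taps (one fewer than taps).
tick :: [Int] -> [Int] -> Int -> ([Int], Int)
tick weights regs x =
  case zipWith (+) (map (* x) weights) (regs ++ [0]) of
    out : regs' -> (regs', out)   -- first sum leaves, the rest are latched
    []          -> ([], 0)        -- no taps: output stays at zero

-- Feed a stream of samples through the filter, one per tick.
transposedFir :: [Int] -> [Int] -> [Int]
transposedFir weights = snd . mapAccumL (tick weights) initial
  where
    initial = replicate (length weights - 1) 0

-- Direct form for comparison: y[n] = sum over k of w[k] * x[n - k],
-- with samples before the start of the input treated as zero.
directFir :: [Int] -> [Int] -> [Int]
directFir weights xs =
  [ sum (zipWith (*) weights (reverse (take (n + 1) xs)))
  | n <- [0 .. length xs - 1] ]

main :: IO ()
main = do
  let ws = [2, 5, 7, 4, 3]
      xs = [1, 0, 0, 0, 0, 0, 3, 1, 4]
  print (transposedFir ws xs)
  print (transposedFir ws xs == directFir ws xs)
```

Feeding a single 1 followed by zeros returns the weights themselves, which is a quick way to confirm the tap ordering.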

Page 123

static public void echo()
{
    tx_sof_n = !false; // We are not at the start of a frame
    tx_src_rdy_n = !false;
    tx_eof_n = !false; // We are not at the end of a frame
    bool start = !rx_sof_n && !rx_src_rdy_n; // The start condition
    int i, j;
    bool doneReading;

    while (true) // Process packets indefinitely
    {
        // Wait for SOF and SRC_RDY
        while (!start)
        {
            Kiwi.Pause(); // Wait for a clock tick
            start = !rx_sof_n && !rx_src_rdy_n; // Check for start of frame
        }
        // Read in the entire frame
        i = 0;
        doneReading = false;

        // Read the remaining bytes
        while (!doneReading)
        {
            if (!rx_src_rdy_n)
            {
                buffer[i] = rx_data;
                i++;
            }
            doneReading = !rx_eof_n;
            Kiwi.Pause();
        }

Page 125

C#

soft processor

Page 127

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n
  = n1 + n2
    where
    n1 = fib (n - 1)
    n2 = fib (n - 2)

Page 128

STATE 1 FREE
  PRECASE
    ds1 := ds
  CASE ds1
    WHEN 1 =>
      RETURN 1
    WHEN 0 =>
      RETURN 0
    WHEN others =>
      v0 := ds1 - 2
      RECURSE [v0] 2 [ds1]
  END CASE
STATE 3 FREE n2
  n1 := resultInt
  v2 := n1 + n2
  RETURN v2
STATE 2 FREE ds1
  n2 := resultInt
  v1 := ds1 - 1
  RECURSE [v1] 3 [n2]
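The three states can be read as fib rewritten with an explicit stack. The following is a sketch of that reading, assuming RECURSE [arg] s [saved] means "push the saved value together with return state s, then restart STATE 1 on arg" and RETURN means "pop a frame and resume there with the result"; the `Frame` type and `fibMachine` are illustrative names, not output of the actual compiler.

```haskell
module Main where

-- A saved stack frame: which state to resume in, plus the value kept live.
data Frame
  = Resume2 Int   -- resume in STATE 2 with saved ds1
  | Resume3 Int   -- resume in STATE 3 with saved n2

fibMachine :: Int -> Int
fibMachine n0 = call n0 []
  where
    -- STATE 1: inspect the argument, return or recurse on (ds1 - 2).
    call ds1 stack = case ds1 of
      1 -> ret 1 stack
      0 -> ret 0 stack
      _ -> call (ds1 - 2) (Resume2 ds1 : stack)

    -- RETURN: pop a frame and continue in the state it names.
    ret result []                    = result
    -- STATE 2: the result is n2; now recurse on (ds1 - 1), saving n2.
    ret result (Resume2 ds1 : stack) = call (ds1 - 1) (Resume3 result : stack)
    -- STATE 3: the result is n1; add the saved n2 and return the sum.
    ret result (Resume3 n2 : stack)  = ret (result + n2) stack

main :: IO ()
main = print (map fibMachine [0 .. 10])
```

Like the original fib, this sketch does not terminate for negative arguments.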

Page 130

gcd_dijkstra :: Int -> Int -> Int
gcd_dijkstra m n
  = if m == n then
      m
    else
      if m > n then
        gcd_dijkstra (m - n) n
      else
        gcd_dijkstra m (n - m)
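Because both recursive calls are in tail position, gcd_dijkstra needs no stack: it can be viewed as two registers updated once per step until they agree. The following is a minimal sketch of that view, assuming both inputs are positive; `gcdStep` and `gcdCycles` are illustrative names, and the step count is only a stand-in for clock cycles.

```haskell
module Main where

-- One step: subtract the smaller register from the larger one.
gcdStep :: (Int, Int) -> (Int, Int)
gcdStep (m, n)
  | m > n     = (m - n, n)
  | otherwise = (m, n - m)

-- Iterate until the registers agree; also count the steps taken.
-- Assumes both inputs are positive, otherwise the loop does not end.
gcdCycles :: Int -> Int -> (Int, Int)
gcdCycles m0 n0 = go 0 (m0, n0)
  where
    go ticks (m, n)
      | m == n    = (m, ticks)
      | otherwise = go (ticks + 1) (gcdStep (m, n))

main :: IO ()
main = print (gcdCycles 12 18)
```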

Page 136

relocation via virtualization???

Page 137

+ encryption + virtualization + data-processing

no standard ABI
no FPGA-kernel-userspace model
The cloud is just an extension of existing OS paradigms… FPGAs get left behind… they lack abstraction boundaries

Page 138

Split Trust

managing physical device vs. using a physical device

management domain?

Page 139

FPGAs Improve Cloud Security

Page 140

Barrelfish Heterogeneous Operating System

Page 146

It’s the End of the World as We Know It