
John Wickerson, Imperial College London

FMATS Workshop, Microsoft Research Cambridge, 24 Sep 2018

Towards Verified Hardware Compilation

FPGA


John Wickerson Towards Verified Hardware Compilation

Collaborators

Nadesh Ramanathan, George Constantinides


Hardware Compilation?

• Also called "high-level synthesis".

• Basic idea: translate C (or OpenCL, or …) to Verilog.

• Custom hardware can be 10x faster and 10x more power-efficient than running software on a processor.

(1) S.O. Settle, "High-performance Dynamic Programming on FPGAs with OpenCL", in High Performance Extreme Computing (HPEC), 2013.

[Figure: bar charts of throughput (axis 0 to 3000) and throughput per watt (axis 0 to 100) for CPU, GPU and FPGA.]


Hardware Compilation?

• Use of hardware compilers has grown ~20x since 2011.(2)

• There are ~19x more software engineers than hardware engineers.(3)

• A user survey found "Lack of C-to-RTL formal verification" to be the biggest problem with hardware compilation.(4)

(2) S. Raje, "Extending the power of FPGAs to software developers", in Field-Programmable Logic and Applications (FPL), 2015. Keynote.

(3) United States Bureau of Labor Statistics, "Occupational Outlook Handbook, 2016–17 Edition", 2015.

(4) Deep Chip, "Survey on HLS verification issues and power reduction", 2014. http://www.deepchip.com/items/0544-03.html


Hardware Compilation of Concurrency

[Diagram: GPU, CPU and FPGA.]


Atomic operations

• Atomics must appear to execute instantaneously to other threads.

• Atomics provide a variety of ordering guarantees.

r = atomic_load(&y, memory_order_acquire);
if (r==1) { print(x); }

x = 1;
atomic_store(&y, 1, memory_order_release);

atomic_store(&x, 1, memory_order_relaxed);

r1 = atomic_load(&x, memory_order_relaxed);
r2 = atomic_load(&x, memory_order_relaxed);
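The release/acquire pair on this slide is the usual message-passing idiom. A minimal sketch of it in standard C11 follows, assuming POSIX threads are available. Two things differ from the slide and are assumptions of this sketch: the slide's shorthand `atomic_store(&y, 1, memory_order_release)` is written with the `_explicit` functions from `<stdatomic.h>`, and the reader spins on the flag instead of testing it once, so that the result is deterministic. The name `run_message_passing` is invented here.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int x = 0;         /* ordinary (non-atomic) data */
static atomic_int y = 0;  /* atomic flag that guards x */

/* Writer: write the data, then set the flag with release ordering. */
static void *writer(void *arg) {
    (void)arg;
    x = 1;
    atomic_store_explicit(&y, 1, memory_order_release);
    return NULL;
}

/* Reader: wait for the flag with acquire ordering, then read the data. */
static void *reader(void *arg) {
    int *seen = arg;
    while (atomic_load_explicit(&y, memory_order_acquire) != 1) {
        /* spin until the writer's release store is visible */
    }
    /* The acquire load read from the release store, so x == 1 here. */
    *seen = x;
    return NULL;
}

/* Runs both threads once and returns the value of x the reader saw. */
int run_message_passing(void) {
    pthread_t w, r;
    int seen = -1;
    x = 0;
    atomic_store(&y, 0);
    pthread_create(&r, NULL, reader, &seen);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

If either access to `y` were weakened to `memory_order_relaxed`, the read of `x` would race with the write and the program would no longer have defined behaviour.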


Weak-memory concurrency is tricky!

• x86 proved tricky to formalise correctly.(5,6)

• Bug found in deployed "IBM Power 5" processors.(7)

• C++ specification did not guarantee its own key property.(8)

• Behaviour of NVIDIA's graphics processors contradicted their own programming guide.(9)

(5) Sarkar et al., POPL, 2009.

(6) Owens et al., TPHOLs, 2009.

(7) Alglave et al., CAV, 2010.

(8) Batty et al., POPL, 2011.

(9) Alglave et al., ASPLOS, 2015.


Compiling atomics

r = atomic_load(&y, memory_order_acquire);

**not supported**

FPGA


Compiling atomics

r = atomic_load(&y, memory_order_acquire);

lock();
r = y;
unlock();

FPGA
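The lock-based translation above can be sketched in plain C11. The slide does not say how `lock()` and `unlock()` are built, so this sketch assumes a single global spinlock made from an `atomic_flag`; `big_lock` and `locked_load` are names invented for illustration.

```c
#include <stdatomic.h>

/* Hypothetical global lock; the slide does not specify its implementation. */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;

static void lock(void) {
    /* test-and-set returns the previous value: spin while the lock is held */
    while (atomic_flag_test_and_set_explicit(&big_lock, memory_order_acquire)) {
    }
}

static void unlock(void) {
    atomic_flag_clear_explicit(&big_lock, memory_order_release);
}

/* Lock-based stand-in for: r = atomic_load(&y, memory_order_acquire); */
int locked_load(const int *y) {
    lock();
    int r = *y;
    unlock();
    return r;
}
```

This only behaves like an atomic load if every other access to `y`, stores included, takes the same lock, so all threads are serialised on it.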


Compiling atomics

FPGA

r = atomic_load(&y, memory_order_acquire);

r = y;


[Diagram: an FPGA containing Thread 1, Thread 2, Thread 3 and three Memory blocks.]


Atomic operations

• Atomics must appear to execute instantaneously to other threads.

• Atomics provide a variety of ordering guarantees.


atomic_store(&x, 1, memory_order_relaxed);

r1 = atomic_load(&x, memory_order_relaxed);
r2 = atomic_load(&x, memory_order_relaxed);

x = 1;

r1 = x;
r2 = x;

[Schedule for "x = 1;": cycle 1, store x.]

[Schedule for "r1 = x; r2 = x;": cycles 1 to 4, one load x per statement.]


atomic_store(&x, 1, memory_order_relaxed);

r1 = atomic_load(&x, memory_order_relaxed);
r2 = atomic_load(&x, memory_order_relaxed);

x = 1;

r1 = x;
r2 = x;

[Schedule for "x = 1;": cycle 1, store x.]

[Schedule for "r1 = x; r2 = x;": cycles 1 to 2, one load x per statement.]


atomic_store(&x, 1, memory_order_relaxed);

r1 = atomic_load(&x, memory_order_relaxed);
r2 = atomic_load(&x, memory_order_relaxed);

x = 1;

r1 = x;
r2 = x/a;

[Schedule for "x = 1;": cycle 1, store x.]


atomic_store(&x, 1, memory_order_relaxed);

r1 = atomic_load(&x, memory_order_relaxed);
r2 = atomic_load(&x, memory_order_relaxed);

x = 1;

r0 = y+y+y+y+y+y;
r1 = x;
r2 = x/a;

[Schedule for "x = 1;": cycle 1, store x.]

[Schedule for "r0 = y+y+y+y+y+y; r1 = x; r2 = x/a;": cycles 1 to 36. "r0 = y+y+y+y+y+y;" performs six load y operations, "r1 = x;" performs load x, and "r2 = x/a;" performs load x followed by divide.]
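Once the atomics are lowered to plain loads, a scheduler is free to issue the load for the long-running `x/a` before the load for `r1`, even though `r1 = x;` comes first in the program. A minimal sketch of a check for that situation follows. The cycle numbers used with it are invented for illustration, and `sched_op` and `count_reordered_same_location` are hypothetical names, not part of any tool mentioned in the talk.

```c
#include <stdbool.h>
#include <stddef.h>

/* One scheduled memory operation of a single thread. */
typedef struct {
    int program_index; /* position in program order (0 = first) */
    int location;      /* id of the variable accessed (hypothetical ids) */
    bool is_atomic;    /* true if the source-level access was atomic */
    int start_cycle;   /* cycle chosen by the scheduler */
} sched_op;

/* Counts pairs of atomic accesses to the same location that are
   scheduled in the opposite order to program order. */
int count_reordered_same_location(const sched_op *ops, size_t n) {
    int violations = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            bool same_loc = ops[i].location == ops[j].location;
            bool both_atomic = ops[i].is_atomic && ops[j].is_atomic;
            bool i_first_in_program = ops[i].program_index < ops[j].program_index;
            bool i_later_in_schedule = ops[i].start_cycle > ops[j].start_cycle;
            if (same_loc && both_atomic && i_first_in_program && i_later_in_schedule) {
                violations++;
            }
        }
    }
    return violations;
}
```

For example, if the load for `r1` is placed in cycle 4 and the load for `r2` in cycle 1, the function reports one reordered pair; with the loads in cycles 1 and 2 it reports none.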


Constraints on scheduling

• Two atomic accesses to the same location cannot be reordered.

• An atomic acquire load cannot be reordered with accesses that come later in program order.

• An atomic release store cannot be reordered with accesses that come earlier in program order.

• An atomic SC access cannot be reordered with any other access.
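The four constraints can be written as a single predicate over a pair of accesses from the same thread. This is a minimal sketch, not the scheduler from the talk: the types `mem_order_t` and `access_t` and the function `must_stay_ordered` are invented here, and ordinary data dependencies between non-atomic accesses are deliberately left out.

```c
#include <stdbool.h>

typedef enum {
    ORD_NONATOMIC, /* plain load or store */
    ORD_RELAXED,
    ORD_ACQUIRE,
    ORD_RELEASE,
    ORD_SEQ_CST
} mem_order_t;

typedef struct {
    bool is_store;     /* false for a load */
    int location;      /* id of the variable accessed (hypothetical ids) */
    mem_order_t order; /* memory order of the access */
} access_t;

/* `earlier` precedes `later` in program order.
   Returns true if a scheduler must keep them in that order. */
bool must_stay_ordered(access_t earlier, access_t later) {
    bool both_atomic = earlier.order != ORD_NONATOMIC &&
                       later.order != ORD_NONATOMIC;

    /* Rule 1: two atomic accesses to the same location. */
    if (both_atomic && earlier.location == later.location) return true;

    /* Rule 2: an acquire load stays before everything after it. */
    if (!earlier.is_store && earlier.order == ORD_ACQUIRE) return true;

    /* Rule 3: a release store stays after everything before it. */
    if (later.is_store && later.order == ORD_RELEASE) return true;

    /* Rule 4: an SC access is never reordered with any other access. */
    if (earlier.order == ORD_SEQ_CST || later.order == ORD_SEQ_CST) return true;

    return false;
}
```

Note the asymmetry: a plain access may still move below a later acquire load, and a plain access may still move above an earlier release store.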


Results

(10) Ramanathan et al., "Hardware Synthesis of Weakly Consistent C Concurrency", FPGA, 2017.


Checking correctness

• Ask Memalloy(11) for an execution that is forbidden according to the C++ standard but is allowed by our scheduling constraints.

• Memalloy uses the Alloy model checker, which in turn uses a SAT-solving backend.

(11) Wickerson et al., "Automatically Comparing Memory Consistency Models", POPL, 2017.


Can we do better?


atomic_store(&x, 1, memory_order_relaxed);

r1 = atomic_load(&x, memory_order_relaxed);
r2 = atomic_load(&x, memory_order_relaxed);

[Schedule for "x = 1;": cycle 1, store x.]

[Schedule for "r1 = x; r2 = x;": cycles 1 to 2, one load x per statement.]


r1 = atomic_load(&x, memory_order_relaxed);
r2 = atomic_load(&x, memory_order_relaxed);

[Schedule for "r1 = x; r2 = x;": cycles 1 to 2, one load x per statement.]


Sync-aware scheduling

s = atomic_load(&yr, memory_order_acquire);

if (s==1) { print(y); }

r = atomic_load(&xr, memory_order_acquire); if (r==1) { print(x); }

x = 1;

atomic_store(&xr, 1, memory_order_release);

y = 1;

atomic_store(&yr, 1, memory_order_release);


Longer paths too

x = 1;

atomic_store(&y, 1, memory_order_release);

r = atomic_load(&y, memory_order_acquire); if (r==1) { atomic_store(&z, 1, memory_order_release);}

s = atomic_load(&z, memory_order_acquire);

if (s==1) { print(x); }
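In this example the write `x = 1;` reaches `print(x)` through a chain of two release/acquire pairs, one on `y` and one on `z`. One way to picture that chain is as reachability in a small graph whose edges are program order and release/acquire synchronisation. The sketch below only illustrates the C11 happens-before chain and is not the analysis used in the talk. It assumes each acquire load reads the value 1 written by the matching release store, and all event names are invented.

```c
#include <stdbool.h>

#define N_EVENTS 6

/* Event ids for the three-thread example (names are illustrative). */
enum { W_X, REL_Y, ACQ_Y, REL_Z, ACQ_Z, R_X };

/* Sets hb[a][b] to true when event a happens before event b, using
   program order plus release/acquire pairs, closed under transitivity. */
void build_happens_before(bool hb[N_EVENTS][N_EVENTS]) {
    for (int a = 0; a < N_EVENTS; a++)
        for (int b = 0; b < N_EVENTS; b++)
            hb[a][b] = false;

    /* Program order within each thread. */
    hb[W_X][REL_Y] = true;   /* x = 1;        then store y (release) */
    hb[ACQ_Y][REL_Z] = true; /* load y (acq)  then store z (release) */
    hb[ACQ_Z][R_X] = true;   /* load z (acq)  then print(x)          */

    /* Synchronisation: each acquire load is assumed to read from the release store. */
    hb[REL_Y][ACQ_Y] = true;
    hb[REL_Z][ACQ_Z] = true;

    /* Warshall's algorithm for the transitive closure. */
    for (int k = 0; k < N_EVENTS; k++)
        for (int a = 0; a < N_EVENTS; a++)
            for (int b = 0; b < N_EVENTS; b++)
                if (hb[a][k] && hb[k][b])
                    hb[a][b] = true;
}
```

After the call, `hb[W_X][R_X]` is true, so the third thread's read of `x` is ordered after the first thread's write even though the two threads never touch a common atomic location.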


Results

(12) Ramanathan et al., "Concurrency-Aware Thread Scheduling for High-Level Synthesis", FCCM, 2018.


Poorly scaling analysis

x = 1;

atomic_store(&y, 1, memory_order_release);

r = atomic_load(&y, memory_order_acquire); if (r==1) { atomic_store(&z, 1, memory_order_release);}

s = atomic_load(&z, memory_order_acquire);

if (s==1) { print(x); }


Poorly scaling analysis

r = atomic_load(&y, memory_order_acquire);
r = atomic_load(&y, memory_order_acquire);
if (r==1) {
  atomic_store(&z, 1, memory_order_release);
  atomic_store(&z, 1, memory_order_release);
}

s = atomic_load(&z, memory_order_acquire);
s = atomic_load(&z, memory_order_acquire);
if (s==1) { print(x); }

x = 1;
atomic_store(&y, 1, memory_order_release);
atomic_store(&y, 1, memory_order_release);
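One possible reading of why duplicating each atomic hurts: if an analysis has to consider every way a release store can pair with an acquire load, the number of candidate chains multiplies. The arithmetic below is a sketch under that assumption only. The slides do not state how the analysis counts paths, and both function names are invented.

```c
/* Number of (release store, acquire load) pairings on one location. */
long pairings(int release_stores, int acquire_loads) {
    return (long)release_stores * acquire_loads;
}

/* Number of two-hop chains: a pairing on y followed by a pairing on z. */
long chains_y_then_z(int rel_y, int acq_y, int rel_z, int acq_z) {
    return pairings(rel_y, acq_y) * pairings(rel_z, acq_z);
}
```

With one store and one load per location, as in the previous example, there is a single chain. With each atomic duplicated, as here, `chains_y_then_z(2, 2, 2, 2)` gives 16.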

Page 67: FPGA Towards Verified Hardware Compilation · Performance Extreme Computing (HPEC), 2013. oughput 0 750 1500 2250 3000 CPU GPU FPGA oughput att 0 25 50 75 100 ... John Wickerson Towards

John Wickerson Towards Verified Hardware Compilation

Poorly scaling analysis 34

r = atomic_load(&y, memory_order_acquire); r = atomic_load(&y, memory_order_acquire); if (r==1) { atomic_store(&z, 1, memory_order_release); atomic_store(&z, 1, memory_order_release); }

s = atomic_load(&z, memory_order_acquire);s = atomic_load(&z, memory_order_acquire);

if (s==1) { print(x); }

x = 1;

atomic_store(&y, 1, memory_order_release);atomic_store(&y, 1, memory_order_release);

Page 68: FPGA Towards Verified Hardware Compilation · Performance Extreme Computing (HPEC), 2013. oughput 0 750 1500 2250 3000 CPU GPU FPGA oughput att 0 25 50 75 100 ... John Wickerson Towards

John Wickerson Towards Verified Hardware Compilation

Poorly scaling analysis 35

r = atomic_load(&y, memory_order_acquire); r = atomic_load(&y, memory_order_acquire); if (r==1) { atomic_store(&z, 1, memory_order_release); atomic_store(&z, 1, memory_order_release); }

s = atomic_load(&z, memory_order_acquire);s = atomic_load(&z, memory_order_acquire);

if (s==1) { print(x); }

x = 1;

atomic_store(&y, 1, memory_order_release);atomic_store(&y, 1, memory_order_release);

Page 69: FPGA Towards Verified Hardware Compilation · Performance Extreme Computing (HPEC), 2013. oughput 0 750 1500 2250 3000 CPU GPU FPGA oughput att 0 25 50 75 100 ... John Wickerson Towards

John Wickerson Towards Verified Hardware Compilation

Poorly scaling analysis 36

r = atomic_load(&y, memory_order_acquire); r = atomic_load(&y, memory_order_acquire); if (r==1) { atomic_store(&z, 1, memory_order_release); atomic_store(&z, 1, memory_order_release); }

s = atomic_load(&z, memory_order_acquire);s = atomic_load(&z, memory_order_acquire);

if (s==1) { print(x); }

x = 1;

atomic_store(&y, 1, memory_order_release);atomic_store(&y, 1, memory_order_release);


• Our solution: enumerate only the "primary" paths.



Checking correctness

• As before, we use Memalloy to check that our constraints are strong enough to guarantee C++ semantics.


Where next?

FPGA

• Heavyweight: a fully verified hardware compiler (e.g. a Verilog backend for CompCert).

• Lightweight: automatically generate and verify SystemVerilog assertions, à la RTLCheck.(13)

(13)Manerkar et al., "RTLCheck: Verifying the Memory Consistency of RTL Designs", MICRO, 2017.


John Wickerson, Imperial College London

FMATS Workshop, Microsoft Research Cambridge, 24 Sep 2018

Towards Verified Hardware Compilation