qed bug detection and localization - darpa | eri summit · 2020-05-19 · l fully automated logic...

1
Distribution Statement A – Approved for Public Release, Distribution Unlimited www.darpa.mil QED Bug Detection and Localization Subhasish Mitra (Stanford) Electrical Bugs Detected error count (normalized to QED) QED 0 0.5 1 1-10 Billion No-QED Error detection latency (clock cycles) 0-10K Detected error count (normalized to QED) QED 0 0.5 1 1-10 Billion No-QED Error detection latency (clock cycles) 0-10K 10 6 X 4X Intel® 48-Core SCC No boot Pass 48 processor cores 0.9V, 800 MHz QED unique detect QED enhanced detect QED quick detect Freescale SoC Logic Bug 8-Core Industrial Test Difficult Logic Bugs Source: Intel Post- silicon bug count Year Pre-silicon verification inadequate “Post-silicon cost & complexity rising faster than design cost” – S. Yerramilli, V.P., Intel Design Pre-silicon Verification Post - silicon Validation High Volume Fab Post-Silicon Validation Critical Detect bugs Root-cause & fix Run tests (OS, games) Debug time: Months, weeks per bug Localize bugs Localization Dominates Cost Localization Timeline Error occurred Error detection latency Ideal ~ 1,000 cycles Reality ~ Billions cycles Error detected Test execution Long Error Detection Latency Challenge Q uick E rror D etection QED Wide variety Diversity Systematic Automated Original Tests Test 1 Test 2 Test N l Error detection latency: guaranteed short l Coverage: improved l Software & hardware approaches QED family Tests QED Test 1 QED Test 2 QED Test N l Structured and Effective Ø 10 9 X quicker detection, 4X coverage l Automatically localize bugs l No failure reproduction, no simulation l Broadly applicable: Cores, uncore, power management, logic & electrical, accelerators Q uick E rror D etection Highlights QED Transformation Example ... Core 1 Core 2 <PLC mem [1..N]> <PLC mem [1..N]> <PLC mem [1..N]> <PLC mem [1..N]> <PLC mem [1..N]> Core N <PLC mem [1..N]> <PLC mem [1..N]> <PLC mem [1..N]> A’=A B’=B C’=C A = B * 2 A’= B’* 2 Check(A==A’) D’=D E’=E F’=F G’=G H’=H E = F * G E’= F’* G’ Check(E==E’) H = D + E H’= D’+ E’ Check(H==H’) E’=E I’=E J’=J K’=K I = E / 2 I’= E’/ 2 Check(I==I’) Load J mem[7 ] Load J’mem[7’] Check(J==J’) K = J + 1 K’= J’+ 1 Check(K==K’) Lock(1,’1) Store mem[1 ] C Store mem[1’] C’ Unlock(1,1’) Lock(5,5’) Store mem[5 ] H Store mem[5’] H’ Unlock(5,5’) ALL Cores ALL Threads <PLC mem[1..N]> for ALL i,i’ Lock(i) Lock(i’) Load X mem[i] Load X’mem[i’] Check (X == X’) Unlock(i’) Unlock(i) Symbolic QED and A-QED l Fully automated logic bug localization l Pre-silicon and post-silicon bugs l Formal verification with no manual properties l Implementation only dependent on ISA l Scalable: billion-transistor SoCs l Verify any IP block (accelerators, uncore, etc.) l High-level language and RTL descriptions Traditional debug Automatic S-QED Weeks to months 20 mins. to 7 hours Long bug traces 3- to 22-cycle bug traces 0% 20% 40% 60% 80% 100% 0 100 1K 10K 100k 1M Cumulative bugs detected Bug Trace Length (cycles) >10M Original Min., Mean, Max.: 722, 1.9M, 11M Symbolic QED Min., Mean, Max.: 13, 20, 29 10 6 X 2X E-QED Results (min, average, max) Buggy design module Uniquely identified: 100% Flip-flop Candidates (Out of ~1 million) (5, 18, 26) flip-flops Area Overhead 2.5% or less Debug Effort Automatic Runtime (7, 8.7, 12) hours Bug localized by formal analysis of signatures Symbolic QED Results E-QED Error detection latency (cycles) Original QED 15 Billion 9 Interconnection network Core 1 Core 0 Core N Core 2 Core 3 Random Instruction Test Generator Shared Caches Memory Controllers Accelerators Other uncore components l QED enables automatic electrical bug localization l Targets bugs at all system levels l No expensive, customized platform Design RTL Counter-example Search all input sequences for QED check fails Return bug activation: Minimal failing QED input sequence ISA based RTL design and QED module Counter- example High-level language approaches Counter- example Counter- example RTL approach Symbolic QED A-QED ISA Specific QED Module Bounded Model Checking High-level IP Design Synthesized High-level IP Design Synthesized High-level QED Module RTL IP Design ILA Based QED Module QED Symbolic Execution Bounded Model Checking Bounded Model Checking This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. Posh Open Source Hardware (POSH)

Upload: others

Post on 13-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: QED Bug Detection and Localization - DARPA | ERI Summit · 2020-05-19 · l Fully automated logic bug localization l Pre-silicon and post-silicon bugs l Formal verification with no

Distribution Statement A – Approved for Public Release, Distribution Unlimited

www.darpa.mil

QED Bug Detection and LocalizationSubhasish Mitra (Stanford)

Electrical Bugs

Det

ecte

d er

ror c

ount

(n

orm

aliz

ed to

QE

D)

QED

0

0.5

1

1-10 Billion

No-QED

Error detection latency (clock cycles)0-10K

Det

ecte

d er

ror c

ount

(n

orm

aliz

ed to

QE

D)

QED

0

0.5

1

1-10 Billion

No-QED

Error detection latency (clock cycles)0-10K

106X

4X

Intel® 48-Core SCC

No boot

Pass

48 processor cores

0.9V, 800 MHz

QED unique detect

QED enhanced detect

QED quick detect

Freescale SoC Logic Bug 8-Core Industrial Test Difficult Logic Bugs

Source: Intel

Post-silicon bug

count

Year

Pre-silicon verification inadequate

“Post-silicon cost & complexity rising faster than design cost”

– S. Yerramilli, V.P., Intel

Design Pre-siliconVerification

Post-siliconValidation

High Volume

Fab

Post-Silicon Validation Critical

Detect bugs

Root-cause & fix

Run tests (OS, games)

Debug time:Months, weeks per bug

Localize bugs

Localization Dominates Cost

Localization

Timeline

Error occurred

Error detection latencyIdeal ~ 1,000 cyclesReality ~ Billions cycles

Error detected

Test execution

Long Error Detection Latency Challenge

Quick Error Detection

QEDWide variety

DiversitySystematicAutomated

Original TestsTest 1Test 2

……

Test N

l Error detection latency: guaranteed short

l Coverage: improved

l Software & hardware approaches

QED family Tests

QED Test 1QED Test 2

……

QED Test N

l Structured and Effective

Ø 109X quicker detection, 4X coverage

l Automatically localize bugs

l No failure reproduction, no simulation

l Broadly applicable:

Cores, uncore, power management, logic & electrical, accelerators

Quick Error Detection Highlights

QED Transformation Example

...

Core 1 Core 2

<PLC mem[1..N]>

<PLC mem[1..N]>

<PLC mem[1..N]>

<PLC mem[1..N]>

<PLC mem[1..N]>

Core N

<PLC mem[1..N]>

<PLC mem[1..N]>

<PLC mem[1..N]>

A’=A B’=B C’=C

A = B * 2

A’= B’* 2

Check(A==A’)

D’=D E’=E F’=F

G’=G H’=H

E = F * G

E’= F’* G’

Check(E==E’)

H = D + E

H’= D’+ E’

Check(H==H’)

E’=E I’=E

J’=J K’=K

I = E / 2

I’= E’/ 2

Check(I==I’)

Load J ← mem[7 ]

Load J’← mem[7’]

Check(J==J’)

K = J + 1

K’= J’+ 1

Check(K==K’)

Lock(1,’1)

Store mem[1 ] ← C

Store mem[1’] ← C’

Unlock(1,1’)

Lock(5,5’)

Store mem[5 ] ← H

Store mem[5’] ← H’

Unlock(5,5’)

ALL Cores

ALL Threads

<PLC mem[1..N]>

for ALL i,i’

Lock(i)

Lock(i’)

Load X ← mem[i]

Load X’← mem[i’]

Check (X == X’)

Unlock(i’)

Unlock(i)

Symbolic QED and A-QED

l Fully automated logic bug localizationl Pre-silicon and post-silicon bugsl Formal verification with no manual propertiesl Implementation only dependent on ISAl Scalable: billion-transistor SoCsl Verify any IP block (accelerators, uncore, etc.) l High-level language and RTL descriptions

Traditional debug Automatic S-QEDWeeks to months 20 mins. to 7 hours

Long bug traces 3- to 22-cycle bug traces

0%

20%

40%

60%

80%

100%

0 100 1K 10K 100k 1M

Cum

ulat

ive

bugs

det

ecte

d

Bug Trace Length (cycles)>10M

OriginalMin., Mean, Max.: 722, 1.9M, 11M

Symbolic QEDMin., Mean, Max.: 13, 20, 29

106X

2X

E-QED Results(min, average, max)

Buggy design module Uniquely identified: 100%

Flip-flop Candidates(Out of ~1 million)

(5, 18, 26) flip-flops

Area Overhead 2.5% or less

Debug Effort Automatic

Runtime (7, 8.7, 12) hours

Bug localized by formal analysis of signatures

Symbolic QED Results

E-QED

Error detection latency (cycles)

Original QED15 Billion 9

Interconnection network

Core 1Core 0 Core NCore 2 Core 3

Random Instruction Test Generator

Shared Caches

Memory Controllers Accelerators Other uncore

components

l QED enables automatic electrical bug localizationl Targets bugs at all system levelsl No expensive, customized platform

Design RTL

Counter-example

Search all input sequences for QED check fails

Return bug activation: Minimal failing QED input sequence

ISA based RTL design and QED module

Counter-example

High-level language approaches

Counter-example

Counter-example

RTL approach

Symbolic QED

A-QED

ISA SpecificQED Module

Bounded Model Checking

High-level IP Design

Synthesized High-level IP

Design

Synthesized High-level

QED Module

RTL IP Design

ILA Based QED Module

QED Symbolic Execution

Bounded Model Checking

Bounded Model Checking

This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA).The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

Posh Open Source Hardware (POSH)