A 1.5 GHz AWP Elliptic Curve Crypto Chip

O. Hauck, S. A. Huss, ICSLAB TU Darmstadt

A. Katoch, Philips Research

Post on 21-Dec-2015



2

Outline

Current AWP projects

GATS-Chip

Elliptic Curve Chip

AWPs compared to synchronous wave pipelines

SRCMOS circuits

Crypto background

Architecture and Implementation

Conclusion

3

Status of AWP Projects

2D-DCT:

0.6µm, being redesigned with self-resetting logic

SRT:

currently at schematic level only

64b Gigahertz Adder Test Site:

0.6µm, almost complete, tape-out in May

Crypto chip:

0.35µm, tape-out targeted for July

4

Gigahertz Adder Test Site

AMS 0.6µm CMOS, 3 metal layers

64b Brent-Kung adder

~10k devices, ~1.3 mm²

latency ~2.5 ns

cycle time 1.0 ns

on-chip test circuitry

5

General Framework for Pipelines

[Figure: a chain of clocked registers (Latch/Reg) separated by logic blocks; Data enters at the input register i and leaves at the output register o; all registers are driven by Clk]

6

Some Notation

$t_{hold}$: hold time of register

$t_{setup}$: set-up time of register

$t_d$: propagation delay of register

$t_{skew}$: uncontrolled clock skew at register

$\Delta_{oi}$: delay between input and output clock

$t_i, t_o$: intentional skew at input and output registers

$T_{clk}$: clock period or cycle time

$t_{stable}(i)$: minimum time internal node $i$ has to be stable

$t_{min}(i), t_{max}(i)$: minimum and maximum logic delay from input to internal node $i$

$D_{min}, D_{max}$: minimum and maximum logic delay

$G$: set of all gate output nodes in the logic

7

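General Relations

Data is latched by the output clock at time $t_o + k\,T_{clk}$ (1)

Here $k$ is called the "global clock latency"; it equals the number of clock edges at the output before the data is latched.

Lower bound: $t_i + t_d + D_{max} + t_{setup} + t_{skew} \le t_o + k\,T_{clk}$ (2)

Upper bound: $t_o + k\,T_{clk} \le t_i + T_{clk} + t_d + D_{min} - t_{hold} - t_{skew}$ (3)

Combining (2) and (3), with $\Delta_{oi} = t_o - t_i$:

$t_d + D_{max} + t_{setup} + t_{skew} \le k\,T_{clk} + \Delta_{oi} \le T_{clk} + t_d + D_{min} - t_{hold} - t_{skew}$ (4)

By transitivity, (4) implies:

$T_{clk} \ge (D_{max} - D_{min}) + t_{setup} + t_{hold} + 2\,t_{skew}$ (5)

I.e., the cycle time is bounded by the delay variation, the register overhead and the clock skew.

Similarly, the minimum pulse width has to be respected for all internal nodes:

$T_{clk} \ge t_{max}(i) - t_{min}(i) + t_{stable}(i) + 2\,t_{skew} \quad \forall i \in G$ (6)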
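The Python sketch below makes relations (2) to (5) concrete. It evaluates the admissible latch window of (4) and the minimum cycle time of (5). The parameter values are hypothetical integer picoseconds, chosen to avoid rounding. The names `Timing`, `latch_window`, `is_valid` and `min_cycle_time` are illustrative and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    """Register and logic timing in picoseconds (names follow the notation slide)."""
    t_d: int      # propagation delay of register
    t_setup: int  # set-up time of register
    t_hold: int   # hold time of register
    t_skew: int   # uncontrolled clock skew
    d_min: int    # D_min, minimum logic delay
    d_max: int    # D_max, maximum logic delay


def latch_window(p, t_clk):
    """Admissible range (low, high) for k*T_clk + Delta_oi according to relation (4)."""
    low = p.t_d + p.d_max + p.t_setup + p.t_skew
    high = t_clk + p.t_d + p.d_min - p.t_hold - p.t_skew
    return low, high


def is_valid(p, t_clk, k, delta_oi=0):
    """True if data latched at k*T_clk + Delta_oi meets both bounds (2) and (3)."""
    low, high = latch_window(p, t_clk)
    return low <= k * t_clk + delta_oi <= high


def min_cycle_time(p):
    """Relation (5): delay variation + register overhead + twice the clock skew."""
    return (p.d_max - p.d_min) + p.t_setup + p.t_hold + 2 * p.t_skew


# Hypothetical example values
p = Timing(t_d=100, t_setup=50, t_hold=50, t_skew=50, d_min=2000, d_max=2400)
print(min_cycle_time(p))          # 600
print(is_valid(p, 700, 3, 550))   # True
print(is_valid(p, 700, 4))        # False
```

The width of the window, `high - low`, equals `T_clk - min_cycle_time(p)`. This is exactly the transitivity step from (4) to (5): below 600 ps the window is empty for every k and every Delta_oi.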

8

Synchronous Wave Pipeline

[Figure: registers (Latch/Reg) separated by wave logic; Data passes through; input and output registers clocked by Clk1 and Clk2]

Promise: higher throughput at reduced latency, clock load, area and power

Drawback: difficult tuning of logic and delay elements

$\frac{t_d + D_{max} + t_{setup} + t_{skew}}{k} \le T_{clk} \le \frac{t_d + D_{min} - t_{hold} - t_{skew}}{k - 1}, \quad \Delta_{oi} = 0,\ k > 1$

Discrete, distinct valid frequency ranges

Narrow frequency ranges, bounded from below and above

Not suitable for system design
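The discrete frequency ranges can be enumerated directly from the bound above. This minimal Python sketch uses the same kind of hypothetical timing values, in ps. The function name `wave_pipeline_windows` is illustrative.

```python
import math


def wave_pipeline_windows(t_d, t_setup, t_hold, t_skew, d_min, d_max, k_max):
    """Valid clock-period windows of a synchronous wave pipeline (Delta_oi = 0).

    For each global clock latency k, relation (4) with Delta_oi = 0 gives
        (t_d + D_max + t_setup + t_skew) / k <= T_clk
        T_clk <= (t_d + D_min - t_hold - t_skew) / (k - 1)
    Returns (k, T_low, T_high) for every k whose window is non-empty.
    """
    num_low = t_d + d_max + t_setup + t_skew
    num_high = t_d + d_min - t_hold - t_skew
    windows = []
    for k in range(1, k_max + 1):
        t_low = num_low / k
        # k = 1 is a conventional pipeline: no upper bound on T_clk
        t_high = math.inf if k == 1 else num_high / (k - 1)
        if t_low <= t_high:
            windows.append((k, t_low, t_high))
    return windows


# Hypothetical timing values in picoseconds
for k, lo, hi in wave_pipeline_windows(100, 50, 50, 50, 2000, 2400, k_max=8):
    print(f"k={k}: {lo:.1f} ps <= T_clk <= {hi:.1f} ps")
```

With these values, only k = 1 to 4 give non-empty windows. Periods that fall between windows, for example 1000 to 1300 ps, cannot be used with any latency. This is the tuning problem named above.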

9

Synchronous Pipeline

[Figure: registers (Latch/Reg) separated by logic blocks; Data passes through; all registers clocked by Clk]

Throughput is determined by the longest logic path plus clock/register overhead

Fine-grain pipelining allows high throughput at the cost of increased clock/register overhead

$T_{clk} \ge t_d + D_{max} + t_{setup} + t_{skew}, \quad k = 1,\ \Delta_{oi} = 0$
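The trade-off of fine-grain pipelining can be sketched as follows. The sketch assumes, hypothetically, that the logic splits into stages of exactly equal delay and that each stage pays the full overhead $t_d + t_{setup} + t_{skew}$. The function name and values are illustrative.

```python
def pipeline_timing(d_max, overhead, stages):
    """Cycle time and latency of a conventional pipeline (k = 1, Delta_oi = 0).

    Assumes the logic can be cut into `stages` slices of equal delay, so
    T_clk = D_max / stages + overhead, with overhead = t_d + t_setup + t_skew.
    """
    t_clk = d_max / stages + overhead
    return t_clk, stages * t_clk


# Hypothetical values in picoseconds: D_max = 2400, overhead = 100 + 50 + 50
for n in (1, 2, 4, 8):
    t_clk, latency = pipeline_timing(2400, 200, n)
    print(f"{n} stage(s): T_clk = {t_clk:.0f} ps, latency = {latency:.0f} ps, "
          f"overhead share = {200 / t_clk:.0%}")
```

Adding stages shrinks the cycle time. However, the fixed register overhead takes a growing share of each cycle, and the total latency increases.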

10

Asynchronous Wave Pipeline (AWP)

[Figure: wave latches separated by wave logic; Data flows from input to output; the request travels from req_in to req_out through matched delays alongside the logic]

Several data waves and their requests propagate coherently

One-sided cycle time constraint:

$k = 0: \quad t_d + D_{max} + t_{setup} + t_{skew} \le \Delta_{oi} \le T_{clk} + t_d + D_{min} - t_{hold} - t_{skew}$

The matched delay must track the logic over all PTV (process, temperature, voltage) corners
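The sketch below shows the AWP case. It assumes the matched delay sets $\Delta_{oi}$ exactly to the lower bound, so the remaining condition only bounds $T_{clk}$ from below and every period above the minimum is valid. The timing values and function names are hypothetical. The waves-in-flight estimate reuses the adder test site figures (latency ~2.5 ns, cycle 1.0 ns).

```python
def awp_timing(t_d, t_setup, t_hold, t_skew, d_min, d_max):
    """AWP with k = 0: return (Delta_oi, minimum T_clk).

    The matched delay is assumed to equal the lower bound of (4); the upper
    bound then only limits T_clk from below (one-sided constraint).
    """
    delta_oi = t_d + d_max + t_setup + t_skew
    t_clk_min = (d_max - d_min) + t_setup + t_hold + 2 * t_skew
    return delta_oi, t_clk_min


def awp_ok(t_clk, t_d, t_setup, t_hold, t_skew, d_min, d_max):
    """Check the upper bound of (4) for k = 0 with the matched delay above."""
    delta_oi, _ = awp_timing(t_d, t_setup, t_hold, t_skew, d_min, d_max)
    return delta_oi <= t_clk + t_d + d_min - t_hold - t_skew


def waves_in_flight(latency, t_clk):
    """Average number of data waves inside the logic at the same time."""
    return latency / t_clk


params = (100, 50, 50, 50, 2000, 2400)   # hypothetical, in ps
print(awp_timing(*params))               # (2600, 600)
# Every T_clk >= 600 ps works: there are no gaps between valid ranges
print(all(awp_ok(t, *params) for t in range(600, 3000, 50)))   # True
# Adder test site figures: latency ~2.5 ns, cycle 1.0 ns
print(waves_in_flight(2.5, 1.0))         # 2.5
```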

11

Example: 64-b Brent-Kung Parallel Adder

[Figure: adder structure with columns 0 to 4 (pg, PG, PG, G, XOR)]

Buffers provide the same depth on every logic path

All gates in the same column must have the same delay
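The Python model below is a bit-level sketch of a 64-bit Brent-Kung prefix adder. It also counts, per bit, the prefix operators on the longest path into that bit, which shows that raw paths differ in depth. It is a functional model under my own naming (`combine`, `brent_kung_add`), not the chip's netlist. Its column layout need not match the figure.

```python
def combine(hi, lo):
    """Prefix operator on (generate, propagate) pairs: (G, P)_hi o (G, P)_lo."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def brent_kung_add(a, b, n=64):
    """Add two n-bit integers with a Brent-Kung prefix network (n a power of two).

    Returns (sum, carry_out, depth), where depth[i] counts the prefix operators
    on the longest path into bit position i.
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # propagate
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]     # generate
    gp = list(zip(g, p))
    depth = [0] * n

    # Up-sweep: binary tree with spans 2, 4, ..., n
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            gp[i] = combine(gp[i], gp[i - d])
            depth[i] = max(depth[i], depth[i - d]) + 1
        d *= 2

    # Down-sweep: fill in the remaining prefixes
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            gp[i] = combine(gp[i], gp[i - d])
            depth[i] = max(depth[i], depth[i - d]) + 1
        d //= 2

    # gp[i][0] is now the carry out of bit i (carry-in = 0)
    carries = [0] + [gp[i][0] for i in range(n - 1)]
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, gp[n - 1][0], depth


s, cout, depth = brent_kung_add(0x0123456789ABCDEF, 0x0FEDCBA987654321)
print(hex(s), cout)
print("prefix depth per bit ranges from", min(depth), "to", max(depth))
```

In a wave-pipelined version, the shallower paths would receive buffers so that every path has the same depth. This model only reports the mismatch.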

12

Circuits

The logic style used has to minimize delay variation.

Earlier work focused on bipolar logic (ECL, CML), but CMOS is mainstream.

Static CMOS is not well suited for wave pipelining; fixing the problem results in more power and lower speed.

Pass-transistor logic gives slow edges, thereby introducing delay variation.

Dynamic logic is attractive because only the output high transition is data-dependent; the output pull-down is done by the precharge.

What is needed is a dynamic logic family without precharge overhead: SRCMOS.

13

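SRCMOS

Distinguishing property of our SRCMOS circuits: the precharge feedback is fully local, and the NMOS trees are delay-balanced

[Figure: SRCMOS gate with an NMOS tree (N) evaluating the inputs and a local precharge feedback from the output]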

14

Operation of a 2-input AND

15

Cisco Data Encryption Service Adapter

[Cisco Systems]

16

DES Key Exchange using a Public-Key Cryptosystem based on Elliptic Curves

Public: curve $E(a, b)$ and base point $P_0$

Alice: choose random $k_A$ (private key); compute $k_A P_0$ (public key) and send it to Bob

Bob: choose random $k_B$ (private key); compute $k_B P_0$ (public key) and send it to Alice

Alice computes $P = k_A (k_B P_0)$; Bob computes $P = k_B (k_A P_0)$

Alice and Bob now have the same secret $P$

Both compute the session key via a hash function: $D = h(P)$

$D$ is used as the DES key
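The exchange above can be traced end to end with a toy example. For illustration only, this Python sketch uses the common textbook curve $y^2 = x^3 + 2x + 2$ over GF(17) with base point (5, 1) of order 19, not the chip's GF(2^261) curve. The hash function h is not specified on the slide. Here it is assumed to be SHA-256 of the x-coordinate, truncated to 8 bytes, which is the DES key size including parity bits. The private keys are hypothetical.

```python
import hashlib

# Toy curve y^2 = x^3 + 2x + 2 over GF(17); base point (5, 1) has order 19.
P_MOD, A, B = 17, 2, 2
BASE = (5, 1)
ORDER = 19


def ec_add(p, q):
    """Affine point addition on the toy curve; None is the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # q == -p
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, p):
    """Left-to-right double-and-add: double for every key bit, add for every 1-bit."""
    result = None
    for bit in bin(k)[2:]:
        result = ec_add(result, result)
        if bit == "1":
            result = ec_add(result, p)
    return result


def session_key(shared_point):
    """Hypothetical h(): SHA-256 of the x-coordinate, truncated to 8 bytes."""
    x, _ = shared_point
    return hashlib.sha256(x.to_bytes(32, "big")).digest()[:8]


k_a, k_b = 3, 10                      # hypothetical private keys of Alice and Bob
pub_a = scalar_mult(k_a, BASE)        # sent to Bob
pub_b = scalar_mult(k_b, BASE)        # sent to Alice
shared_a = scalar_mult(k_a, pub_b)
shared_b = scalar_mult(k_b, pub_a)
print(shared_a == shared_b)           # True
print(session_key(shared_a).hex())
```

`pow(x, -1, m)` (modular inverse) requires Python 3.8 or later.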

17

Why is this secure?

Security is based upon the discrete logarithm problem (DLP): in a finite Abelian group $G$ we can easily compute $p = k\,p_0$ given $k \in \mathbb{N}$ and $p_0 \in G$

However, $k$ is hard to compute from $p$ and $p_0$, and the DLP is extraordinarily hard for the point group of an elliptic curve:

The set of solutions of the cubic equation $y^2 + xy = x^3 + ax^2 + b$ over any field, together with the point at infinity, is an Abelian group

18

Elliptic Curve Mathematics and Algorithm

Two types: supersingular and non-supersingular

Non-supersingular curves offer the highest security

EC equation: $y^2 + xy = x^3 + ax^2 + b$

19

Adding Two Points Over Elliptic Curves
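The addition formulas on this slide did not survive extraction. As a stand-in, this Python sketch implements the standard affine addition and doubling formulas for the non-supersingular curve $y^2 + xy = x^3 + ax^2 + b$ over a toy field GF(2^4) in polynomial basis. The chip works in GF(2^261) with an optimal normal basis. The point formulas do not depend on the basis; only the field operations differ. The field polynomial and the curve constants A = B = 1 are hypothetical choices.

```python
# Toy field GF(2^4), polynomial basis, reduction polynomial x^4 + x + 1 (hypothetical)
M = 4
POLY = 0b10011
# Curve y^2 + xy = x^3 + A x^2 + B with B != 0 (non-supersingular); A, B hypothetical
A, B = 1, 1


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^M); field addition is XOR."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY
    return r


def gf_inv(a):
    """Inverse of a nonzero element, computed as a^(2^M - 2)."""
    r, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def on_curve(pt):
    if pt is None:                      # None is the point at infinity
        return True
    x, y = pt
    x2 = gf_mul(x, x)
    return (gf_mul(y, y) ^ gf_mul(x, y)) == (gf_mul(x2, x) ^ gf_mul(A, x2) ^ B)


def ec_add(p, q):
    """Affine addition and doubling; -(x, y) = (x, x + y) on this curve."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and y2 == x1 ^ y1:      # q == -p (also covers doubling when x1 = 0)
        return None
    if p == q:                          # doubling
        lam = x1 ^ gf_mul(y1, gf_inv(x1))
        x3 = gf_mul(lam, lam) ^ lam ^ A
        y3 = gf_mul(x1, x1) ^ gf_mul(lam ^ 1, x3)
    else:                               # addition of distinct points, x1 != x2
        lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
        x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
        y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)


def scalar_mult(k, p):
    """Left-to-right double-and-add."""
    result = None
    for bit in bin(k)[2:]:
        result = ec_add(result, result)
        if bit == "1":
            result = ec_add(result, p)
    return result


points = [(x, y) for x in range(1 << M) for y in range(1 << M) if on_curve((x, y))]
print("group order:", len(points) + 1)
```

Since every point's order divides the group order, multiplying any point by `len(points) + 1` must give the point at infinity. This makes a convenient self-check of the formulas.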

20

Optimal Normal Basis

21

Multiplication over ONBs
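Only the titles of the ONB slides survive, so the exact formula used on the chip is not recoverable. As a hedged illustration, this Python sketch multiplies in a type II optimal normal basis over GF(2^m), where $p = 2m + 1$. The basis is $\beta_i = \beta^{2^i}$ with $\beta = \gamma + \gamma^{-1}$, and $\gamma$ is a primitive $p$-th root of unity. Expanding $(\gamma^u + \gamma^{-u})(\gamma^v + \gamma^{-v})$ gives the rule $\beta_i \beta_j = \beta_{s(2^i + 2^j)} + \beta_{s(2^i - 2^j)}$. Here $s(t)$ is the index with $2^s \equiv \pm t \pmod p$, and a term with $t \equiv 0$ vanishes. The slides do not say whether the chip uses a type II basis for m = 261, so the example runs with m = 5. The function names are hypothetical.

```python
def onb2_table(m):
    """Map each t in 1..p-1 (p = 2m + 1) to the index s in 0..m-1 with 2^s = +t or -t (mod p).

    Raises ValueError if the values +-2^s are not all distinct, i.e. if no
    type II optimal normal basis exists for this m.
    """
    p = 2 * m + 1
    table = {}
    for s in range(m):
        v = pow(2, s, p)
        for t in (v, p - v):
            if t in table:
                raise ValueError(f"no type II ONB for m = {m}")
            table[t] = s
    return table


def onb2_mul(a, b, table):
    """Product of two coefficient vectors over the basis beta_i = beta^(2^i).

    Uses beta_i * beta_j = beta_{s(2^i + 2^j)} + beta_{s(2^i - 2^j)}; a term with
    exponent 0 (mod p) drops out because 1 + 1 = 0 in characteristic 2.
    """
    m = len(a)
    p = 2 * m + 1
    c = [0] * m
    for i in range(m):
        if not a[i]:
            continue
        u = pow(2, i, p)
        for j in range(m):
            if not b[j]:
                continue
            v = pow(2, j, p)
            for t in ((u + v) % p, (u - v) % p):
                if t:
                    c[table[t]] ^= 1
    return c


def onb_square(a):
    """Squaring in a normal basis is a cyclic shift: coefficient i moves to position i + 1."""
    return a[-1:] + a[:-1]


def to_vec(n, m):
    """Coefficient vector of the integer n (bit i = coefficient of beta_i)."""
    return [(n >> i) & 1 for i in range(m)]


table5 = onb2_table(5)            # m = 5, p = 11: a type II ONB exists
x = to_vec(0b10110, 5)
print(onb2_mul(x, x, table5) == onb_square(x))   # True
print(onb2_mul([1] * 5, x, table5) == x)          # True: all-ones is the unit element
```

This direct double loop costs O(m²) bit operations per product. It only checks the arithmetic and says nothing about how the 261-cell hardware multiplier is organized.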

22

The Final Formula

23

Architecture of Multiplier

[Figure: multiplier architecture with 261 cells (abx, numbered 1 to 261) feeding chains of 3-input XOR gates (3_Xor) with inputs 1 to 783; wave latches on a data path of width 87; delay elements on the request path; circuit styles: pseudo-NMOS and SRCMOS]

24

Dual-rail Circuits

Dual-rail, cross-coupled SRCMOS circuit

NMOS trees are designed such that there is only one conducting path to ground

[Figure: two NMOS trees (N) driving the complementary outputs Out and its complement]

25

Delay Variations at Various Stages

[Figure: simulated waveforms of the outputs after the first stage, the inputs to the final stage, and the final output; cycle time = 666.7 ps (≈1.5 GHz)]

Signals after first stage (data path width = 87)

26

Hierarchy of Control

Double-and-Add: key $k_{260} \dots k_0$ scanned by left shift, current bit $x$, Hamming weight = 40; key generation rate $R$

EC double: always; EC add: if $x = 1$

EC arithmetic: EC double = 7 MUL, EC add = 13 MUL, giving $R \cdot (261 \cdot 7 + 40 \cdot 13) = R \cdot 2347$ MUL/s

Finite field arithmetic: ADD: 1, MUL: 261, LOAD/STORE: 1, giving $R \cdot 2347 \cdot 261 = R \cdot 612567$ bit/s
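The operation counts on this slide can be reproduced with a few lines of Python. The constants come from the slide: 261 key bits, Hamming weight 40, 7 and 13 MULs per EC double and add, and 261 bits per MUL. The final key-rate figure rests on a hypothetical assumption not stated on the slide: that the multiplier processes one bit per clock cycle at 1.5 GHz.

```python
MULS_PER_DOUBLE = 7    # EC double = 7 field multiplications (slide)
MULS_PER_ADD = 13      # EC add = 13 field multiplications (slide)
BITS_PER_MUL = 261     # one MUL processes 261 bits (slide)


def muls_per_key(key_bits=261, hamming_weight=40):
    """Double-and-add as counted on the slide: one double per key bit, one add per 1-bit."""
    return key_bits * MULS_PER_DOUBLE + hamming_weight * MULS_PER_ADD


def bits_per_key(key_bits=261, hamming_weight=40):
    """Bit-level multiplier work for one key generation."""
    return muls_per_key(key_bits, hamming_weight) * BITS_PER_MUL


def key_rate(bit_rate, key_bits=261, hamming_weight=40):
    """Keys per second for a multiplier that processes `bit_rate` bits per second."""
    return bit_rate / bits_per_key(key_bits, hamming_weight)


print(muls_per_key())           # 2347
print(bits_per_key())           # 612567
# Hypothetical: one bit per cycle at 1.5 GHz
print(round(key_rate(1.5e9)))   # 2449
```

ADD and LOAD/STORE steps are ignored here, as in the slide's rate formula.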

27

Control Unit Architecture

Request signals trigger the state transitions. Autonomous state transitions are triggered by signal X.

[Figure: control unit built from AWP logic and registers (REG); inputs IN1, IN2 and request signals req1 to reqn; outputs OUT and Req_out; reset; signal X fed back for static operation]

28

High Level Control: Double-and-Add

[Figure: state machine with states 1 to 8; transitions labeled Start/LoadX, ResetZ; LoadY; if K=0: Shift K; if K=1: ShiftK, Double; K=0, DoubleDone; K=1, DoubleDone/Add; AddDone; if Stop=1/KP_Done; transitions are qualified by X=0 or X=1]

Level-based control

29

Middle Level Control: EC Point Doubling

Pulse-based control

[Figure: pulse-based state sequence (states 0 to 5 and 58 to 63, transitions on X=0 or X=1) issuing the operations Start, OPA X, OPB Z, MULT, MD, OPA A, Shift, OPB A, MULT, MD]

30

Various States in a Pulsed Control

31

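Conclusion

[Figure: chip block diagram: inputs k, a, b, X0, Y0; registers A, B, D, X, Y, Z with operand buses opA, opB and output OUT; AWP multiplier (AWP MUL) with a request delay line; oscillator, counter and controller; 1-bit serial in and serial out]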