Transcript
Page 1: Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006), 13/10/2006

Superscalar Coprocessor for High-speed Curve-based Cryptography

K. Sakiyama, L. Batina, B. Preneel, I. Verbauwhede

Katholieke Universiteit Leuven / IBBT, Department Electrical Engineering - ESAT/COSIC

Page 2:

Overview

Introduction

Curve-based Cryptography

HW/SW Partitioning

Superscalar Coprocessor

Results

Conclusions

Page 3:

Introduction: Motivation

High-speed curve-based cryptography in HW/SW co-design
  How much instruction-level parallelism can we obtain from coprocessor instructions?

Performance improvement for different operation forms in the datapath
  AB+C mod P vs. A(B+D)+C mod P, where A, B, C, D, P are polynomials

Performance comparison of three different curve-based cryptosystems
  Which is the fastest among ECC, HECC, and ECC over a composite field?

Programmability and scalability
  Programmable in order to support different cryptosystems?
  Scalable in field sizes?
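The two operation forms compared above can be sketched in Python. This is only an illustration, assuming polynomials over GF(2) are encoded as integers (bit i holds the coefficient of x^i); the function names and the toy modulus are hypothetical and do not come from the slides. With only AB+C available, A(B+D)+C takes two instructions; with the fused form it takes one.

```python
def clmul(a, b):
    """Carry-less (polynomial) product of a and b over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_mod(t, p):
    """Reduce polynomial t modulo p by cancelling the leading term repeatedly."""
    m = p.bit_length() - 1  # degree of p
    while t.bit_length() > m:
        t ^= p << (t.bit_length() - 1 - m)
    return t


def ab_plus_c(a, b, c, p):
    """AB+C mod P; c is assumed to be already reduced."""
    return reduce_mod(clmul(a, b), p) ^ c


def a_bd_plus_c(a, b, d, c, p):
    """A(B+D)+C mod P; addition over GF(2) is a bitwise XOR."""
    return ab_plus_c(a, b ^ d, c, p)


P = 0b10011  # x^4 + x + 1, a toy modulus chosen for this example
A, B, C, D = 0b1000, 0b0110, 0b0001, 0b0100

# Two AB+C instructions: T = B*1 + D, then A*T + C
t = ab_plus_c(B, 1, D, P)
two_step = ab_plus_c(A, t, C, P)

# One A(B+D)+C instruction
fused = a_bd_plus_c(A, B, D, C, P)
```

Both paths give the same field element; the fused form simply saves one instruction issue and one intermediate write.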

Page 4:

Introduction: Target Architecture

Curve-based cryptography over binary fields
  Hardware can be smaller and faster than for a prime field
  ECC over a binary field, e.g. GF(2^163)
  HECC of genus 2: field length can be shorter by a factor of 2, e.g. GF(2^83)
  ECC over a composite field: field length can be shorter by a factor of 2, e.g. GF((2^83)^2)

The datapath can be shared
  Programmable coprocessor supporting three curve-based cryptosystems by defining coprocessor instruction(s)
  (Coprocessor) instruction-level parallelism by superscalar architecture

Page 6:

Curve-based Cryptography: HW/SW partitioning (1)

General hierarchy in coprocessor for curve-based cryptography

[Figure: hierarchy of operations, top to bottom]
  Point/Divisor Multiplication
  Point/Divisor Addition, Point/Divisor Doubling
  Finite Field Addition, Finite Field Multiplication, Finite Field Inversion

Layer labels: SW or HW controller (twice), HW Datapath

Page 7:

Curve-based Cryptography: Proposed Hierarchy (1)

Single instruction for all finite field operations

Fixed-cycle execution enables efficient implementation

[Figure: conventional hierarchy vs. single-instruction (datapath) hierarchy]

Conventional: Point/Divisor Multiplication; Point/Divisor Addition, Point/Divisor Doubling; Finite Field Addition, Finite Field Multiplication, Finite Field Inversion

Single instruction (datapath): Point/Divisor Multiplication; Point/Divisor Addition, Point/Divisor Doubling; Finite Field Operation (e.g. AB+C mod P), Finite Field Inversion
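As a rough illustration of how one AB+C mod P operation can stand in for several finite field operations, here is a minimal Python sketch. It assumes polynomials over GF(2) are stored as integers and uses an interleaved shift-and-add loop with a fixed number of iterations; the name `malu_op` and the 8-bit test modulus are hypothetical choices for this example, not the coprocessor's actual instruction.

```python
def malu_op(a, b, c, p):
    """Compute (a*b + c) mod p over GF(2)[x]; a, b, c must have degree < deg(p).

    The loop always runs deg(p) times, independent of the operand values,
    which mirrors the fixed-cycle execution mentioned on the slide.
    """
    m = p.bit_length() - 1        # degree of the modulus
    acc = 0
    for i in reversed(range(m)):  # scan the bits of a from the top
        acc <<= 1                 # multiply the partial result by x
        if (a >> i) & 1:
            acc ^= b              # add b when the current bit of a is set
        if (acc >> m) & 1:
            acc ^= p              # subtract (XOR) the modulus on overflow
    return acc ^ c                # final addition of c


# x^8 + x^4 + x^3 + x + 1, used here only as a small, well-known test modulus
P = 0x11B

product = malu_op(0x57, 0x83, 0, P)   # multiplication: set C = 0
total = malu_op(0x57, 1, 0x83, P)     # addition: set B = 1
square = malu_op(0x57, 0x57, 0, P)    # squaring: set B = A
```

Multiplication, addition and squaring all map onto the same call, so a controller only needs to issue one kind of finite field instruction.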

Page 8:

Curve-based Cryptography: Modular Arithmetic Logic Unit (MALU)

(a) Building block: regular XOR chains

(b) Scalable in digit size (d) and field size (k) by interconnecting several building blocks

We use MALU83 (n=83, d=12) as building block

2xMALU83 can be configured as 1xMALU163

[Figure: (a) one building block with dimensions n and d and signals a_i B(x), m_i P(x), T(x), c_i, a_k, m_k, c_{k+1} and Tnext(x); (b) several building blocks joined by interconnections]
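The role of the digit size d can be illustrated with a digit-serial version of AB+C mod P. The sketch below is an assumption-laden software model, not the MALU's actual circuit: it consumes d bits of A per step, so the number of steps is ceil(k/d) for a k-bit field. The function name and the placeholder degree-83 modulus are hypothetical.

```python
def digit_serial_mul_add(a, b, c, p, d):
    """Return ((a*b + c) mod p, steps), processing d bits of a per step."""
    m = p.bit_length() - 1           # field size k (degree of p)
    steps = -(-m // d)               # ceil(m / d)
    acc = 0
    for s in reversed(range(steps)):
        digit = (a >> (s * d)) & ((1 << d) - 1)
        acc <<= d                    # multiply the partial result by x^d
        for j in range(d):           # add digit * b
            if (digit >> j) & 1:
                acc ^= b << j
        for j in reversed(range(d)): # reduce the d overflow bits
            if (acc >> (m + j)) & 1:
                acc ^= p << j
    return acc ^ c, steps


# Small check against a well-known GF(2^8) product
result, steps_small = digit_serial_mul_add(0x57, 0x83, 0, 0x11B, 4)

# Step count for an 83-bit field with d = 12; the modulus below is only a
# degree-83 placeholder and is not claimed to be irreducible
P83_PLACEHOLDER = (1 << 83) | 1
_, steps_malu83 = digit_serial_mul_add(1, 1, 0, P83_PLACEHOLDER, 12)
```

Under these assumptions an 83-bit operand with d=12 needs 7 steps, which is the kind of k/d dependence a larger digit size is meant to exploit.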

Page 10:

HW/SW Partitioning, TYPE I: Smallest implementation (baseline)

[Figure: block diagram]

System blocks: Main CPU, Program ROM, SRAM, Instruction Bus (32-bit instructions), Data Bus (32-bit data), Memory Mapped I/O

Coprocessor blocks: IBC, DBC, MALU83

Page 11:

HW/SW Partitioning, TYPE II: TYPE I + µ-code RAM

[Figure: block diagram]

System blocks: Main CPU, Program ROM, SRAM, Instruction Bus (32-bit instructions), Data Bus (32-bit data), Memory Mapped I/O

Coprocessor blocks: IBC, DBC, µ-code RAM, FSM, MALU83

Page 12:

HW/SW Partitioning, TYPE III: TYPE I + Coprocessor Memory

[Figure: block diagram]

System blocks: Main CPU, Program ROM, SRAM, Instruction Bus (32-bit instructions), Data Bus (32-bit data), Memory Mapped I/O

Coprocessor blocks: IBC, DBC, Coprocessor Memory, MALU83

Page 13:

HW/SW Partitioning, TYPE IV: TYPE I + Copro. Mem. & µ-code RAM

[Figure: block diagram]

System blocks: Main CPU, Program ROM, SRAM, Instruction Bus (32-bit instructions), Data Bus (32-bit data), Memory Mapped I/O

Coprocessor blocks: IBC, DBC, µ-code RAM, FSM, Coprocessor Memory, MALU83

Page 14:

HW/SW Partitioning: Co-design flow with GEZEL

[Figure: co-design flow chart]

C/C++ codes for PKCs

Partitioning of functions

C/C++ codes & HW behavior blocks w/ interface

SW side: C/C++ codes w/ physical memory map, cross compile, program codes, ARM (SW)

HW side: GEZEL FDL codes, synthesis, VHDL codes, co-processor (HW)

Cycle-true sim. (GEZEL)

Page 15:

HW/SW Partitioning: Result: Vertical Exploration of System

HECC performance for different HW/SW partitioning (Performance: Point/Divisor multiplication)

[Figure: stacked bar chart. Vertical axis: Required Clock Cycles [K], 0 to 500. Horizontal axis: System Configuration (TYPE I, TYPE II, TYPE III, TYPE IV). Series: I/O Transfer Overhead + Others, Coprocessor Data Memory, Datapath.]

Data labels in the chart: 38, 38, 67, 67, 67, 67, 0, 0, 187, 2,859, 0, 2,672

Coprocessor Configuration

            µ-code RAM   Data Mem.
TYPE I
TYPE II     X
TYPE III                 X
TYPE IV     X            X

Page 17:

Superscalar Coprocessor: Proposed Hierarchy (2)

Multiple Modular Arithmetic Logic Units (MALUs) in coprocessor

[Figure: hierarchy with a single MALU vs. multiple MALUs]

Single MALU: Point/Divisor Multiplication; Point/Divisor Addition, Point/Divisor Doubling; Finite Field Operation (e.g. AB+C mod P), Finite Field Inversion

Multiple MALUs: Point/Divisor Multiplication; Point/Divisor Addition, Point/Divisor Doubling; four parallel Finite Field Operations (e.g. AB+C mod P), Finite Field Inversion

Page 18:

Superscalar Coprocessor: Parallel Processing Architecture (TYPE IV-based)

[Figure: block diagram]

System blocks: Main CPU, Program ROM, SRAM, Instruction Bus (32-bit instructions), Data Bus (32-bit data), Memory Mapped I/O

Coprocessor blocks: IBC, DBC, IQB, µ-code RAM, FSM, Coprocessor Memory, 4 x MALU83

Signal: BufferFull

Page 19:

Superscalar Coprocessor: Horizontal Exploration of System

Performance of ECC and HECC

[Figure: stacked bar chart. Vertical axis: Required Clock Cycles [K], 0 to 100. Horizontal axis: Coprocessor Configuration. Series: Coprocessor Data Memory, Datapath.]

Configurations shown: operation AB+C and operation A(B+D)+C; HECC on 1xMALU83, 2xMALU83, 3xMALU83 and 4xMALU83; ECC on 1xMALU163 and 2xMALU163

Data labels in the chart: 67, 58, 30, 36, 22, 20, 20 and 38, 41, 25, 13, 22, 22, 8

Page 21:

Results: Performance for ECC over GF(2^83)

Fastest of three

x1.8 speed-up by 2-way superscaling (ILPDP=6) with A(B+D)+C

Still more improvement is possible by adding MALUs

[Figure: performance charts for AB+C and A(B+D)+C]

Page 22:

Results: Performance of HECC over GF(2^83)

Faster than ECC over a composite field

x2.7 speed-up by 4-way superscaling (ILPDP=5) with A(B+D)+C

Less improvement as the number of MALUs increases

[Figure: performance charts for AB+C and A(B+D)+C]

Page 23:

Results: Performance for ECC over GF((2^83)^2)

Slowest of three

x2.5 speed-up by 4-way superscaling (ILPDP=6) with A(B+D)+C

Less improvement as the number of MALUs increases

[Figure: performance charts for AB+C and A(B+D)+C]

Page 24:

Results: Comparison of ECC/HECC implementations on FPGAs

[11] T. Wollinger, PhD thesis, 2004.
[13] G. Orlando and C. Paar, CHES 2000.
[14] N. Gura et al., CHES 2002.
[29] N. A. Saqib et al., International Journal of Embedded Systems, 2005.

Page 25:

Conclusions

Performance improvement / comparison
  ECC was improved by a factor of 1.8 (2-way)
  HECC (genus 2) was improved by a factor of 2.7 (4-way)
  ECC over a composite field was improved by a factor of 2.5 (4-way)
  A(B+D)+C offers better performance than AB+C
  ECC is the fastest in this case study

Programmability & flexibility
  Supports three different curve-based cryptosystems over a binary field
  Arbitrary irreducible polynomial
  Field size up to 332 bits by using 4xMALU83

Page 26:

Thank you!

Page 27:

Parallel issue of instructions: Case of using 4 MALUs

[Figure: pipeline timing diagram for MALU#0 to MALU#3. Stages shown: IF/D, R0 to R3, EX, W0 to W3. Clock-cycle annotations: 1, 4 (3*), k/d, 4.]

IF/D: Instruction Fetch & Decode
R_: Read operands (dependent on the type of operation)
EX: Execution (dependent on MALU configuration, k & d)
W_: Write (dependent on # of instructions issued in parallel)

Page 28:

Parallel issue of instructions: Out-of-order Execution

Check RAW (Read After Write) dependency for in-/out-of-order execution
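A RAW check of this kind can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the coprocessor's actual issue logic: the `Instr` record, the register names and the `issue_group` function are hypothetical, each instruction writes one register, and only RAW hazards are checked (WAR and WAW hazards are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One coprocessor instruction: dst = f(srcs). Register names are illustrative."""
    dst: str
    srcs: tuple


def has_raw(earlier, later):
    """True if `later` reads the register that `earlier` writes."""
    return earlier.dst in later.srcs


def issue_group(queue, width, out_of_order=True):
    """Select up to `width` instructions from the front of `queue` for parallel issue."""
    issued, skipped = [], []
    for ins in queue:
        if len(issued) == width:
            break
        if any(has_raw(prev, ins) for prev in issued + skipped):
            if not out_of_order:
                break            # in-order: stop at the first dependent instruction
            skipped.append(ins)  # out-of-order: hold it back and keep scanning
        else:
            issued.append(ins)
    return issued


queue = [
    Instr("T1", ("A", "B", "C")),   # T1 = A*B + C
    Instr("T2", ("T1", "D", "E")),  # T2 = T1*D + E, needs T1
    Instr("T3", ("A", "D", "C")),   # independent of T1 and T2
    Instr("T4", ("B", "E", "A")),   # independent of T1 and T2
]
ooo = [i.dst for i in issue_group(queue, 4, out_of_order=True)]
ino = [i.dst for i in issue_group(queue, 4, out_of_order=False)]
```

In this toy queue, in-order issue stalls behind the dependent second instruction, while out-of-order issue can still fill three of the four MALUs.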

