gmu research 1cryptography.gmu.edu/research/hardware.pdf · ece 545 introduction to vhdl •...

Hardware Architectures

Secret-key Cryptography
• AES & AES candidates
• eSTREAM candidates
• Hash Functions, SHA-3

Public-key Cryptography
• Montgomery Multipliers
• ECC cryptosystems
• Pairing-based cryptosystems
• Spectral Montgomery Exponentiation

Cryptanalysis
• Special-purpose factoring
– Elliptic Curve Method
– p-1 method
– Rho method
– Trial division
• Number Field Sieve
– sieving
– linear algebra

NSA-developed Cryptographic Standards

[Figure: timeline, 1970 to 2010. Block ciphers: DES (Data Encryption Standard), Triple DES. Hash functions: SHA-0, SHA-1 (Secure Hash Algorithm), SHA-2. Years marked: 1977, 1993, 1995, 1999, 2003, 2005.]

Cryptographic Standard Contests

[Figure: timeline, 1996 to 2012]

• AES: IX.1997 to X.2000; 15 block ciphers, 1 winner
• NESSIE: I.2000 to XII.2002
• CRYPTREC
• eSTREAM: XI.2004 to V.2008; 34 stream ciphers, 4 SW + 3 HW winners
• SHA-3: X.2007 to XII.2012; 51 hash functions, 1 winner

Criteria used to evaluate cryptographic transformations

• Security
• Software Efficiency
• Hardware Efficiency
• Flexibility

Advanced Encryption Standard (AES) Contest, 1997-2001

• June 1998: 15 candidates from USA, Canada, Belgium, France, Germany, Norway, UK, Israel, Korea, Japan, Australia, Costa Rica
• Round 1: Security, Software efficiency, Flexibility
• August 1999: 5 final candidates: Mars, RC6, Rijndael, Serpent, Twofish
• Round 2: Security, Hardware efficiency
• October 2000: 1 winner: Rijndael (Belgium)

[Figure: Implementations of candidates for the new Advanced Encryption Standard (AES) on a Xilinx Virtex 1000 FPGA. Bar chart of speed in Mbit/s for Serpent I8, Rijndael, Twofish, RC6, Mars, and Serpent I1, comparing results from Worcester Polytechnic Institute, University of Southern California, and George Mason University. Bar labels in extraction order: 431, 444, 414, 353, 294, 177, 173, 104, 149, 62, 143, 112, 88, 102, 61.]

Our Results: Encryption in cipher feedback modes (CBC, CFB, OFB) - Virtex FPGA

[Figure: scatter plot of throughput in Mbit/s versus area in CLB slices for Rijndael, Serpent I8, Mars, RC6, Twofish, and Serpent I1.]

NSA Results: Encryption in cipher feedback modes (CBC, CFB, OFB) - ASIC, 0.5 μm CMOS

[Figure: scatter plot of throughput in Mbit/s versus area for Rijndael, Serpent I1, RC6, Twofish, and Mars.]

Speed of the final AES candidates in Xilinx FPGAs

[Figure: bar chart of speed in Mbit/s for Serpent, Rijndael, Twofish, RC6, and Mars. Source: K. Gaj, P. Chodowiec, AES3, April 2000.]

Survey filled by 167 participants of the Third AES Conference, April 2000

[Figure: bar chart of the number of votes for Serpent, Rijndael, Twofish, RC6, and Mars.]

NIST Report: Security

[Figure: candidates placed by security (High, Adequate) and complexity (Simple, Complex): MARS, Serpent, Twofish, Rijndael, RC6. Source: AES Final Report, October 2000.]

eSTREAM Stream Cipher Comparison

• Part of the GMU Fall 2006 & Fall 2007 graduate courses ECE 545 Introduction to VHDL

• Individual 6-week project

• 4 students working independently on each eSTREAM cipher

• Best code for each algorithm selected at the end of the semester

• Selected designs verified and revised in order to assure
– correct functionality
– standard interface & control
– possibly uniform design & coding style

Comparison of 4 Focus Hardware-Oriented Stream Ciphers
FPGA: Xilinx Spartan 3 family

[Figure: scatter plot of throughput in Mbit/s versus area in CLB slices, with the best corner at high throughput and low area and the worst at the opposite corner. Points: Trivium (T16, T32, T64), Grain (G1, G16), Phelix, Mickey-128, and AES.]

Comparison of 8 Final Candidates Sorted by Minimum Area and Maximum Throughput/Area

Candidate          Area (slices)    Candidate          Throughput/Area (Mbps/slice)
Grain v1           44               Trivium (x64)      39.26
Grain 128          50               Grain 128 (x32)    7.97
Trivium            50               Grain v1 (x16)     5.98
DECIM v2           80               Trivium            4.80
DECIM 128          89               F-FCSR-16          4.53
MICKEY 2.0         115              Grain v1           4.45
MICKEY 128 2.0     176              Grain 128          3.92
Moustique          278              F-FCSR-H v2        3.23
F-FCSR-H v2        342              MICKEY 2.0         2.03
Trivium (x64)      344              MICKEY 128 2.0     1.27
Grain v1 (x16)     348              Moustique          0.81
F-FCSR-16          473              DECIM v2           0.58
Grain 128 (x32)    534              DECIM 128          0.49
Pomaranch          648              Edon80             0.10
Edon80             1284             Pomaranch          0.08

Conclusions from the Comparison of the eSTREAM Candidates in Hardware

Very large differences among 8 leading candidates:

• ~30 x in terms of area (Grain v1 vs. Edon80)

• ~500 x in terms of the throughput to area ratio (Trivium (x64) vs. Pomaranch)
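
The two ratios quoted above can be checked against the table values. The following is a minimal sketch in Python; the variable names are illustrative, and the numbers are copied from the comparison table.

```python
# Values copied from the comparison table
# (area in slices, throughput/area in Mbps per slice).
area_slices = {"Grain v1": 44, "Edon80": 1284}
throughput_per_area = {"Trivium (x64)": 39.26, "Pomaranch": 0.08}

# Largest area divided by smallest area.
area_ratio = area_slices["Edon80"] / area_slices["Grain v1"]

# Best throughput/area divided by worst throughput/area.
tpa_ratio = throughput_per_area["Trivium (x64)"] / throughput_per_area["Pomaranch"]

print(f"area ratio: {area_ratio:.1f}x")
print(f"throughput/area ratio: {tpa_ratio:.0f}x")
```

The results, roughly 29x and 491x, are consistent with the rounded ~30 x and ~500 x figures.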

Current State of Security of Major Hash Functions

• MD4: broken; Wang, Feng, Lai, Yu, Crypto 2004 (manually, without using a computer)
• MD5: broken; Wang, Feng, Lai, Yu, Crypto 2004 (1 hr on a PC)
• SHA-0: attack with $2^{40}$ operations; Crypto 2004
• SHA-1: attack with $2^{63}$ operations; Wang, Yin, Yu, Aug 2005
• RIPEMD: broken; Wang, Feng, Lai, Yu, Crypto 2004 (manually, without using a computer)
• RIPEMD-160
• SHA-2: SHA-256, SHA-384, SHA-512

SHA-3 Contest Timeline

2007
• publication of requirements
• 29.X.2007: request for candidates

2008
• 31.X.2008: deadline for submitting candidates

2009
• 2 Q: first workshop devoted to the presentation of candidates

2010
• 2 Q: second workshop devoted to the analysis of candidates
• 3 Q: selection of finalists

2012
• 1 Q: last workshop
• 2 Q: selection of the winner
• 3 Q: draft version of the standard published
• 4 Q: final version of the standard published

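Basic iterative architecture of a typical hash function

[Figure: block diagram with register H initialized from IV, an adder (+), working register R, combinational logic CLR performing step t with inputs Wt and Kt, and output OUT.]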

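Unrolled architecture

[Figure: block diagram with register H initialized from IV, an adder (+), working register R, and a chain of k combinational blocks CLR performing steps t, t+1, ..., t+k-1 with inputs Wt, Wt+1, ..., Wt+k-1 and Kt, Kt+1, ..., Kt+k-1.]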

Unrolled Architectures of Hash Functions - Summary

• Loop unrolling is more suitable for hash algorithms than for symmetric-key ciphers

• Speed-up compared to the basic iterative architecture:
– SHA-1: 1.9 (5 rounds unrolled)
– SHA-256: 1.5 (4 rounds unrolled)
– SHA-512: 1.3 (5 rounds unrolled)

• Speed-up is a strong function of the data dependencies present in the algorithm
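
The difference between the basic iterative and the unrolled architecture can be modeled in software. The sketch below is only an illustration: the `step` function is a toy mixing function invented for this example (it is not SHA-1 or any real hash), and `compress`, `unroll`, and the test vectors are hypothetical names and values. It shows that performing k steps per modeled clock cycle leaves the result unchanged while dividing the cycle count by k.

```python
MASK = 0xFFFFFFFF  # work on 32-bit words


def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def step(state, w, k):
    """One toy step (NOT SHA-1): mix message word w and constant k into the state."""
    a, b, c, d = state
    t = (rotl(a, 5) + (b ^ c ^ d) + w + k) & MASK
    return (t, a, rotl(b, 30), c)


def compress(iv, words, consts, unroll=1):
    """Run all steps, `unroll` steps per modeled clock cycle; return (digest, cycles)."""
    if len(words) % unroll != 0:
        raise ValueError("number of steps must be a multiple of the unroll factor")
    state, cycles = iv, 0
    for t in range(0, len(words), unroll):
        # In hardware these `unroll` steps form one combinational chain,
        # so the clock period grows while the cycle count shrinks.
        for j in range(unroll):
            state = step(state, words[t + j], consts[t + j])
        cycles += 1
    # Feed-forward addition of the initial value (the "+" block in the diagrams).
    digest = tuple((h + s) & MASK for h, s in zip(iv, state))
    return digest, cycles


iv = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
words = [(i * 0x9E3779B9) & MASK for i in range(80)]
consts = [0x5A827999] * 80

basic, basic_cycles = compress(iv, words, consts, unroll=1)
unrolled, unrolled_cycles = compress(iv, words, consts, unroll=5)
print(basic == unrolled, basic_cycles, unrolled_cycles)
```

The cycle count drops by the unroll factor, but each cycle becomes longer. The net speed-up therefore depends on how much of the chained logic can overlap given the data dependencies between steps, which is why the reported gains stay well below the unroll factor.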








Montgomery Multipliers: Motivation

• Fast modular multiplication is required in multiple cryptographic transformations
– RSA, DSA, Diffie-Hellman
– Elliptic Curve Cryptosystems
– ECM, p-1, Pollard's rho methods of factoring, etc.

• Montgomery Multiplication, invented by Peter L. Montgomery in 1985, is most frequently used to implement repetitive sequences of modular multiplications in both software and hardware

• Montgomery Multiplication in hardware replaces division by a sequence of simple logic operations, conditional additions and right shifts
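
The last point can be illustrated with the textbook radix-2 (bit-serial) form of Montgomery multiplication. This is a minimal sketch and not the architecture discussed in these slides; the function names `mont_mul` and `mod_mul` and the example modulus are chosen for illustration. The inner loop uses only bit tests, conditional additions, and right shifts.

```python
def mont_mul(x, y, m, n):
    """Radix-2 Montgomery product: x * y * 2^(-n) mod m.

    Assumes m is odd, m < 2^n, and 0 <= x, y < m.
    """
    s = 0
    for i in range(n):
        if (x >> i) & 1:   # conditional addition of y, driven by bit i of x
            s += y
        if s & 1:          # conditional addition of m makes s even
            s += m
        s >>= 1            # exact division by 2, done as a right shift
    if s >= m:             # single final correction step
        s -= m
    return s


def mod_mul(a, b, m, n):
    """Ordinary a * b mod m computed through the Montgomery domain."""
    r2 = pow(2, 2 * n, m)                 # R^2 mod m with R = 2^n, precomputed once
    a_bar = mont_mul(a, r2, m, n)         # a * R mod m
    b_bar = mont_mul(b, r2, m, n)         # b * R mod m
    c_bar = mont_mul(a_bar, b_bar, m, n)  # a * b * R mod m
    return mont_mul(c_bar, 1, m, n)       # multiply by 1 to strip the factor R


m, n = 239, 8
print(mod_mul(100, 200, m, n), (100 * 200) % m)
```

The conversions into and out of the Montgomery domain cost extra multiplications, so the method pays off for long sequences of multiplications with the same modulus, such as modular exponentiation.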

Primary Advantage of our New Architectures

• Reduction in the number of clock cycles from $2n + e - 1$ to $n + e - 1$
– n: size of operands in bits
– e: size of operands in words

• Minimum penalty in terms of the area and clock period
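
To see what the cycle-count reduction means for concrete operand sizes, the two formulas can be evaluated directly. This is a small sketch under one assumption that is not in the source: the word size `WORD_BITS = 16` is hypothetical, and e is taken as the number of such words needed to hold n bits.

```python
import math


def cycles_previous(n, e):
    """Clock cycles of the previous architectures: 2n + e - 1."""
    return 2 * n + e - 1


def cycles_new(n, e):
    """Clock cycles of the new architectures: n + e - 1."""
    return n + e - 1


WORD_BITS = 16  # hypothetical word size, not stated in the source

for n in (1024, 2048, 3072, 4096):
    e = math.ceil(n / WORD_BITS)  # assumed relation between n and e
    old, new = cycles_previous(n, e), cycles_new(n, e)
    print(n, e, old, new, round(old / new, 2))
```

Because e is much smaller than n, the ratio of the two cycle counts stays close to 2 for all operand sizes under this assumption.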

Normalized Product Latency Times Area
New Architecture vs. Previous Architectures

[Figure: bar chart for operand sizes 1024, 2048, 3072, and 4096, comparing our design, Tenca & Koc, and McIvor et al. Bar labels in extraction order: 1.66, 1.14, 1.64, 1.28, 1.63, 1.21, 1.63, 1.55, 1.14.]








Pairing Based Cryptography

• New family of public key cryptosystems, first proposed by Menezes, Okamoto, and Vanstone in 1993

• Application to:

– Identity Based Cryptography

– One-round 3-way key exchange

– Short digital signatures

– Others: group signatures, batch signatures, threshold cryptography, broadcast encryption, private information retrieval, electronic voting, etc.

• Not a part of any standard yet

• Very limited number of software and hardware implementations

Spectral Modular Exponentiation

• New method for fast modular exponentiation for very long integers in the range of 10,000-20,000 bits

• First publication in 2007, by Koc and Saldamli

• Intersection of cryptography and Digital Signal Processing

• Better computational complexity than any other algorithm known to date

• No reported software or hardware implementations








SHARCS - Special-purpose Hardware for Attacking Cryptographic Systems

Workshop Series
• 1st edition: Paris, Feb. 24-25, 2005
• 2nd edition: Cologne, Apr. 3-4, 2006
• 3rd edition: Vienna, Sep. 9-10, 2007

See http://www.ruhr-uni-bochum.de/itsc/tanja/SHARCS/

Best Algorithm to Factor Large Numbers

NUMBER FIELD SIEVE

Complexity: Sub-exponential time and memory

N = number to factor, k = number of bits of N

[Figure: execution time versus k for three growth rates]
• Polynomial function: $a \cdot k^{m}$
• Sub-exponential function: $e^{k^{1/3} (\ln k)^{2/3}}$
• Exponential function: $e^{k}$
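
The gap between the three growth rates can be made concrete by evaluating them for a few key sizes. This is a rough sketch: the polynomial constants `a = 1` and `m = 3` are arbitrary illustrative choices, and the functions return natural logarithms because $e^{k}$ overflows floating point for k of this size.

```python
import math


def ln_polynomial(k, a=1.0, m=3):
    """Natural log of a * k^m (a and m are illustrative constants)."""
    return math.log(a) + m * math.log(k)


def ln_subexponential(k):
    """Natural log of e^(k^(1/3) * (ln k)^(2/3))."""
    return k ** (1 / 3) * math.log(k) ** (2 / 3)


def ln_exponential(k):
    """Natural log of e^k."""
    return float(k)


for k in (512, 768, 1024):
    print(k,
          round(ln_polynomial(k), 1),
          round(ln_subexponential(k), 1),
          ln_exponential(k))
```

For each k the sub-exponential value lies between the polynomial and the exponential one, much closer to the polynomial end on this logarithmic scale.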

Factoring 1024-bit RSA keys using Number Field Sieve (NFS)

• Polynomial Selection
• Relation Collection
– Sieving
– Minifactoring of 200 bit & 350 bit numbers: ECM, p-1 method, Pollard rho, Trial division
• Linear Algebra
• Square Root

Venues marked in the figure: SHARCS 2005, SHARCS 2006, IPAM 2006, SHARCS 2007
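
As an illustration of one of the minifactoring methods named above, here is a minimal software sketch of Pollard's rho with Floyd cycle detection. It is a generic textbook version, not the hardware design from the cited presentations; the function name, the default parameters `c` and `x0`, and the example number are chosen for illustration.

```python
from math import gcd


def pollard_rho(n, c=1, x0=2):
    """Return a non-trivial factor of composite n, or None if this (c, x0) fails."""

    def f(v):
        # Pseudo-random iteration function x -> x^2 + c mod n.
        return (v * v + c) % n

    x = y = x0
    d = 1
    while d == 1:
        x = f(x)       # slow pointer advances one step
        y = f(f(y))    # fast pointer advances two steps
        d = gcd(abs(x - y), n)
    return d if d != n else None


factor = pollard_rho(8051)  # 8051 = 83 * 97
print(factor, 8051 // factor)
```

When the function returns None, the usual remedy is to retry with a different constant c or starting value x0.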

Comparison among technologies

• Microprocessors
• FPGAs
• ASICs

Platforms: SRC, COPACOBANA

SRC 6 reconfigurable computer (SRC 6 from SRC Computers)

Basic unit:
• 2 x Pentium Xeon 3 GHz
• 2 x Xilinx Virtex II FPGA XC2V6000 running at 100 MHz
• Fast communication interface between the microprocessor board and the FPGA board, 1600 MB/s

Multiple basic units can be connected using Hi-Bar Switch and Global Common Memory

Factoring Runs per Second

Method    Spartan3 s5000    Virtex2 v6000    Pentium4 2.8 GHz
rho       637 (7.9x)        869 (10.8x)      80
p-1       635 (8.4x)        857 (11.3x)      76
ECM       315 (7.8x)        435 (10.8x)      40

Speed-up factors in parentheses are relative to the Pentium 4.

ASIC 130 nm vs. Virtex II 6000 – rho (24 units)

• Area of Virtex II 6000: 19.80 mm x 19.68 mm (estimation by R.J. Lim Fong, MS Thesis, VPI, 2004)
• Area of an ASIC with equivalent functionality: 2.7 mm x 2.82 mm
• Area ratio: 51x

Number of rho & ECM computations per second using the same chip area

Method    Virtex2 v6000    130 nm ASIC library    Ratio
rho       869              88,405                 101x
ECM       435              21,739                 50x
