gmu research 1cryptography.gmu.edu/research/hardware.pdf · ece 545 introduction to vhdl •...
TRANSCRIPT
Hardware Architectures
Secret-keyCryptography
Public-keyCryptography Cryptanalysis
AES& AES candidates
MontgomeryMultipliers
Special-purpose factoring- Elliptic Curve Method& AES candidates
eSTREAMcandidates
HashFunctions SHA-3
Multipliers - Elliptic Curve Method- p-1 method- Rho method- Trial division
ECC cryptosystems
Pairing-basedcryptosystems
SpectralMontgomeryExponentiation
Number Field Sieve- sieving- linear algebra
Hardware Architectures
Secret-keyCryptography
Public-keyCryptography Cryptanalysis
AES& AES candidates
MontgomeryMultipliers
Special-purpose factoring- Elliptic Curve Method& AES candidates
eSTREAMcandidates
HashFunctions SHA-3
Multipliers - Elliptic Curve Method- p-1 method- Rho method- Trial division
ECC cryptosystems
Pairing-basedcryptosystems
SpectralMontgomeryExponentiation
Number Field Sieve- sieving- linear algebra
NSA-developed Cryptographic Standards
DES – Data Encryption Standard
1977 1999
Triple DES
Block Ciphers
Hash Functions 1995 20031993
2005
time
1970 1980 1990 2000 2010
SHA-1–Secure Hash Algorithm
SHA-2
Hash Functions 1995 20031993
SHA-0
Cryptographic Standard Contests
AES
NESSIE
CRYPTREC
15 block ciphers 1 winner
IX.1997 X.2000
I.2000 XII.2002
V.2008XI.2004
time
96 97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12
eSTREAM
SHA-3
34 stream ciphers 4 SW+3 HW winners
51 hash functions 1 winner
V.2008
X.2007 XII.2012
XI.2004
Criteria used to evaluate cryptographictransformations
Security
SoftwareEfficiency
HardwareEfficiency
Flexibility
Advanced Encryption Standard (AES) Contest1997-2001
15 Candidatesfrom USA, Canada, Belgium,
France, Germany, Norway, UK, Israel,Korea, Japan, Australia, Costa Rica
June 1998
Round 1
SecuritySoftware efficiency
Flexibility
August 1999
October 2000
1 winner: RijndaelBelgium
5 final candidates
Mars, RC6, Rijndael, Serpent, Twofish
Round 2
SecurityHardware efficiency
300350400450500
Speed [Mbit/s]
431 444414
353
294
Worcester Polytechnic Institute
University of Southern California
George Mason University
Implementations of candidates forthe new Advanced Encryption Standard (AES)
Xilinx, Virtex 1000 FPGA
050100150200250300
SerpentI8
Rijndael Twofish RC6 MarsSerpentI1
177173
104
149
62
143112
88102
61
Our Results: Encryption in cipher feedback modes(CBC, CFB, OFB) - Virtex FPGA
Throughput [Mbit/s]
300
400
500
RijndaelSerpent I8
Area [CLB slices]
0
100
200
300
0 1000 2000 3000 4000 5000
Mars
RC6
TwofishSerpent I1
NSA Results: Encryption in cipher feedback modes(CBC, CFB, OFB) - ASIC, 0.5 m CMOS
Throughput [Mbit/s]
400
500
600
700
Rijndael
Area [CLB slices]
0
100
200
300
400
0 5 10 15 20 25 30 35 40
Serpent I1
RC6 TwofishMars
300
350
400
450
500
Speed of the final AES candidates in Xilinx FPGAs
Speed [Mbit/s] K.Gaj, P. Chodowiec, AES3, April, 2000
0
50
100
150
200
250
300
Serpent Rijndael Twofish RC6 Mars
60
70
80
90
100
Survey filled by 167 participants ofthe Third AES Conference, April 2000
# votes
0
10
20
30
40
50
60
SerpentRijndael Twofish RC6 Mars
Security
High
NIST Report: Security
MARSSerpentTwofish
AES Final Report, October 2000
Complexity
Adequate
Simple Complex
Rijndael
Twofish
RC6
eSTREAM Stream Cipher Comparison
• Part of the GMU Fall 2006 & Fall 2007 graduate coursesECE 545 Introduction to VHDL
• Individual 6-week project
• 4 students working independently on each eSTREAM cipher
• best code for each algorithm selected at the endof the semester
• selected designs verified and revised in order to assure• correct functionality• standard interface & control• possibly uniform design & coding style
8000
10000
12000
Throughput[Mbit/s]
Trivium
T64
Best
Comparison of 4 Focus Hardware-Oriented Stream CiphersFPGA: Xilinx Spartan 3 family
0
2000
4000
6000
8000
0 200 400 600 800 1000 1200 1400Area
[CLB slices]
Phelix
T32
T16
Grain
Mickey-128
G16
G1 WorstAES
Comparison of 8 Final Candidates Sorted byMinimum Area and Maximum Throughput/Area
Candidate Area(slices)
Candidate Throughput/Area(Mbps/slices)
Grain v1 44 Trivium (x64) 39.26
Grain 128 50 Grain 128 (x32) 7.97
Trivium 50 Grain v1 (x16) 5.98
DECIM v2 80 Trivium 4.80
DECIM 128 89 F-FCSR-16 4.53
MICKEY 2.0 115 Grain v1 4.45MICKEY 2.0 115 Grain v1 4.45
MICKEY 128 2.0 176 Grain 128 3.92
Moustique 278 F-FCSR-H v2 3.23
F-FCSR-H v2 342 MICKEY 2.0 2.03
Trivium (x64) 344 MICKEY 128 2.0 1.27
Grain v1 (x16) 348 Moustique 0.81
F-FCSR-16 473 DECIM v2 0.58
Grain 128 (x32) 534 DECIM 128 0.49
Pomaranch 648 Edon80 0.10
Edon80 1284 Pomaranch 0.08
Conclusions from the Comparisonof the eSTREAM Candidates in
Hardware
Very large differences among 8 leading candidates:
~30 x in terms of area (Grain v1 vs. Edon80)
~500 x in terms of the throughput to area ratio
(Trivium (x64) vs. Pomaranch)
MD4
MD5 attack with240 operations
Current State of Security of Major Hash Functions
broken;Wang, Feng, Lai, Yu, Crypto 2004(manually, without using a computer)
broken;Wang, Feng,
SHA-0
SHA-1
RIPEMD
RIPEMD-160
SHA-2: SHA-256, SHA-384, SHA-512
broken;Wang, Feng, Lai, YuCrypto 2004(1 hr on a PC)
240 operationsCrypto 2004
Wang, Feng,Lai, Yu,Crypto 2004(manully, withoutusing a computer)
attack with263 operationsWang, Yin,Yu, Aug 2005
SHA-3 Contest Timeline
2007• publication of requirements• 29.X. 2007: request for candidates
2008• 31.X.2008: deadline for submitting candidates
20092 Q – first workshop devoted to the presentation of candidates2 Q – first workshop devoted to the presentation of candidates
20102 Q: second workshop devoted to the analysis of candidates3 Q: selection of finalists
20121 Q: last workshop2 Q: selection of the winner3 Q: draft version of the standard published4 Q: final version of the standard published
H
+
IV
Basic iterative architecture ofa typical hash function
R
+
CLR
WtKtStep t
OUT
H
+
IV
Unrolled architecture
R CLR
. . . .
Wt
Wt+1
Wt+k-1
Kt
Kt+1
Kt+k-1
Step t
Step t+1
Step t+k-1
. . . .
Loop unrolling more suitable forhash algorithms than for symmetric-key ciphers
Speed up compared to the basic iterative architecture:SHA-1: 1.9 (5 rounds unrolled)
Unrolled Architecturesof Hash Functions - Summary
SHA-1: 1.9 (5 rounds unrolled)SHA-256: 1.5 (4 rounds unrolled)SHA-512: 1.3 (5 rounds unrolled)
Speed up is a strong function of data dependenciespresent in the algorithm
Hardware Architectures
Secret-keyCryptography
Public-keyCryptography Cryptanalysis
AES& AES candidates
MontgomeryMultipliers
Special-purpose factoring- Elliptic Curve Method& AES candidates
eSTREAMcandidates
HashFunctions SHA-3
Multipliers - Elliptic Curve Method- p-1 method- Rho method- Trial division
ECC cryptosystems
Pairing-basedcryptosystems
SpectralMontgomeryExponentiation
Number Field Sieve- sieving- linear algebra
Montgomery Multipliers: Motivation
• Fast modular multiplication required inmultiple cryptographic transformations
• RSA, DSA, Diffie-Hellman• Elliptic Curve Cryptosystems• ECM, p-1, Pollard’s rho methods of factoring, etc.
• Montgomery Multiplication invented by Peter L. Montgomery• Montgomery Multiplication invented by Peter L. Montgomeryin 1985 is most frequently used to implement repetitivesequence of modular multiplications in both softwareand hardware
• Montgomery Multiplication in hardware replacesdivision by a sequence of simple logic operations,conditional additions and right shifts
Primary Advantage of our New Architectures
• Reduction in the number of clock cycles
from
2 n + e - 1
ton – size of operands in bits
to
n + e – 1
• Minimum penalty in terms of the area and clockperiod
e – size of operands in words
1.20
1.40
1.60
1.80
2.00
Normalized Product Latency Times AreaNew Architecture vs. Previous Architectures
Our design Tenca & Koc McIvor et al.
1.66
1.14
1.64
1.28
1.63
1.21
1.631.55
0.00
0.20
0.40
0.60
0.80
1.00
1.20
1024Operand size 2048 3072 4096
1.14
Hardware Architectures
Secret-keyCryptography
Public-keyCryptography Cryptanalysis
AES& AES candidates
MontgomeryMultipliers
Special-purpose factoring- Elliptic Curve Method& AES candidates
eSTREAMcandidates
HashFunctions SHA-3
Multipliers - Elliptic Curve Method- p-1 method- Rho method- Trial division
ECC cryptosystems
Pairing-basedcryptosystems
SpectralMontgomeryExponentiation
Number Field Sieve- sieving- linear algebra
Pairing Based Cryptography
• New family of public key cryptosystems, first proposed byMenezes, Okamoto, and Vanstone in 1993
• Application to:
– Identity Based Cryptography
– One-round 3-way key exchange
– Short digital signatures– Short digital signatures
– Others: Group signatures, batch signatures, thresholdcryptography, broadcast encryption, private informationretrieval, electronic voting, etc.
• Not a part of any standard yet
• Very limited number of software and hardwareimplementations
Spectral Modular Exponentiation
• New method for fast modular exponentiation
for very long integers in the range of 10,000-20,000 bits
• First publication in 2007, by Koc and Saldamli
• Intersection of cryptography and Digital Signal Processing• Intersection of cryptography and Digital Signal Processing
• Better computational complexity than any other
algorithm known to date
• No reported software or hardware implementations
Hardware Architectures
Secret-keyCryptography
Public-keyCryptography Cryptanalysis
AES& AES candidates
MontgomeryMultipliers
Special-purpose factoring- Elliptic Curve Method& AES candidates
eSTREAMcandidates
HashFunctions SHA-3
Multipliers - Elliptic Curve Method- p-1 method- Rho method- Trial division
ECC cryptosystems
Pairing-basedcryptosystems
SpectralMontgomeryExponentiation
Number Field Sieve- sieving- linear algebra
SHARCS - Special-purpose Hardwarefor Attacking Cryptographic Systems
1st edition: Paris, Feb. 24-25, 20052nd edition: Cologne, Apr. 3-4, 2006
Workshop Series
3rd edition: Vienna, Sep. 9-10, 2007
Seehttp://www.ruhr-uni-bochum.de/itsc/tanja/SHARCS/
Best Algorithm to Factor Large Numbers
NUMBER FIELD SIEVE
Complexity: Sub-exponential time and memory
N = Number to factor,k = Number of bits of N
Exponential function, ekExecutiontime
Polynomialfunction, a·km
Exponential function, ek
Sub-exponential function,
e k1/3
(ln k)2/3
k = Number of bits of N
time
Factoring 1024-bit RSA keysusing Number Field Sieve (NFS)
Polynomial Selection
Relation Collection
Sieving
Minifactoring200 bit & 350 bit ECM SHARCS 2006
Linear Algebra
Square Root
Sieving200 bit
numbers
& 350 bit ECMp-1 methodPollard rhoTrial division
SHARCS 2005
SHARCS 2006
IPAM 2006
SHARCS 2007
Comparison among technologies
Microprocessors ASICsFPGAs
SRC COPACOBANA
SRC 6reconfigurable computer
2 x Pentium Xeon 3 GHz
2 x Xilinx Virtex II FPGAXC2V6000 running at 100 MHz
Basic unit:
SRC 6 fromSRC Computers
Fast communication interfacebetween the microprocessor boardand the FPGA board, 1600 MB/s
Multiple basic units can be connectedusing Hi-Bar Switch andGlobal Common Memory
Factoring Runs per Second
Spartan3s5000
Virtex2v6000
Pentium4 2.8GHz
637
869
635
857
11.3x
8.4x
10.8x
7.9x
637 635
315
435
80 7640
rho p-1 ECM
10.8x
7.8x
ASIC 130 nm vs. Virtex II 6000 – rho (24 units)
19.80 mm
Area of Virtex II 6000
51x
19.6
8m
m
2.7 mm
2.82 mm
Area of Virtex II 6000(estimation by R.J. Lim Fong,
MS Thesis, VPI, 2004)
Area of an ASIC with equivalent functionality
Number of rho & ECM computations persecond using the same chip area
88,405
101x
50x
Virtex2v6000
130 nm ASIC library
869
21,739
435
50x
rho ECM