comparison of hardware performance of selected phase ii estream

28
Comparison of Hardware Performance of Selected Phase II eSTREAM Candidates Kris Gaj Gabriel Southern Ramakrishna Bachimanchi & Fall 2006 GMU ECE 545: Introduction to VHDL class George Mason University

Upload: others

Post on 03-Feb-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Comparison of Hardware Performanceof Selected Phase II

eSTREAM Candidates

Kris Gaj

Gabriel Southern

Ramakrishna Bachimanchi

&

Fall 2006 GMU ECE 545: Introduction to VHDL class

George Mason University

Goal

Comparison of Profile II (hardware) Phase 2 Focus candidates:

• Grain• Mickey-128• Phelix• Trivium

Two additional reference points:

• A5/1 (old & insecure GSM standard)• AES (compact architecture)

Two hardware technologies:

• Xilinx Spartan 3 FPGAs• TSMC 90 nm standard-cell library ASICs

Genesis & approach

• Part of GMU Fall 2006 graduate courseECE 545 Introduction to VHDL

• Individual 6-week project

• 4 students working independently on each eSTREAM cipher

• best code for each algorithm selected at the endof the semester

• selected designs verified and revised in order to assure• correct functionality• standard interface & control• uniform design & coding style

eSTREAMcipher

clk

reset

enc_dec

data_in

data_in_ready

data_in_write

d

data_out

writefull

d

Fixed interface

key_IV

key_IV_ready

key_IV_write

k

Two independent parameters

d – number of bits processed per clock cycle (radix)

k – number of bits of key/IV loaded per clock cycle

d encryption/decryptionthroughput

area

# of pins

+

ksetup time(key & IV loading+ initialization) area

# of pins

All results generated with k = d

Methodology

Specification

Execution Unit Control Unit

Block

diagram

Algorithmic

State Machine

VHDL code VHDL code

Methodology & tools

Xilinx ISE v. 8.1iImplementation(mapping, placing

& routing)

SynopsysDesign Analyzer

X-2005.9

SynplicitySynplify Pro

v. 8.5

Logic synthesis

Aldec Active HDLModelSim Xilinx Edition

VHDL simulation& debugging

ASICFPGATechnology

All results afterplacing & routing

All results afterlogic synthesis

No physicalimplementation

Assumptions

• Only encryption/decryption, no MAC

• Maximum allowed key and IV sizes

• Key and IV need to be reloaded each time eitherof them changes

• No precomputations of internal state outsideof the circuit.

642264A5/1

2888080Trivium

288128256Phelix

320128128Mickey-128

1606480Grain

Internal state sizeIV sizeKey sizeCipher22

Choice of hardwarearchitecture

Three categories of stream ciphersrepresented among those implemented

Based onLFSRs / NFSRswith serial inputs

Based onLFSRs / NFSRswith parallel inputs

NFSR LFSR NFSR LFSR

Based on basic iterativearchitecture and componentoperations of block ciphersand hash functions

Phelix, AES in OFB or CTR mode

Grain, Trivium, A5/1 Mickey-128

Optimizations for the first group of ciphersGrain

d=1 d=2

Si+80=si+62 + si+52 + si+38 + si+23 + si+13 + si

Si+81=si+63 + si+53 + si+39 + si+24 + si+14 + si+1

si si+79

si+80 si+80

si+81

sisi+1 si+79

HB

HB

HB

HB

Key mixing Encryption

HB HB

Key mixing Encryption

HB

HB

Key mixing Encryption

HB

Key mixing Encryption

2 x number of clock cycles

Block function,Sharing

Block function,No sharing

Half-Blockfunction,Sharing

Half-Block function,No sharing

Optimizations for the third group of ciphersPhelix

Results

Ease of design as perceivedby students

based on the specification of each cipher

Average score( 5 – very easy,1 – very difficult)

Number of studentswho selected the

cipher as their first choice

Trivium

Mickey-128

Grain

Phelix

3.36

3.32

3.00

2.00

5

3

4

0

0

2000

4000

6000

8000

10000

12000

0 200 400 600 800 1000 1200 1400 Area[CLB slices]

Throughput[Mbit/s]

Phelix

Trivium

T64

T32

T16

Grain

Mickey-128

G16

G1

Best

Worst

Throughput vs. areaFPGA: Xilinx Spartan 3 family

AES

0

200

400

600

800

1000

1200

1200 1250 1300 1350 1400 Area[CLB slices]

Throughput[Mbit/s]

Block function,No sharing

Block function,Sharing

Half-Block function,Sharing

Half-Block function,No sharing

Throughput vs. area: PhelixFPGA: Xilinx Spartan 3 family

0

2000

4000

6000

8000

10000

12000

0 50 100 150 200 250 300 350 400Area

[CLB slices]

Throughput[Mbit/s]

G1 G2 G4

G8

G16

M

T64

T32

T16

T8

T1-4

A – A5/1G – GrainM – Mickey-128T – Trivium

Legend:

Throughput vs. area: Grain, Mickey-128, Trivium, A5/1FPGA: Xilinx Spartan 3 family

0

500

1000

1500

2000

2500

3000

0 50 100 150 200 250 300 350Area

[CLB slices]

Throughput[Mbit/s]

G16

M

T16

G8

T8

T4

T2

T1

G4

G2G1A1

A3 A4

A – A5/1G – GrainM – Mickey-128T – Trivium

Legend:

Throughput vs. area: Throughput up to 3 Gbit/sFPGA: Xilinx Spartan 3 family

0

100

200

300

400

500

600

700

800

0 200 400 600 800 1000 1200Area

[CLB slices]

Throughput[Mbit/s]

Mickey-128

Trivium-1

Grain-1A5/1-1

Phelix (Half-block, No sharing)

Optimizations for minimum areaFPGA: Xilinx Spartan 3 family

0

200

400

600

800

1000

1200

Area[CLB slices]

Mickey-128(d=1)

Trivium(d=1)

Grain(d=1)

A5/1(d=1)

Phelix(d=32, HB-NS)

57122

188230

1216

x 0.47x 1.54

x 1.00

x 1.89

x 9.97

Optimizations for minimum areaFPGA: Xilinx Spartan 3 family

x cipher area/Grain area

0

2000

4000

6000

8000

10000

12000

0 200 400 600 800 1000 1200Area

[CLB slices]

Throughput[Mbit/s]

Mickey-128

Trivium-64

Grain-16

A5/1-1

Phelix (Full block, Sharing)

Optimizations for maximum throughput to area ratioFPGA: Xilinx Spartan 3 family

0.00

5.00

10.00

15.00

20.00

25.00

30.00

Mickey-128(d=1)

Trivium(d=64)

Grain(d=16)

A5/1(d=1)

Phelix(d=32, FB-SH)

Throughput/Area[Mbit/sCLB slices]

31.34

6.97

3.050.96 0.83

x 4.5

x 10.3x 32.5 x 37.7

x 1.0

x Trivium ratio / cipher ratio

Optimizations for maximum throughput to area ratioFPGA: Xilinx Spartan 3 family

0

1000

2000

3000

4000

5000

6000

Mickey-128Trivium Grain A5/1 Phelix

k= 1 2 4 8 16 32 64 1 2 4 8 16 1 3 4 32

Setup Time = Key & IV Loading + Initialization TimeFPGA: Xilinx Spartan 3 family

Setup time[ns]

0

200

400

600

800

1000

1200

2000

4000

6000

8000

10000

12000

Clock cycles Nanoseconds

Mickey-128(k=1)

Trivium(k=1)

Grain(k=1)

A5/1(k=1)

Phelix(k=32)

0

Setup Time = Key & IV Loading + Initialization TimeFPGA: Xilinx Spartan 3 family

0

50

100

150

200

250

300

350

400

Mickey-128(k=1)

Trivium(k=64)

Grain(k=16)

A5/1(k=1)

Phelix(k=32)

0

500

1000

1500

2000

2500

3000

3500

4000

Setup Time = Key & IV Loading + Initialization TimeFPGA: Xilinx Spartan 3 family

Clock cycles Nanoseconds

Results forASICs: TSMC 90 nm library

Relative results comparable to results for FPGAs

Absolute speed increase by a factor from 3 to 10before ASIC layout synthesis

Conclusions• Very large differences among candidate ciphers

(much larger than for five final candidates in the AES contest)

Possible reasons:• variety of ciphers based on different design principles• different internal state, key, and IV sizes• early stage of the contest

Trivium and Grain outperform other eSTREAM ciphersin terms of

• flexibility• minimum area• maximum throughput to area ratio.

Once again ciphers based on LFSR and NFSRs show theirsuperiority in hardware implementations

Security analysis should focus first on the most efficient ciphers

Questions???

Thank you!