SIMD Instruction Set Extensions for KECCAK with Applications to SHA-3, Keyak and Ketje
Hemendra K. Rawat and Patrick Schaumont
Virginia Tech, Blacksburg, USA
{hrawat, schaum}@vt.edu

Post on 25-Sep-2020


Page 1: SIMD Instruction Set Extensions for KECCAK with ... · Analysis of the instruction set design space for KECCAK primitives ! Six custom instructions based on NEON instruction set in


Motivation

"A flexible SIMD instruction set for the flexible KECCAK sponge"

• Secure systems and protocols rely on a suite of cryptographic applications
  - Hashing, MAC, encryption, PRNG, AEAD, etc.
• Traditionally, these use different algorithms: SHA-1, SHA-2, AES, GMAC, HMAC
• They also use different instruction set extensions: AES-NI, SHA and carry-less multiplication
  - Vector extensions in Intel, ARM, SPARC and PowerPC
• KECCAK is SHA-3
  - Successor of SHA-2
• SHA-3 is a subset of the cryptographic primitive family KECCAK sponge
• Not just a hash
• A flexible sponge for all symmetric cryptography

Outline

• KECCAK background
• Novelty claims
• Design principles
• Proposed instruction set extensions
• Results
  - Performance
  - Hardware overhead
• Conclusion

KECCAK HASH (SHA-3)

[Figure: Hashing using the KECCAK sponge construction. The variable-length input is padded and absorbed into the state in blocks of r bits, with the cryptographic permutation f applied between blocks; the output digest is then squeezed out. The state starts at 0 and consists of the rate (r bits) and the capacity (c bits).]

KECCAK-f[b] Permutation

KECCAK-f in a nutshell:

    def keccakf(state):                  # KECCAK-f[b] on state[0 : b]
        for numRounds in range(0, maxRounds):
            theta(state)   # Add column parities
            rho(state)     # Rotate lanes
            pi(state)      # Transpose lanes
            chi(state)     # Add non-linearity
            iota(state)    # L00 ⊕ Kround

b = 1600, 800, 400, 200, 100, 50, 25 bits
b = (r + c) bits

[Figure: The state as a 5 x 5 array of lanes L00 to L44, indexed by x = 0..4 and y = 0..4, with the bits of each lane along z. The figure marks a slice, a plane and a lane.]
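The round structure above can be made concrete for b = 1600. The following is a minimal Python sketch, not the authors' code: it follows the publicly specified KECCAK-f[1600] round, merges rho and pi into one loop for brevity, and wraps the permutation in a SHA3-256 sponge (rate 1088 bits, capacity 512 bits) so the result can be compared with Python's `hashlib`. The names `rol`, `round_constants`, `keccak_f1600` and `sha3_256` are chosen here for illustration only.

```python
import hashlib

MASK = (1 << 64) - 1  # lanes are 64 bits wide when b = 1600


def rol(value, shift):
    """Rotate a 64-bit lane left by shift bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK


def round_constants(rounds=24):
    """Derive the iota round constants from the 8-bit LFSR of the KECCAK specification."""
    constants, lfsr = [], 0x01
    for _ in range(rounds):
        rc = 0
        for j in range(7):
            if lfsr & 1:
                rc |= 1 << ((1 << j) - 1)
            lfsr = ((lfsr << 1) ^ (0x71 if lfsr & 0x80 else 0)) & 0xFF
        constants.append(rc)
    return constants


ROUND_CONSTANTS = round_constants()


def keccak_f1600(A):
    """Apply KECCAK-f[1600] in place to A, a 5x5 list of lanes indexed A[x][y]."""
    for rc in ROUND_CONSTANTS:
        # theta: add column parities
        C = [A[x][0] ^ A[x][1] ^ A[x][2] ^ A[x][3] ^ A[x][4] for x in range(5)]
        D = [C[(x - 1) % 5] ^ rol(C[(x + 1) % 5], 1) for x in range(5)]
        for x in range(5):
            for y in range(5):
                A[x][y] ^= D[x]
        # rho and pi: rotate each lane and move it to its new position
        x, y = 1, 0
        current = A[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, A[x][y] = A[x][y], rol(current, (t + 1) * (t + 2) // 2)
        # chi: add non-linearity within each plane (fixed y)
        for y in range(5):
            plane = [A[x][y] for x in range(5)]
            for x in range(5):
                A[x][y] = plane[x] ^ (~plane[(x + 1) % 5] & plane[(x + 2) % 5])
        # iota: L00 xor round constant
        A[0][0] ^= rc


def sha3_256(message):
    """SHA3-256 as a sponge over keccak_f1600 (r = 1088 bits, c = 512 bits)."""
    rate_bytes = 136
    padded = bytearray(message)
    pad_len = rate_bytes - len(padded) % rate_bytes
    padded += b"\x06" + b"\x00" * (pad_len - 1)  # SHA-3 suffix 01, then pad10*1
    padded[-1] |= 0x80
    A = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate_bytes):  # absorbing phase
        block = padded[offset:offset + rate_bytes]
        for i in range(rate_bytes // 8):
            A[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        keccak_f1600(A)
    # squeezing phase: the 256-bit digest fits in the first four lanes
    return b"".join(A[i][0].to_bytes(8, "little") for i in range(4))


if __name__ == "__main__":
    print(sha3_256(b"abc") == hashlib.sha3_256(b"abc").digest())
```

Comparing against `hashlib.sha3_256` is a convenient sanity check for any reimplementation of the permutation, including one built from custom instructions.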

KECCAK Applications

[Figure: Four constructions, each a chain of permutations f over a state that starts at 0 and is split into rate r and capacity c. (a) Message in, digest out. (b) Key and message in, MAC out. (c) Key and IV in, keystream out. (d) Key and message in, keystream and MAC out.]

Constructions with sponge: (a) hash, (b) message authentication code, (c) keystream generation. Construction with duplex: (d) authenticated encryption.

Application Stack

[Figure: Layered application stack, from top to bottom.]

Application Layer:                  Hash, MAC, PRNG, AEAD
Cryptographic Construction / API:   Sponge, Duplex
Primitives:                         KECCAK-f[1600], KECCAK-p[800,12], KECCAK-f[400], KECCAK-f[200]
Custom Instructions:                rl1x, kxorr64, chi1, chi2 (one group)
                                    rl1x, xorr, chi1, chi3 (three groups)

Novelty Claims

Previous work
• Custom-instruction designs for a 16-bit microcontroller for SHA-3 finalists, Constantin et al.
• Integration of a 64-bit KECCAK (SHA-3) data path into a 32-bit LEON3, Wang et al.

This work
• Six new custom instructions for a 128-bit SIMD unit
• Multipurpose KECCAK: 1600, 800, 400 and 200 bits
• Five KECCAK algorithms: SHA3-512 hash and the LakeKEYAK, RiverKEYAK, KetjeSR and KetjeJR authenticated ciphers
• Compatible with the ARM NEON instruction set

Design Principles

Portability
• Easy integration into different processor architectures
• RISC-like instruction format (2 input operands, 1 output operand)
• No non-standard architectural features

Flexibility
• Support for multiple symmetric cryptographic applications
  - Hashing
  - MACing
  - Stream ciphers
  - AEAD
  - PRNG

Simplicity
• Small set of instructions
• Low hardware overhead
• Simple operations such as XOR, AND, NOT and rotations
• Short critical path

Design Principles

Performance
• Optimize the computation-intensive part: the KECCAK-{f,p} primitives
• Partition the dataflow graph into SIMD-like instruction patterns
  - 2 x 128-bit inputs and one 128-bit output
• Minimize the schedule length of the graph
• Optimize instruction shapes and functionality for
  - High ILP (instruction-level parallelism)
  - Minimum register spills
  - Minimum MOV/VEXT for register rearrangements
  - In-place round computation

Mapping KECCAK to Instructions

[Figure: Data dependencies of the θ-effect. The lanes of each column of the 5 x 5 state (L00 to L44) are XORed into the column parities C0 to C4; the parities are combined through ROL1 and XOR into D0 to D4.]

[Figure: Instruction rl1x. Inputs d0/s0 and d1/s1 pass through ROL1 and XOR to produce d2/s2.]

    rl1x.u64 d2, d0, d1
    rl1x.u32 s2, s0, s1
    rl1x.u16 s2, s0, s1
    rl1x.u8  s2, s0, s1

NEON register naming convention
• Qi = quad word (128 bit)
• Di = double word (64 bit)
• Si = single word (32 bit)

NEON instruction specifier
• VADD.I32 q1, q2, q3 specifies that q1, q2 and q3 each hold 4 x 32-bit integers
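The θ-effect and rl1x can be modeled behaviorally in a few lines of Python. This is a sketch under assumptions: the slide gives only the syntax and a dataflow figure, so which operand is rotated is a guess here (the second one), and each `.u64`/`.u32`/`.u16`/`.u8` variant is treated as a single lane of that width. The function names are hypothetical.

```python
def rol(value, shift, width):
    """Rotate a width-bit value left by shift bits."""
    mask = (1 << width) - 1
    shift %= width
    return ((value << shift) | (value >> (width - shift))) & mask


def rl1x(a, b, width=64):
    """Assumed semantics of rl1x: a XOR (b rotated left by 1)."""
    return (a ^ rol(b, 1, width)) & ((1 << width) - 1)


def column_parities(A):
    """C[x] is the XOR of the five lanes in column x of the state A[x][y]."""
    return [A[x][0] ^ A[x][1] ^ A[x][2] ^ A[x][3] ^ A[x][4] for x in range(5)]


def theta_effect(C, width=64):
    """D[x] = C[x-1] XOR ROL1(C[x+1]); one rl1x per column under the assumption above."""
    return [rl1x(C[(x - 1) % 5], C[(x + 1) % 5], width) for x in range(5)]


if __name__ == "__main__":
    state = [[0] * 5 for _ in range(5)]
    state[0][2] = 1  # a single bit set in column x = 0
    print(theta_effect(column_parities(state)))
```

With this model, the whole θ-effect costs five rl1x operations after the column parities are available, which is the kind of fusion the instruction appears designed for.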

KECCAK Custom Instructions

Instruction  Step             Description             Target primitive      Syntax
rl1x         theta            Rotate left 1 and XOR   1600, 800, 400, 200   rl1x.u64 d2, d0, d1
                                                                            rl1x.u32 s2, s0, s1
                                                                            rl1x.u16 s2, s0, s1
                                                                            rl1x.u8  s2, s0, s1
kxorr64      theta, rho & pi  XOR, rotate & assign    1600                  kxorr64 d2, d0, d1, #i
xorr         theta, rho & pi  XOR, rotate & assign    800, 400, 200         xorr.u32 s2, s0, s1, #i
                                                                            xorr.u16 s2, s0, s1, #i
                                                                            xorr.u8  s2, s0, s1, #i
chi1         chi              chi step                1600, 800, 400, 200   chi1.u32 q2, q0, q1
                                                                            chi1.u64 q2, q0, q1
chi2         chi              chi step (last lane)    1600                  chi2.u64 d4, q0, q1
chi3         chi              chi step (last lane)    800, 400, 200         chi3.u32 s4, d0, d1
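To illustrate how the "XOR, rotate & assign" and chi instructions fit together, here is a hedged Python model of computing one output plane. It assumes that kxorr64/xorr compute ROL(a XOR b, i) and that the chi instructions together evaluate the standard χ function on a plane; operand order and register packing are not given on the slide, so the helper names and signatures below are illustrative only. The rotation offsets are the standard ρ offsets of the diagonal lanes for 64-bit lanes.

```python
def rol(value, shift, width):
    """Rotate a width-bit value left by shift bits."""
    mask = (1 << width) - 1
    shift %= width
    return ((value << shift) | (value >> (width - shift))) & mask


def xorr(a, b, i, width=64):
    """Assumed semantics of kxorr64 (width 64) and xorr: ROL(a XOR b, i)."""
    return rol(a ^ b, i, width)


def chi_plane(B, width=64):
    """Standard chi on one plane: out[x] = B[x] XOR (NOT B[x+1] AND B[x+2])."""
    mask = (1 << width) - 1
    return [(B[x] ^ (~B[(x + 1) % 5] & B[(x + 2) % 5])) & mask for x in range(5)]


# rho offsets for the lanes L00, L11, L22, L33, L44 (64-bit lanes)
DIAGONAL_OFFSETS = (0, 44, 43, 21, 14)


def first_plane(A, D, width=64):
    """Compute the y = 0 plane after theta, rho, pi and chi (iota not applied).

    A[x][y] is the state before the round and D the theta-effect values.
    After pi, the plane y = 0 is fed by the diagonal lanes A[k][k].
    """
    B = [xorr(A[k][k], D[k], DIAGONAL_OFFSETS[k], width) for k in range(5)]
    return chi_plane(B, width)


if __name__ == "__main__":
    A = [[0] * 5 for _ in range(5)]
    A[1][1] = 1
    print([hex(lane) for lane in first_plane(A, [0] * 5)])
```

Under these assumptions one plane needs five XOR-rotate operations followed by the chi operations, with no separate lane moves for π because the diagonal lanes are read directly.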

Implementation & Validation

• Reference implementations
  - KeccakCodePackage: http://keccak.noekeon.org/files.html
• Hand-optimized custom instruction implementations
  - KECCAK-{f,p} for widths 1600, 800, 400 and 200
• Cross-compiled GCC with KECCAK instruction support
• Timing simple CPU model in GEM5
• Instructions described in GEM5's ISA description language
• Simulation
  - Single-core ARM CPU @ 1 GHz
  - 32 KB L1 I and D caches
  - 2 MB L2 cache

[Figure: Tool flow. The crypto applications (SHA3, LakeKeyak, RiverKeyak, KetjeJR, KetjeSR) are built with the GNU compiler framework (compiler, CP, assembler, linker) into ARM native binaries, which run on the GEM5 architectural simulator (front end; CPU models Atomic Simple, Timing Simple, InOrder and O3; ISA with TLB, faults, interrupts and decoder). KECCAK instructions are added to both the GNU compiler framework and the GEM5 ISA.]

Results: Performance

[Figure: Performance in instructions/byte for various KECCAK modes. Y axis: instructions/byte, 0 to 350. X axis: SHA3 (c=1024), LakeKeyak (E), LakeKeyak (D), RiverKeyak (E), RiverKeyak (D), KetjeSR (E), KetjeSR (D), KetjeJR (E), KetjeJR (D). Series: Optimized C, 32-bit ASM, NEON, This work (CI). Annotations: KECCAK-f[1600], KECCAK-f[800,nr], KECCAK-f[400,nr], KECCAK-p[200,nr]; groups: Hash, AEAD (Encryption+MAC), Lightweight AEAD.]

Performance gain of 1.4x to 2.6x over hand-optimized assembly on ARMv7

Results: Hardware Cost

[Figure: Gate-equivalent estimates with UMC 90nm (4658 GE in total), broken down by instruction (rl1x, kxorr64, xorr, chi1, chi2, chi3; data labels 1869, 1221, 894, 242, 122), and the overhead of the KECCAK instructions (0.1%) relative to a Cortex-A9 core (100%).]

Conclusion

q  Analysis of the instruction set design space for KECCAK primitives q  Six custom instructions based on NEON instruction set in ARMv7

q  Five different KECCAK applications: SHA3, LakeKEYAK, RiverKEYAK, KetjeSR and KetjeJR

q  Performance gain between 1.4 - 2.6x over hand optimized assembly on ARMv7 at a hardware overhead of just 4658 GEs.

q  Portability Aspects, Intel AVX, Generic 64/32 bit architectures, in the paper….

16

Questions!

Thank you.

Appendix

KECCAK (SHA-3)

Instance          Definition
SHA3-224(M)       Keccak[448](M||01, 224)
SHA3-256(M)       Keccak[512](M||01, 256)
SHA3-384(M)       Keccak[768](M||01, 384)
SHA3-512(M)       Keccak[1024](M||01, 512)
SHAKE128(M, d)    Keccak[256](M||1111, d)
SHAKE256(M, d)    Keccak[512](M||1111, d)

[Figure: Message M (arbitrary length) goes into H, which produces a hash value (fixed length).]
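In the table above, the bracketed parameter Keccak[c] is the capacity in bits; with b = r + c and b = 1600, the rate follows directly. The short Python sketch below works this out for each instance and cross-checks it against the block size reported by `hashlib`. The dictionary and function names are illustrative.

```python
import hashlib

STATE_BITS = 1600  # b for KECCAK-f[1600]

# Capacity c in bits for each instance, taken from the table above.
CAPACITY_BITS = {
    "sha3_224": 448,
    "sha3_256": 512,
    "sha3_384": 768,
    "sha3_512": 1024,
    "shake_128": 256,
    "shake_256": 512,
}


def rate_bits(capacity_bits, state_bits=STATE_BITS):
    """Rate r = b - c, i.e. how many message bits are absorbed per permutation call."""
    return state_bits - capacity_bits


def hashlib_rate_bits(name):
    """Rate in bits as reported by hashlib's block_size (which is in bytes)."""
    return hashlib.new(name).block_size * 8


if __name__ == "__main__":
    for name, capacity in CAPACITY_BITS.items():
        print(name, "c =", capacity, "r =", rate_bits(capacity),
              "hashlib r =", hashlib_rate_bits(name))
```

A larger capacity leaves a smaller rate, so SHA3-512 (c = 1024) absorbs the fewest message bits per permutation call of the instances listed.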

Keccak

Source: http://keccak.noekeon.org/

KECCAK-f Permutation (2)

[Figure: The θ, ρ, π, χ and ι steps of one round; the ι step computes L00 ⊕ KRound. Source: http://keccak.noekeon.org/]

Round function R: the steps are applied in the order θ, ρ, π, χ, ι. In composition notation, R = ι ∘ χ ∘ π ∘ ρ ∘ θ.

Proposed Instruction Set Extensions

[Figure: Data dependencies of θ, ρ, π, χ, ι (one plane). The lanes L00, L11, L22, L33, L44 are XORed with D0 to D4 (θ & π step), rotated (ρ step) and passed through χ (χ step) to produce the plane L00, L10, L20, L30, L40. The intermediate values are labeled B0 to B4 and P0 to P4.]

[Figure (b): Instruction kxorr64/xorr. Inputs d0/s0 and d1/s1 pass through XOR and ROL(i) to produce d2/s2.]

    kxorr64  d2, d0, d1, #i
    xorr.u32 s2, s0, s1, #i
    xorr.u16 s2, s0, s1, #i
    xorr.u8  s2, s0, s1, #i

Proposed Instruction Set Extensions

[Figure: Data dependencies of the χ step (one plane). Source: http://keccak.noekeon.org/]

[Figure: Instructions chi1, chi2 and chi3, each built from NOT, AND and XOR paths.]

    chi1.u32 q2, q0, q1    (four NOT/AND/XOR paths, result in q2)
    chi1.u64 q2, q0, q1    (two NOT/AND/XOR paths, result in q2)
    chi2.u64 d4, q0, q1    (one NOT/AND/XOR path, result in d4)
    chi3.u32 s4, d0, d1    (one NOT/AND/XOR path, result in s4)

Portability Aspects

Instruction   Intel AVX   64-bit arch   32-bit arch
rl1x          ✓           ✓*            ✓*
kxorr64       ✓           ✓             X
xorr          ✓           ✓             ✓
chi1          ✓           X             X
chi2          ✓           X             X
chi3          ✓           ✓             X

Table 2: Feasibility of the proposed instructions on different platforms

• Based on instruction format, register width and the instruction encoding width.

* Not all variants supported

Results: Hardware Cost

Instruction   Area (µm²)   Gate equivalent   Transistor equivalent
rl1x          1238         310               1238
kxorr64       7474         1869              7474
xorr          4884         1221              4884
chi1          3576         894               3576
chi2          966          242               966
chi3          486          122               486
Total         18624        4658              18624

Table 1: Area, GE and transistor count estimates for the proposed custom instructions

Typical Cortex-A9 CPU: 3.8 million gates
KECCAK instructions have 0.1% overhead
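The totals and the overhead figure can be re-derived from the per-instruction numbers in Table 1. The following Python snippet is only a worked check of that arithmetic; the variable names are illustrative, and the 3.8 million gate figure is the one quoted on the slide.

```python
# Gate-equivalent and area estimates per instruction, copied from Table 1.
GATE_EQUIVALENTS = {
    "rl1x": 310, "kxorr64": 1869, "xorr": 1221,
    "chi1": 894, "chi2": 242, "chi3": 122,
}
AREA_UM2 = {
    "rl1x": 1238, "kxorr64": 7474, "xorr": 4884,
    "chi1": 3576, "chi2": 966, "chi3": 486,
}
CORTEX_A9_GATES = 3_800_000  # "typical Cortex-A9 CPU" figure from the slide

total_ge = sum(GATE_EQUIVALENTS.values())
total_area = sum(AREA_UM2.values())
overhead_percent = 100 * total_ge / CORTEX_A9_GATES

if __name__ == "__main__":
    print("total GE:", total_ge)
    print("total area (um^2):", total_area)
    print("overhead (%):", round(overhead_percent, 2))
```

The sums reproduce the table totals (4658 GE and 18624 µm²), and 4658 / 3.8 million is roughly 0.12%, consistent with the rounded 0.1% on the slide.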

Motivation

[Figure: Four related areas and their examples.]

• Cryptographic instruction set: Intel AES-NI, SHA-1,2 and carry-less multiplication extensions; ARMv8 crypto instructions
• Processor architecture: ARM (soft IP); RISC-V (open architecture from UC Berkeley)
• Symmetric cryptography applications: hash, MAC, encryption/decryption, AEAD, PRNG
• Universal sponge construction
  - Hash: SHA-3 (winner of the NIST SHA-3 competition)
  - AEAD: LakeKeyak, Ketje
  - PRNG, stream ciphers, etc.