Hardware Implementations of AES
ECRYPT II AES day
October 18th, Bruges, Belgium
Stefan Mangard
Infineon Technologies, Munich, Germany
Outline
Requirements and Motivation
AES Components
AES Architectures
Physical Attacks
Summary
17.10.2012 Page 2 Copyright © Infineon Technologies 2012. All rights reserved.
PART I
Requirements and Motivation
Why Implement AES in Hardware?
Why AES?
We are celebrating the 10th anniversary of AES. However, triple-DES is still around; the adoption took really long for some applications …
Why Hardware?
AES can be implemented efficiently in software on all processors
Hardware implementations are necessary only in case of very specific requirements
Classical Implementation Requirements and Optimization Goals
Throughput
Power/Energy
Maintenance,
Flexibility
Area/Memory
Design
Reliability Security
Scenarios for AES Hardware Implementations
High Throughput
There is a processor, but the processor is not fast enough (e.g. servers, disk encryption)
Low Area/Power
There is no processor that could be used for AES because it would need too much area or power (e.g. RFIDs)
Low Energy
There is a processor, but given the number of cryptographic operations that need to be performed, the battery lifetime would be too short when encrypting with the processor (e.g. sensor nodes)
Security
There is a processor, but the processor and the system are not secure enough to implement a cryptographic algorithm (e.g. embedded processors)
Implementation Requirements of AES in Terms of Security
Implementations of AES in any case need to protect the
Confidentiality of the key
Confidentiality of all intermediate values
Depending on the application also the following properties might be required:
Integrity of all intermediate values
Integrity of the key
Confidentiality and integrity of the plaintext
Integrity of the ciphertext
Summary of Security Requirements
In practice, implementing AES hardware means building a module whose internal datapath and input/output interface need to be suited to handle confidential data and to protect its integrity
This makes AES hardware significantly different from functional units like a USB interface
[Figure: AES Module]
Threats of AES Implementations
Logical Attacks (done via the communication interface)
Buffer overflows
Code injection
Trojans
Debug and test interfaces
…
Physical Attacks (require physical access to the device)
Power analysis attacks
Fault attacks
Probing/Forcing
…
The Two Main Threat Scenarios
Secure Environment
Logical attacks
Examples: The classical Internet communication scenario, where the attacker does not have physical access to the device
Non-Secure Environment
Logical and physical attacks
Examples: all kinds of embedded devices, USB sticks, smart cards, RFID tags, …
In those scenarios where the limitations of resources (power, energy, and area) are strongest, typically both logical and physical attacks need to be considered
Worst-Case Example of How Not to Integrate an AES Hardware Module
USB stick with hardware-based AES-256 encryption to protect the content of the stick
There is a password-based authentication, which is done on the PC
The result of the password check leads to a 32 byte value which is independent of the password!
Whenever this byte sequence is sent to the USB stick as a result of the authentication procedure, the stick grants full access (all you need is a debugger on the PC …)
PART II
Components
Preliminaries
AES supports three key lengths: AES-128, AES-192, AES-256
AES is a round-based block cipher
One AES round consists of four transformations
AddRoundKey, ShiftRows, SubBytes, MixColumns
The key is expanded and each round is provided with a 128-bit round key
The round function is independent of the key length
The key expansion can be inverted easily
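The relation between key length, number of rounds, and number of round keys can be sketched as follows. The round counts are the ones specified in FIPS-197; the function names are illustrative and not taken from the slides.

```python
# Key length in bits -> number of rounds, as specified in FIPS-197.
ROUNDS = {128: 10, 192: 12, 256: 14}


def round_key_count(key_bits: int) -> int:
    """One 128-bit round key for the initial AddRoundKey plus one per round."""
    return ROUNDS[key_bits] + 1


def expanded_key_bits(key_bits: int) -> int:
    """Total size of the expanded key; each round key is 128 bits wide."""
    return 128 * round_key_count(key_bits)


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(f"AES-{bits}: {ROUNDS[bits]} rounds, "
              f"{round_key_count(bits)} round keys, "
              f"{expanded_key_bits(bits)} expanded key bits")
```

Only the number of rounds and the key expansion depend on the key length; the round function always consumes one 128-bit round key.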
Overview
[Figure: block diagram with two blocks, AES Key Expansion and AES Data Path. Labels: Plaintext, Key, Round keys, Expanded Key, Ciphertext.]
Initial Remarks
Decryption can be done in two ways:
Inverse of all operations in reversed order
Inverse of all operations in same order as in encryption plus an extra InvMixColumns transformation
In hardware, inverting the sequence of the transformations is usually cheaper than implementing an extra InvMixColumns
The round function is easy to compute – hence, in hardware, pre-computation of keys usually does not make sense
An implementation that allows immediate switching between encryption and decryption requires storing two keys (the actual key and the expanded one for decryption)
Overview
[Figure: block diagram with two blocks, AES Key Expansion and AES Data Path. Labels: Plaintext, Key, Round keys, Expanded Key, Ciphertext, plus ENC and DEC.]
AES Data Path - The Round Function
[Figure: AES round function data path on the 16 state bytes. Block labels: AR (AddRoundKey), SB (SubBytes), SR (ShiftRows), MC (MixColumns). Stage labels: Plaintext, Initial Round, Round 1.]
AES Data Path – Sbox
The SubBytes operation consists of 16 independent and identical 8-bit Sboxes
There are two options to implement such an Sbox
Lookup table: The 256 8-bit output values are stored in the hardware (not efficient for ASICs; efficient on FPGAs, where BRAMs are available)
“Calculation” of the Sbox: The Sbox corresponds to an inversion in GF(2^8) followed by an affine transformation; performing this computation is the standard approach for ASIC designs
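The second option can be illustrated in software. The following is a minimal sketch that computes each Sbox output as an inversion in GF(2^8) followed by the affine transformation from FIPS-197. It is a bit-serial reference model, not the composite-field circuit an ASIC would use, and the function names are illustrative.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def gf_inv(a: int) -> int:
    """Inverse in GF(2^8) computed as a^254; 0 is mapped to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x: int) -> int:
    """AES Sbox: field inversion followed by the affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# The lookup-table option simply stores these 256 values.
SBOX = [sbox(x) for x in range(256)]
```

The list `SBOX` is what the lookup-table option would store, so both options can be checked against each other.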
[Figure: Sbox mapping an input byte to an output byte]
AES Data Path – Sbox
[Figure from [WOL02]]
AES Data Path – Sbox
The affine transformation and its inverse essentially correspond to 28 XORs
There have been numerous proposals on how to efficiently implement the inversion by using different bases for the decomposition into operations of GF(2^8)
The most compact Sbox implementation was proposed by Canright in [C05] using normal bases:
It requires about 800 GE for a circuit implementing the Sbox and the inverse Sbox
In detail: 94 XORs, 34 NANDs, 6 NORs, 2 inverters, 16 MUX
AES Data Path – MixColumns
MixColumns maps four input bytes (one column) to four output bytes (one column)
Each column is considered as a polynomial with coefficients in GF(2^8)
The operation is defined as follows:
s'(x) = a(x) ⊗ s(x) mod (x^4 + 1), with a(x) = {03}x^3 + {01}x^2 + {01}x + {02} [NIST01]
AES Data Path – MixColumns
In hardware, calculating one byte of the MixColumns output can be done as shown on the left
In principle, it is possible to re-use this hardware for each byte; however, there is significant control overhead, so usually four parallel units are used (re-using common expressions)
In summary, MixColumns simply corresponds to about 200 XORs
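A minimal sketch of MixColumns for one column, written so that the shared expression (the XOR of all four input bytes) is computed once and re-used for every output byte. The column values in the test are a commonly quoted MixColumns test vector; the function names are illustrative.

```python
def xtime(b: int) -> int:
    """Multiply by {02} in GF(2^8): shift left and reduce by the AES polynomial."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def mix_column(col):
    """Apply MixColumns to one column of four bytes.

    Output byte i is {02}*a[i] ^ {03}*a[i+1] ^ a[i+2] ^ a[i+3] (indices mod 4),
    rewritten so that only XORs and one xtime per output byte are needed.
    """
    a0, a1, a2, a3 = col
    t = a0 ^ a1 ^ a2 ^ a3  # common expression shared by all four outputs
    return [
        a0 ^ t ^ xtime(a0 ^ a1),
        a1 ^ t ^ xtime(a1 ^ a2),
        a2 ^ t ^ xtime(a2 ^ a3),
        a3 ^ t ^ xtime(a3 ^ a0),
    ]


if __name__ == "__main__":
    print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Since `xtime` is itself only a shift and a conditional XOR, the whole operation reduces to XOR gates in hardware.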
[Figure: circuit computing one byte of the MixColumns output, from [WOL01]]
AES Key Expansion
The key expansion does not require any other building blocks than the data path
The key expansion essentially requires four Sbox computations and some XORs for each key expansion step
All the complexity of handling different key sizes resides in the key expansion unit (remark: AES-192 is not nice to implement)
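One key expansion step for AES-128, together with its inverse, can be sketched as follows. The sketch assumes the AES-128 schedule from FIPS-197 only (AES-192 and AES-256 need different word handling) and rebuilds the Sbox arithmetically to stay self-contained. The helper names are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def sbox(x):
    """AES Sbox: inversion (x^254) followed by the affine transformation."""
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)
    rot = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63


SBOX = [sbox(x) for x in range(256)]


def g(word, rcon):
    """Rotate the word, apply four Sbox computations, XOR the round constant."""
    out = [SBOX[b] for b in word[1:] + word[:1]]
    out[0] ^= rcon
    return out


def xor_words(a, b):
    return [x ^ y for x, y in zip(a, b)]


def next_round_key(rk, rcon):
    """Forward step: derive round key i+1 from round key i (16 bytes each)."""
    w0, w1, w2, w3 = (rk[i:i + 4] for i in (0, 4, 8, 12))
    n0 = xor_words(w0, g(w3, rcon))
    n1 = xor_words(w1, n0)
    n2 = xor_words(w2, n1)
    n3 = xor_words(w3, n2)
    return n0 + n1 + n2 + n3


def prev_round_key(rk, rcon):
    """Inverse step: recover round key i from round key i+1."""
    n0, n1, n2, n3 = (rk[i:i + 4] for i in (0, 4, 8, 12))
    w3 = xor_words(n3, n2)
    w2 = xor_words(n2, n1)
    w1 = xor_words(n1, n0)
    w0 = xor_words(n0, g(w3, rcon))
    return w0 + w1 + w2 + w3


if __name__ == "__main__":
    key = list(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
    print(bytes(next_round_key(key, 0x01)).hex())
```

Both directions use the same four Sbox computations and XORs, which is why round keys can be generated on the fly for encryption as well as decryption.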
PART III
Architectures
Summary of What is Needed
Storage:
Datapath: 128 bit
Key Unit: 128 bit up to 512 bit
(512 bit are needed in implementations of AES-256 that allow immediate switching between encryption and decryption)
Computational operations that need to be done per round
20 Sbox operations
4 MixColumns operations
XOR operations for key addition, key expansion
Multiplexing for ShiftRows and data selection
The Four Options
SMALL (8 bit architecture)
1 Sbox, 1 MixColumns -> 20 cycles per round
MEDIUM (32 bit architecture)
4 Sboxes, 1 MixColumns -> 5 cycles per round
LARGE (128 bit architecture)
20 Sboxes, 4 MixColumns -> 1 cycle per round
XLARGE (unrolled 128 bit architecture)
200 Sboxes, 40 MixColumns -> 1/10 cycle per round
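The cycle counts of the four options follow from dividing the 20 Sbox operations needed per round by the number of Sboxes available. A small sketch of this arithmetic, assuming the Sbox is the only bottleneck (control overhead and extra ShiftRows cycles are ignored):

```python
from fractions import Fraction

# 16 Sbox operations for SubBytes plus 4 for the key expansion, per round.
SBOX_OPS_PER_ROUND = 20

# Number of Sbox instances in each architecture, as listed on the slide.
SBOXES = {"SMALL": 1, "MEDIUM": 4, "LARGE": 20, "XLARGE": 200}


def cycles_per_round(n_sboxes: int) -> Fraction:
    """Cycles per round if every Sbox instance is busy in every cycle."""
    return Fraction(SBOX_OPS_PER_ROUND, n_sboxes)


if __name__ == "__main__":
    for name, n in SBOXES.items():
        print(f"{name}: {n} Sboxes -> {cycles_per_round(n)} cycles per round")
```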
AES Data Path - The Round Function
[Figure: AES round function data path, as shown in Part II (blocks AR, SB, SR, MC; stages Plaintext, Initial Round, Round 1).]
SMALL (8 bit Datapath)
In each cycle, there is an Sbox operation
The Sbox lookups for the key are done in parallel to MixColumns
Two options for the key expansion
32 bit key calculation in the cycles 17, 18, 19, 20
8 bit key calculation in the cycles 17, 18, 19, 20, 21, …
ShiftRows is either done in an extra cycle or by multiplexing
Size: 2500 – 5000 GE (depending on feature set)
[Figure: schedule of the 8 bit datapath with AR, MC, and SB operations over cycles 1 to 20.]
MEDIUM (32 bit Datapath)
In each cycle, a complete column of the state is processed
ShiftRows is either done in cycle 5 or by multiplexing
The key expansion is done 128 bit parallel in clock cycle 5
Size: 6,000 – 10,000 GE (depending on feature set)
[Figure: schedule of the 32 bit datapath. Cycles 1 to 4 each show 4 AR, 1 MC, and 4 SB; cycle 5 shows 4 SB only.]
LARGE (128 bit Datapath)
In each cycle, a complete round of AES is computed
No multiplexing for ShiftRows
The key expansion is done 128 bit parallel
Size: 20,000 – 35,000 GE (depending on feature set)
[Figure: 128 bit datapath with four column units (4 AR, 1 MC, 4 SB each) plus 4 additional SB.]
XLARGE (Unrolled 128 bit Datapath)
In each cycle, one AES output is computed
Pipelined processing is done that takes one plaintext per clock cycle and returns one ciphertext per clock cycle
Cannot be used with CBC or other modes that require the ciphertext of the previous block as input for the current block
Size: at least 200,000 GE
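Why chaining defeats the pipeline can be seen from a simple cycle-count model. The sketch below assumes a pipeline depth of 10 stages (one per AES-128 round) and ignores I/O; both the depth and the function names are assumptions for illustration.

```python
def cycles_independent(n_blocks: int, depth: int = 10) -> int:
    """Blocks are independent (e.g. ECB or CTR): one block enters per cycle,
    so the pipeline fills once and then delivers one result per cycle."""
    return depth + n_blocks - 1 if n_blocks else 0


def cycles_chained(n_blocks: int, depth: int = 10) -> int:
    """Each block needs the previous ciphertext (as in CBC encryption):
    a block can only enter after its predecessor has left the pipeline."""
    return n_blocks * depth


if __name__ == "__main__":
    n = 1000
    print("independent:", cycles_independent(n), "cycles")
    print("chained:    ", cycles_chained(n), "cycles")
```

Under this model, a chained mode uses only one pipeline stage at a time, so the unrolled design is no faster than a LARGE architecture while being roughly ten times bigger.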
Summary
In DES, there was essentially just one hardware implementation that made sense
AES is more flexible and allows three main architectures (SMALL, MEDIUM, LARGE)
Throughput, power, energy strongly depend on the used technology and on the interfaces
Clocking a SMALL architecture on an RFID tag in the range of 200 kHz leads to 1,000 AES-128 encryptions/sec
Clocking a LARGE architecture on a high speed chip with 1 GHz leads to 100,000,000 AES-128 encryptions/sec
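The two figures above follow directly from the clock frequency and the cycles per round. A minimal sketch of the arithmetic, assuming 10 rounds for AES-128 and no interface or control overhead:

```python
def encryptions_per_second(clock_hz: float, cycles_per_round: float,
                           rounds: int = 10) -> float:
    """Upper bound on the encryption rate; I/O and control cycles are ignored."""
    return clock_hz / (cycles_per_round * rounds)


if __name__ == "__main__":
    # SMALL: 20 cycles per round at 200 kHz.
    print(encryptions_per_second(200_000, 20))
    # LARGE: 1 cycle per round at 1 GHz.
    print(encryptions_per_second(1_000_000_000, 1))
```

Real designs fall below these bounds, since loading plaintext and key over the interface also costs cycles.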
PART IV
Physical Attacks
Power Analysis and EM Attacks
How many power traces does the best power analysis attack on AES need?
1
Power Analysis Attacks on AES
Single trace or average trace
SMALL implementations are particularly vulnerable because they leak information about many intermediate results separately
In the worst case, no averaging is necessary
State-of-the-art method to exploit the leakage: algebraic side-channel attacks
Differential power analysis attacks
Attacks on AES work nicely with all kinds of distinguishers
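A differential power analysis attack can be sketched on simulated data. The example below assumes a Hamming-weight leakage of one Sbox output with Gaussian noise and uses the Pearson correlation as the distinguisher; the leakage model, noise level, trace count, and all names are assumptions for illustration, not measurements of a real device.

```python
import random


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def sbox(x):
    """AES Sbox: inversion (x^254) followed by the affine transformation."""
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)
    rot = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63


SBOX = [sbox(x) for x in range(256)]
HW = [bin(x).count("1") for x in range(256)]  # Hamming weight table


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def simulate_traces(key_byte, n_traces, sigma, rng):
    """Simulated leakage: HW of the Sbox output plus Gaussian noise."""
    plaintexts = [rng.randrange(256) for _ in range(n_traces)]
    leakage = [HW[SBOX[p ^ key_byte]] + rng.gauss(0, sigma) for p in plaintexts]
    return plaintexts, leakage


def recover_key_byte(plaintexts, leakage):
    """Return the key guess whose predicted leakage correlates best."""
    scores = [abs(pearson([HW[SBOX[p ^ k]] for p in plaintexts], leakage))
              for k in range(256)]
    return max(range(256), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(1)
    pts, leak = simulate_traces(0x3C, 300, 1.0, rng)
    print(hex(recover_key_byte(pts, leak)))
```

Other distinguishers (difference of means, mutual information, and so on) can be swapped in for `pearson` without changing the rest of the attack.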
Power Analysis Trends
Attack Strategies
Profiled attacks are an established tool
Exploitation more and more focuses on multiple points and their relationship (higher-order attacks)
Almost any statistical tool that can be used to measure dependencies between random variables has meanwhile been applied to power analysis
Measurement Setup
Measurement of the power consumption is often done via the electromagnetic field
Small coils allow local attacks on the chip
Basic DPA attacks can be conducted with simple and cheap USB oscilloscopes
The storage and processing power of modern PCs and oscilloscopes allow attacks with more and more traces
Algorithmic Countermeasures for AES
Masking
Numerous publications over many years have addressed how to mask the Sbox
The problem of how to resolve all the implementation issues (glitches, data-dependent timings, …) is left to the designer
Threshold Implementations
Meanwhile, there are proposals for threshold implementations that resolve the implementation issue of glitches
Open Issue
Higher-order attacks: hardware implementations process all shares in parallel or sequentially; neither a masked nor a threshold implementation provides sufficient protection given current setups and future trends
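The basic idea of masking can be illustrated with first-order Boolean masking of a table lookup. The sketch uses table recomputation, a software-style technique chosen here only because it makes the invariant easy to see; hardware designs typically mask the Sbox arithmetic instead. The table is a stand-in 8-bit permutation, not the AES Sbox, and all names are illustrative.

```python
import random

# Stand-in 8-bit permutation used only for illustration (NOT the AES Sbox).
STAND_IN_TABLE = [(7 * x + 3) % 256 for x in range(256)]


def masked_table(table, m_in, m_out):
    """Build T' such that T'[x ^ m_in] == table[x] ^ m_out for every x."""
    out = [0] * 256
    for x in range(256):
        out[x ^ m_in] = table[x] ^ m_out
    return out


def masked_lookup(table, masked_x, m_in, m_out):
    """Look up a masked input; the unmasked value x never appears explicitly."""
    return masked_table(table, m_in, m_out)[masked_x]


if __name__ == "__main__":
    rng = random.Random()
    secret = 0x42
    m_in, m_out = rng.randrange(256), rng.randrange(256)
    masked_y = masked_lookup(STAND_IN_TABLE, secret ^ m_in, m_in, m_out)
    # Removing the output mask yields the same result as an unmasked lookup.
    assert masked_y ^ m_out == STAND_IN_TABLE[secret]
```

Each share on its own is uniformly distributed, but an attacker who combines the leakage of both shares (a higher-order attack) can still recover the secret, which is the open issue named above.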
Fault Attacks
How many fault inductions on AES does the best fault attack need?
1
Fault Attacks on AES
Many papers have appeared in recent years, and the topic is by now well researched
There are fault attacks on all different key sizes, on the datapath and on the key expansion path
Example results
One pair of (C, C’) and P breaks AES-128
Two pairs of (C, C’) break AES-128
There are efficient attacks on the AES middle rounds
There are efficient attacks even if up to 12 bytes of the state are changed by the attack
(P … plaintext, C … ciphertext, C’ … faulty ciphertext)
Fault Attack Trends
Attack Strategies
For AES, there is not much room for practical improvement of the attacks any more
However, the system around the AES implementation is also important and is an active field of research
Attack Setup
Lasers are the most effective method to produce controlled and localized faults in an IC
Attack setups can contain a laser to perform attacks from the front- as well as from the backside of the chip
Setups being able to induce multiple faults are becoming more and more prominent
Algorithmic Countermeasures for AES
General Countermeasures
Sensor-based approaches: The goal is to detect specific fault induction vehicles (temperature sensor, light sensor, voltage sensor, …)
Error-detection based approaches: The goal is to detect the error that is the consequence of the fault induction
AES-Specific Countermeasures
Most publications use duplication (temporal or spatial)
Few publications use parities in the Sbox; this is not sufficient against fault attacks
Open Issue
Strong algorithmic redundancy measures for AES
Multiple fault attacks
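Temporal duplication and the limits of a parity check can be sketched as follows. The operation, the fault model (a bit flip in the first of two runs), and all names are hypothetical stand-ins for illustration.

```python
class FaultDetected(Exception):
    """Raised when the two redundant computations disagree."""


def run_duplicated(operation, value):
    """Temporal duplication: compute twice and release the result only on a match."""
    first = operation(value)
    second = operation(value)
    if first != second:
        raise FaultDetected("redundant computations differ")
    return first


def parity(byte: int) -> int:
    """Single parity bit of an 8-bit value."""
    return bin(byte & 0xFF).count("1") & 1


def make_glitchy(operation, fault_mask):
    """Hypothetical fault model: the first call returns a result with bits flipped."""
    state = {"calls": 0}

    def wrapped(value):
        state["calls"] += 1
        result = operation(value)
        return result ^ fault_mask if state["calls"] == 1 else result

    return wrapped


if __name__ == "__main__":
    invert = lambda x: x ^ 0xFF  # stand-in for an AES computation
    print(hex(run_duplicated(invert, 0x0F)))
    try:
        run_duplicated(make_glitchy(invert, 0x01), 0x0F)
    except FaultDetected as exc:
        print("detected:", exc)
    # A fault flipping two bits leaves the parity unchanged and goes unnoticed.
    print(parity(0x53) == parity(0x53 ^ 0x03))
```

The comparison only helps if the two runs are not hit by the same fault; an attacker who can inject identical faults into both runs passes the check, which is the multiple-fault issue listed above.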
Probing Attacks
How many probing needles does the best probing attack on AES need?
1
Probing Attacks on AES
There are only a few papers on probing attacks
Probing attacks are significantly more expensive than fault or power analysis attacks
SMALL implementations are particularly vulnerable because they leak information about many intermediate results separately
E.g., placing a needle on a wire of an Sbox provides all Sbox outputs during each encryption run …
Countermeasures include masking, but in the end some physical protection is necessary in order to prevent probing attacks on AES
PART V
Summary
Summary
For the components of AES, there exist standard solutions
There are also essentially three standard architectures
In case AES is operated in a secure environment, building AES means taking the standard components, selecting one of the architectures and optimizing the design according to the concrete design needs -> standard design task
In case AES is NOT operated in a secure environment, doing an AES implementation is very challenging
After 10 years of AES, there is no publication on a secure design that addresses all the threat scenarios
References
[C05] David Canright: A Very Compact S-Box for AES. CHES 2005
[WOL01] Johannes Wolkerstorfer: An ASIC Implementation of the AES MixColumn-operation
[NIST01] National Institute of Standards and Technology (NIST): FIPS-197: Advanced Encryption Standard, 2001
[WOL02] Johannes Wolkerstorfer, Elisabeth Oswald, Mario Lamberger: An ASIC Implementation of the AES SBoxes. CT-RSA 2002