
Department of Electrical Engineering (Institutionen för systemteknik)

Master's thesis

Power Analysis of the Advanced Encryption Standard: Attacks and Countermeasures for 8-bit Microcontrollers

Master's thesis carried out in Information Coding at the Institute of Technology, Linköping University

by

Mattias Fransson

LiTH-ISY-EX--15/4907--SE

Linköping 2015

Department of Electrical Engineering, Linköpings tekniska högskola, Linköpings universitet, SE-581 83 Linköping, Sweden


Supervisors: Jonathan Jogenfors, ISY, Linköpings universitet

Christian Vestlund, Sectra Communications AB

Examiner: Jan-Åke Larsson, ISY, Linköpings universitet

Linköping, 6 November 2015

Division, Department: Information Coding, Department of Electrical Engineering, SE-581 83 Linköping

Date: 2015-11-06

Language (options on the form): Swedish, English

Report category (options on the form): Licentiate thesis, Master's thesis (Examensarbete), C-level essay, D-level essay, Other report

URL for electronic version: http://urn.kb.se/resolve?urn:nbn:se:liu:diva-122718

ISBN: -

ISRN: LiTH-ISY-EX--15/4907--SE

Title of series, numbering: -

ISSN: -

Title: Power Analysis of the Advanced Encryption Standard (Swedish title: Effektanalys av Advanced Encryption Standard)

Author: Mattias Fransson

Keywords: power analysis, template attacks, countermeasures, microcontroller, AES

Abstract

The Advanced Encryption Standard is one of the most common encryption algorithms. It is highly resistant to mathematical and statistical attacks; however, this security is based on the assumption that an adversary cannot access the algorithm's internal state during encryption or decryption. Power analysis is a type of side-channel analysis that exploits information leakage through the power consumption of physical realisations of cryptographic systems. Power analysis attacks capture intermediate results during AES execution, which combined with knowledge of the plaintext or the ciphertext can reveal key material. This thesis studies and compares simple power analysis, differential power analysis and template attacks using a cheap consumer oscilloscope against AES-128 implemented on an 8-bit microcontroller. Additionally, the shuffling and masking countermeasures are evaluated in terms of security and performance. The thesis also presents a practical approach to template building and device characterisation. The results show that attacking a naive implementation with differential power analysis requires little effort, both in preparation and computation time. Template attacks require the fewest measurements but require significant preparation. Simple power analysis by itself cannot break the key but proves helpful in simplifying the other attacks. It is found that shuffling significantly increases the number of traces required to break the key while masking forces the attacker to use higher-order techniques.


Acknowledgments

First, I would like to thank the people over at Sectra Communications for giving me the opportunity to work with a topic that is both exciting and highly relevant in today's society. A special thanks goes to my supervisor Christian Vestlund, who was always ready to help and has provided many good thoughts and suggestions throughout the thesis work.

I would also like to thank my examiner Jan-Åke Larsson and my supervisor at the university, Jonathan Jogenfors, for their comments and helpful advice.

Thanks to my friends and family who have supported me throughout the years, and many, many thanks to my mother for her valuable input and for always being there for me.

Finally, a special thought goes to my father, who awoke my interest in all things technical. I would not be here if not for him and I so wish he was still with us.

Linköping, November 2015
Mattias Fransson


Contents

Notation

1 Introduction
  1.1 Background
  1.2 Sectra AB
  1.3 Purpose
  1.4 Problem Formulation
  1.5 Delimitations
  1.6 Thesis Outline

2 Cryptographic Concepts
  2.1 A Cryptographic System
  2.2 The Objectives of Cryptography
  2.3 Brief History
    2.3.1 Substitution Ciphers
  2.4 Random Numbers
  2.5 Asymmetric Cryptography
  2.6 Secret Sharing
    2.6.1 Secret Splitting
  2.7 Cryptanalysis
    2.7.1 Side-Channel Analysis

3 Symmetric-Key Cryptography
  3.1 Stream Ciphers
    3.1.1 One-Time Pad
  3.2 Block Ciphers
    3.2.1 Design Criteria
    3.2.2 Modes of Operation
  3.3 The Advanced Encryption Standard
    3.3.1 Notation
    3.3.2 Algorithm Structure
    3.3.3 Key Schedule
    3.3.4 Decryption

4 Power Consumption
  4.1 The Inverter
    4.1.1 Dynamic Power
    4.1.2 Static Power
    4.1.3 Short Circuit Power
  4.2 The Microcontroller
    4.2.1 Structure
    4.2.2 Operation
  4.3 Measuring Power Consumption
    4.3.1 Shunt
    4.3.2 Probing the Electromagnetic Field
  4.4 Modelling Power Consumption
    4.4.1 Binary Models
    4.4.2 Hamming Weight Model
    4.4.3 Hamming Distance Model
  4.5 Power Consumption Components
    4.5.1 Noise
  4.6 Signal-to-Noise Ratio
    4.6.1 Calculating the Signal-to-Noise Ratio

5 Power Analysis
  5.1 Power Traces
    5.1.1 Number of Sample Points
  5.2 Simple Power Analysis
    5.2.1 Attacking RSA
    5.2.2 Attacking AES
  5.3 Differential Power Analysis
    5.3.1 General Approach
    5.3.2 Difference of Means
    5.3.3 Distance of Means
    5.3.4 Correlation Coefficient
    5.3.5 Number of Traces
    5.3.6 Notes on Key Length
  5.4 Template Attacks
    5.4.1 Multivariate Gaussian Model
    5.4.2 Template Building Phase
    5.4.3 Attack Phase
    5.4.4 Points of Interest

6 Countermeasures
  6.1 Hiding
    6.1.1 Amplitude Hiding
    6.1.2 Time Dimension Hiding
    6.1.3 Random Delays
    6.1.4 Shuffling
    6.1.5 Attacking Shuffling
  6.2 Masking
    6.2.1 Masking the S-box
    6.2.2 Masking Scheme
    6.2.3 Masking the Key Schedule
  6.3 Higher-Order Differential Power Analysis
    6.3.1 Second-Order Differential Power Analysis Example

7 Method
  7.1 Environment Setup
    7.1.1 Target
    7.1.2 Oscilloscope
    7.1.3 Computer
  7.2 AES Implementations
    7.2.1 Naive
    7.2.2 Shuffling
    7.2.3 Masking
    7.2.4 Random Number Generation
  7.3 Simple Power Analysis
  7.4 Device Characterisation
    7.4.1 Measurement Configuration
    7.4.2 Viability of the Hamming Weight Model
    7.4.3 Signal-to-Noise Ratio
  7.5 Differential Power Analysis
    7.5.1 Attack on Shuffling
    7.5.2 Second-Order Attack
  7.6 Template Attack
    7.6.1 Points of Interest

8 Results
  8.1 AES Implementations
  8.2 Simple Power Analysis
  8.3 Device Characterisation
    8.3.1 Measurement Configuration
    8.3.2 Viability of Hamming Weight Model
    8.3.3 Signal-to-Noise Ratio
  8.4 Differential Power Analysis
    8.4.1 Attack on Shuffling
    8.4.2 Second-Order Attack
  8.5 Template Attack

9 Discussion
  9.1 Results
    9.1.1 Practical Issues
    9.1.2 Device Characterisation
    9.1.3 Attacks
    9.1.4 Countermeasures
    9.1.5 Number of Traces
  9.2 Method
  9.3 Power Analysis in a Broader Context

10 Conclusions
  10.1 Further Research

A Mathematical Prerequisites
  A.1 Statistics
    A.1.1 Parameter Estimation
    A.1.2 Differentiating Two Distributions
    A.1.3 Fisher z-transformation
    A.1.4 Multivariate Normal Distribution

Bibliography

Notation

General

A        A matrix.
A_i•     Row i of matrix A.
A_•i     Column i of matrix A.
a_i,j    Element of the i-th row and the j-th column of matrix A.
M_M×N    The set of all matrices with M rows and N columns.
|A|      The determinant of A.
v        A vector.
v_i      The i-th element of vector v.
⊕        Exclusive or.

Power Analysis

T        Matrix of power consumption values.
Z        Matrix of hypothetical intermediate values.
H        Matrix of hypothetical power consumption values.
R        Matrix with the result of a DPA attack.
N        Number of power traces and plaintexts.
S        Number of sample points per power trace.
k        AES cipher key.
w        AES expanded key.
r_n      Round key of the n-th AES round.
k_i      The i-th byte (or sub-key) of k.
ρ_max    Highest achievable correlation coefficient for a correct key guess.

Acronyms

AES      Advanced Encryption Standard.
ALU      arithmetic logic unit.
ASIC     application-specific integrated circuit.
CBC      Cipher Block Chaining.
CFB      Cipher Feedback.
CMOS     complementary metal-oxide-semiconductor.
CPU      central processing unit.
CTR      Counter.
DES      Data Encryption Standard.
DPA      differential power analysis.
ECB      Electronic Codebook.
HODPA    higher-order differential power analysis.
IV       initialization vector.
LNA      low-noise amplifier.
LSB      least significant bit.
NIST     the U.S. National Institute of Standards and Technology.
OFB      Output Feedback.
PCB      printed circuit board.
PRNG     pseudo-random number generator.
RAM      random-access memory.
ROM      read-only memory.
SCA      side-channel analysis.
SNR      signal-to-noise ratio.
SPA      simple power analysis.

Glossary

NMOS     N-channel metal-oxide-semiconductor field-effect transistor.
NOP      An assembler instruction that stalls the processor for one clock cycle.
PMOS     P-channel metal-oxide-semiconductor field-effect transistor.
RSA      Widely used public-key cryptosystem invented by Rivest, Shamir and Adleman.
S-BOX    Substitution box. A component of block ciphers that provides confusion.
SMA      A coaxial radio frequency connector.
XOR      Logic exclusive or. Returns true if and only if the two inputs differ.

1 Introduction

This chapter introduces the topic of the thesis, motivates why it is of interest and states its goals. The outline of the thesis is given at the end of the chapter.

1.1 Background

The history of cryptography is almost as long as that of human communication itself. People have always sought new and more efficient ways to communicate with each other, from hand gestures to symbols, from speech to writing, from smoke signals to telegraphs, from telephones to the internet. At the same time, there has always been a desire to keep these communication channels safe from our enemies' eyes and ears. As technology has advanced, so has the need for better and more sophisticated encryption. There is a constant struggle between the designers of secure systems and the people who are trying to break them.

Modern society is highly dependent on electronic communication, and a wide range of tasks such as money transfer and personal identification are performed digitally using devices called smart-cards. Smart-cards are embedded with integrated circuits of various degrees of complexity. Some examples of smart-cards are credit cards and SIM cards. Due to the private nature of many smart-card applications, the ability to securely transfer data without sacrificing convenience is important. Many smart-cards therefore contain microcontrollers programmed to perform data encryption and decryption with a secret key. Today, there is a growing interest in having everything from kitchen appliances to thermostats connected to the internet in a so-called Internet of Things. These systems provide convenience and automation to end-users, but at the same time they introduce new avenues of attack for malicious parties. The security of any cryptographic system hinges upon keeping the key secret. Often this key is fixed and shipped with the device. The goal of an attack is to find the key, at which point the device is compromised.

Classically, the study of breaking cryptographic systems involves trying to find mathematical weaknesses in the cryptographic algorithm or to detect usage patterns that may reveal sensitive information. Side-channel analysis (SCA) is a separate class of cryptographic analysis that provides insight into the implementation of an algorithm by studying the physical characteristics of the system it is running on. Most algorithms are not designed with this in mind and provide little resistance against these attacks. One of the more potent types of side-channel analysis is power analysis. A power analysis attack reveals the secret key by exploiting variations in the power consumption of the cryptographic device.

1.2 Sectra AB

Sectra is a Swedish company founded in the late 1970s by researchers at the Linköping Institute of Technology. Today it is a multinational company with offices in twelve countries. Sectra focuses on two specific areas: medical systems and secure communication. The secure communication department focuses, among other things, on protecting regular phone calls against eavesdropping.

1.3 Purpose

The purpose of the thesis is to study the strength and applicability of power analysis attacks against the Advanced Encryption Standard (AES) implemented on a common 8-bit microcontroller. The goal is to provide a reasonably realistic attack setting using cheap and readily available equipment. Additionally, different options for software-based countermeasures are considered and their impact on performance and security is analysed.

1.4 Problem Formulation

The following questions constitute the thesis’ problem formulation:

1. What is power analysis and how can it be used to retrieve the AES encryption key from an 8-bit software implementation?

(a) What makes AES sensitive to power analysis?

(b) What different methods of power analysis exist and how can they be compared?

2. How can power analysis of AES be prevented?

(a) Are there ways to make power analysis harder by modifying the software?

(b) What is the performance cost of these countermeasures?

1.5 Delimitations

Power analysis is interesting not only from a software perspective, but also from a hardware point of view. Many cryptographic systems are implemented on application-specific integrated circuits (ASICs) that may be susceptible to side-channel attacks. However, ASICs typically run at much higher frequencies than microcontrollers, which increases the requirements on the measuring equipment. Attacks and countermeasures on hardware implementations are therefore not covered in detail, but much of the theory is still applicable as it is independent of the physical implementation.

1.6 Thesis Outline

Chapters 2 and 3 present an overview of historical and modern cryptography, ending with a full description of AES. Understanding the different transformations and the overall structure of AES is important as they are directly related to the attacks presented later in the thesis. Chapter 4 presents the power consumption of integrated circuits and shows why and how it is possible to connect power measurements to the data processed by a microcontroller. This chapter also introduces various ways to model the power consumption, which is a prerequisite for the attacks. Chapter 5 presents a selection of different power analysis attacks and covers the theory and methodology behind simple and differential power analysis. Template attacks, which involve a detailed profiling of the microcontroller's power consumption before the actual attack, are also presented. Chapter 6 presents a number of countermeasures and focuses both on how to implement them and on how to attack them. Chapter 7 presents the measurement setup, followed by a description of the method used to test and evaluate the attacks and countermeasures. Chapter 8 lists the results, which are further discussed in chapter 9. Conclusions and final thoughts are given in chapter 10. Mathematical prerequisites, mainly in statistics, are detailed in appendix A.

2 Cryptographic Concepts

Cryptography is a vast field of study focused on providing methods for securing communication channels against the threat of so-called adversaries. Attacking and trying to find weaknesses in cryptographic systems is called cryptanalysis. This chapter introduces modern cryptographic concepts and definitions and provides descriptions of two of the most famous historical ciphers.

2.1 A Cryptographic System

In cryptographic literature one often refers to two entities, Alice and Bob, who are trying to communicate with each other. Communication can take place across a distance, in time or both. An example of communication across a distance is a telephone call, while storage on a hard disk drive is an example of communication in time. The medium over which the communication takes place, e.g. a wireless network, is called the channel. A third party called Eve (as in eavesdropper) represents the adversary. She is attempting to listen in on the channel with the goal of revealing the message Alice is sending to Bob. To foil Eve's plans, Alice and Bob use a system to encrypt their messages. This situation is presented in figure 2.1. Alice is transmitting the message p, called the plaintext, to Bob. She encrypts the message using an encryption key, e. The encryption function E(p, e) returns a ciphertext, denoted c, which she sends over the channel. Bob then uses a decryption key d and applies a decryption function, D(c, d), to the ciphertext to regain the original message. A formal mathematical definition of an encryption scheme is presented below.

Figure 2.1: A typical communication scenario with an adversary. (Alice encrypts the plaintext p with E and key e; the ciphertext c passes over the channel, where Eve listens; Bob decrypts with D and key d to recover p.)

Definition 2.1 (Encryption scheme). Let P be the set of all plaintexts, C be the set of all ciphertexts and K be the set of all keys. These sets are called the plaintext space, ciphertext space and key space, respectively. An encryption scheme is defined as the pair of functions E and D, where E : P × K → C and D : C × K → P, if for every e ∈ K there is a d ∈ K such that D(E(p, e), d) = p for all p ∈ P.

An encryption scheme is more commonly referred to as a cipher. Note that definition 2.1 includes two keys, one for encryption and one for decryption, but they may very well be the same. This introduces two types of systems: asymmetric-key systems where the keys are different and symmetric-key systems where the same key is used for encryption and decryption.

2.2 The Objectives of Cryptography

Encryption provides message confidentiality and plays a very important part in secure communications. However, cryptography incorporates many other concepts, such as making sure that the message received actually is the message that was sent. In the literature, four main objectives of cryptography are generally presented: confidentiality, integrity, authentication and non-repudiation [1].

• Confidentiality: No unauthorized party should be able to read the contents of the message.

• Integrity: No alteration to the message should be possible, either from transmission errors or from malicious intent, without Bob detecting this.

• Authentication: Bob should be able to verify that Alice is the sender of the message.

• Non-repudiation: There should be no way for Alice to deny that she sent the message.

These additional objectives are important. Simply encrypting a message does not guarantee that the communication is secure. First of all, encryption does not provide any error detection. Maybe Eve can find a way to change the message in transit so that Bob receives bad information. Another problem is transmission errors due to noisy channels. Non-repudiation is similar to, but not the same as, authentication. How does Bob prove to anyone else that a message he received actually came from Alice? He could have sent the message himself. Similarly, Alice should not be able to deny having sent the message. What if she signs a contract and later denies having done so?

2.3 Brief History

The history of cryptography is long and many examples of attempts at hiding messages can be found. Generally, the strongest driving force was protecting state secrets and military strategies. During the world wars coded messages and cryptanalysis played a big role in the final outcome. Examples include the Native American code talkers and the German Enigma machine [2]. Modern ciphers are very different from their predecessors but some of the basic building blocks derive from the same concepts: substitution and permutation. Substitution refers to exchanging plaintext symbols for others in a manner depending on the key, e.g. by replacing all occurrences of the letter “A” with the letter “Q”. Permutation, or transposition, is based on permuting the symbols in the message rather than substituting them. This means that all plaintext symbols are present in the ciphertext. An example is rewriting ANALYSIS as NALASYSI.

2.3.1 Substitution Ciphers

A substitution cipher codes a plaintext by substituting every letter in the plaintext by its corresponding letter from a substitution alphabet. The easiest way to visualise this is by writing out all letters in the plaintext alphabet on top of the substitution alphabet. The following example illustrates one of the most famous substitution ciphers: the Caesar cipher [1].

Plaintext alphabet:    ABCDEFGHIJKLMNOPQRSTUVWXYZ
Substitution alphabet: CDEFGHIJKLMNOPQRSTUVWXYZAB

If the text “Grumpy wizards make toxic brew” is encrypted with the above alphabet the ciphertext becomes:

Plaintext:  GRUMPYWIZARDSMAKETOXICBREW
Ciphertext: ITWORAYKBCTFUOCMGVQZKEDTGY

Note that in this case the spaces have been removed as they do not exist in the plaintext alphabet. Removing spaces and punctuation marks further obfuscates the message by hiding the location of word boundaries. The Caesar cipher above rotates every letter two steps to the right. The key is the letter C, i.e. k = 2. Another way to express this generally for any key k is by labelling the letters in the plaintext alphabet from 0 to 25. Encryption is then written as E(m, k) = (m + k) (mod 26) and decryption as D(c, k) = (c − k) (mod 26). The Caesar cipher is not particularly secure as there are only 26 possible keys and it is trivial to test all of them. Instead of rotating the letters in the plaintext alphabet, the substitution alphabet can be chosen as a random permutation of all available letters. In this case there are 26! possible keys, which makes brute-forcing a lot harder. Another problem with substitution ciphers is that the same letters are always substituted in the same way and thus it is possible to use frequency analysis to break the substitution [1].
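The encryption and decryption rules E(m, k) = (m + k) mod 26 and D(c, k) = (c − k) mod 26 can be illustrated with a minimal Python sketch. The function names are chosen here for illustration only and are not part of the thesis; the brute-force helper simply lists all 26 candidate decryptions, which is why the small key space is a weakness.

```python
import string

ALPHABET = string.ascii_uppercase  # letters labelled 0..25


def caesar_encrypt(plaintext: str, k: int) -> str:
    """E(m, k) = (m + k) mod 26, letter by letter; non-letters are dropped."""
    letters = [ch for ch in plaintext.upper() if ch in ALPHABET]
    return "".join(ALPHABET[(ALPHABET.index(ch) + k) % 26] for ch in letters)


def caesar_decrypt(ciphertext: str, k: int) -> str:
    """D(c, k) = (c - k) mod 26, letter by letter."""
    return "".join(ALPHABET[(ALPHABET.index(ch) - k) % 26] for ch in ciphertext)


def brute_force(ciphertext: str) -> list[str]:
    """Return the decryption under every one of the 26 possible keys."""
    return [caesar_decrypt(ciphertext, k) for k in range(26)]


ciphertext = caesar_encrypt("Grumpy wizards make toxic brew", 2)
print(ciphertext)                     # ITWORAYKBCTFUOCMGVQZKEDTGY
print(caesar_decrypt(ciphertext, 2))  # GRUMPYWIZARDSMAKETOXICBREW
```

With key k = 2 (the letter C) the sketch reproduces the ciphertext of the example above, and the correct plaintext appears at index 2 of the brute-force list.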

Other substitution ciphers can be constructed by, for instance, choosing a word as the key and then writing the remaining letters of the alphabet in order without repeating any letters. If the key is “HELLO”, the repeated L is dropped and the substitution alphabet becomes: HELOABCDFGIJKMNPQRSTUVWXYZ.
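A keyword-based substitution alphabet can be built as in the following sketch. It assumes that repeated letters in the keyword are skipped, so that the result is a permutation of all 26 letters; the helper names are hypothetical.

```python
import string


def keyword_alphabet(keyword: str) -> str:
    """Keyword letters first (duplicates skipped), then the unused letters in order."""
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)


def substitute(plaintext: str, alphabet: str) -> str:
    """Replace each plaintext letter by the letter at the same position in `alphabet`."""
    table = dict(zip(string.ascii_uppercase, alphabet))
    return "".join(table[ch] for ch in plaintext.upper() if ch in table)


alphabet = keyword_alphabet("HELLO")
print(alphabet)  # HELOABCDFGIJKMNPQRSTUVWXYZ
```

Because the keyword contains the letter L twice, the second L is skipped and the resulting alphabet has exactly 26 distinct letters.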

The Vigenère Cipher

Some of the problems with the simple substitution ciphers can be solved by using a repeating key sequence. A keyword is chosen and repeated to match the length of the plaintext. Every letter in the plaintext is then substituted by applying the Caesar cipher corresponding to the current letter in the key. The following example encrypts a message with the keyword “KEY”.

Plaintext:  GRUMPYWIZARDSMAKETOXICBREW
Key:        KEYKEYKEYKEYKEYKEYKEYKEYKE
Ciphertext: QVSWTWGMXKVBCQYUIRYBGMFPOA

This is commonly called the Vigenère cipher [1]. As the example shows, the same letter may be encrypted into different letters. This makes frequency analysis somewhat harder as the adversary must first determine the length of the key.
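The keyword repetition described above can be sketched in a few lines of Python. The function name and the `decrypt` flag are illustrative choices made here, not taken from the thesis.

```python
import string
from itertools import cycle

ALPHABET = string.ascii_uppercase


def vigenere(text: str, keyword: str, decrypt: bool = False) -> str:
    """Shift each letter by the Caesar key given by the current keyword letter."""
    sign = -1 if decrypt else 1
    letters = [ch for ch in text.upper() if ch in ALPHABET]
    # cycle() repeats the keyword to match the length of the text
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + sign * ALPHABET.index(key_ch)) % 26]
        for ch, key_ch in zip(letters, cycle(keyword.upper()))
    )


ciphertext = vigenere("Grumpy wizards make toxic brew", "KEY")
print(ciphertext)  # QVSWTWGMXKVBCQYUIRYBGMFPOA
print(vigenere(ciphertext, "KEY", decrypt=True))  # GRUMPYWIZARDSMAKETOXICBREW
```

In the output the plaintext letter R is encrypted as V at two positions and as P at a third, which is the property that complicates frequency analysis.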

2.4 Random Numbers

Random numbers occur in many cryptographic applications and they play an important role in power analysis countermeasures. Generating random numbers is a difficult topic, especially when it comes to computers. There are many sources of true randomness found in nature, e.g. radioactive decay. Flipping a fair coin and rolling an unbiased die are also sources of random numbers. While a series of coin tosses may produce good random numbers, it is far too slow for any practical application. In practice, a pseudo-random number generator (PRNG) is used. A PRNG is a deterministic algorithm that produces a sequence of seemingly random numbers. If the same input is given, the same output sequence is returned. It is therefore common to use a source of true randomness as the input, called the seed. If the seed is truly random and a good PRNG is used, the output sequence will be unpredictable. A property of all PRNGs is that at some point the output sequence will repeat, i.e. it has a period. Cryptographically secure generators should have long periods.
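The two properties mentioned above, determinism for a given seed and a finite period, can be observed with a deliberately tiny generator. The sketch below uses a linear congruential generator with toy parameters (a = 5, c = 3, m = 16) chosen here purely for illustration; it is an assumption of this example, is not the generator used in the thesis, and is in no way cryptographically secure.

```python
from itertools import islice


def lcg(seed: int, a: int = 5, c: int = 3, m: int = 16):
    """Toy linear congruential generator: state <- (a * state + c) mod m."""
    state = seed % m
    while True:
        state = (a * state + c) % m
        yield state


def take(generator, n: int) -> list[int]:
    """Collect the first n outputs of a generator."""
    return list(islice(generator, n))


first = take(lcg(seed=7), 32)
again = take(lcg(seed=7), 32)

print(first == again)            # True: same seed gives the same sequence
print(first[:16] == first[16:])  # True: the output repeats with period 16
```

With a modulus of 16 the period can never exceed 16 outputs, which shows why practical generators need a much larger internal state.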

2.5 Asymmetric Cryptography

Modern encryption schemes can be divided into two groups depending on whether the same key is used for both encryption and decryption or not. Those that use the same key are symmetric and those that do not are asymmetric. Another term for asymmetric cryptography is public-key cryptography. One of the major challenges in symmetric cryptography is how to securely share the key with all involved parties. The obvious way is to meet up in person and determine the key before sending the data. This is, however, not practical. Public-key cryptography attempts to solve this by splitting a person's key into two parts: a public key, intended for encryption, that is made available to everyone, and a secret key intended for decryption. Public-key encryption schemes are generally slow compared to their symmetric counterparts, which makes the transmission of large messages infeasible. A common use case is to apply symmetric encryption for the actual message and public-key encryption to securely distribute the secret key [3].

Asymmetric encryption schemes are based on trapdoor one-way functions. A one-way function $f$ has the properties that it should be easy to calculate $f(x)$ given $x$, but hard to calculate $x$ given $f(x)$. A trapdoor one-way function additionally satisfies that it is easy to calculate $x$ given some certain knowledge. Unfortunately, there is no proof that trapdoor one-way functions exist and no real way of constructing them. There are, however, some functions that are thought to be trapdoor one-way functions. While a detailed description of asymmetric cryptography is outside the scope of this thesis, it is of interest in the context of power analysis. In chapter 5 a short example is given where modular exponentiation is attacked. Modular exponentiation is believed to be a trapdoor one-way function and is commonly seen in public-key cryptography.

2.6 Secret Sharing

The following section describes a way to share a secret between different parties so that no single individual is capable of recovering the secret without help from the others. This can be likened to a safe that requires multiple keys to open, where the keys are distributed to different people. Using a unique physical lock for each key is clumsy, and adding or removing keys quickly becomes inconvenient. Consider instead a combination lock. How should this combination be divided among the involved parties? Secret sharing refers to a set of methods for splitting a secret into a number of shares. In order to reconstruct the secret, multiple (or all) shares must be combined. This topic is closely related to one of the more popular power analysis countermeasures.

2.6.1 Secret Splitting

Suppose you want to send a message, e.g. the combination to a safe, to Alice and Bob. They should only be able to read the message if they combine their knowledge. Represent the message $m$ as an integer and generate a random number $r$. Give $r$ to Alice and $m - r$ to Bob. The message is reconstructed by adding the shares back together. It is important that all possible values of $r$ are equally likely. However, this cannot be achieved over the set of all integers, as there are infinitely many of them [1]. To make sure all $r$ are equally likely with probability $1/N$, $r$ is chosen as a random integer modulo $N$. To make sure that $m$ can be recreated, $N$ must be larger than all possible messages. Secret splitting is generalized to $n$ people by generating $n - 1$ random numbers $r_1, r_2, \ldots, r_{n-1} \pmod{N}$ and distributing them as shares. The final share is calculated as $r_n = m - r_1 - r_2 - \cdots - r_{n-1} \pmod{N}$.
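The splitting rule above translates almost directly into code. The following is a minimal sketch, assuming Python's standard `secrets` module as the source of uniform random integers; the function names `split_secret` and `combine_shares` are hypothetical and chosen for this example only.

```python
import secrets


def split_secret(m, n, modulus):
    """Split the integer m into n additive shares modulo `modulus`."""
    if not 0 <= m < modulus:
        raise ValueError("the modulus must be larger than the message")
    if n < 2:
        raise ValueError("at least two shares are required")
    # n - 1 shares are uniformly random integers modulo N ...
    shares = [secrets.randbelow(modulus) for _ in range(n - 1)]
    # ... and the final share makes the sum equal m (mod N).
    shares.append((m - sum(shares)) % modulus)
    return shares


def combine_shares(shares, modulus):
    """Reconstruct the message by adding all shares modulo `modulus`."""
    return sum(shares) % modulus


shares = split_secret(1234, n=5, modulus=10_000)
print(combine_shares(shares, 10_000))  # recovers 1234
```

Leaving out any single share changes the sum by a uniformly random amount, so an incomplete set of shares reveals nothing about the message.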

2.7 Cryptanalysis

Attacking cryptographic systems is known as cryptanalysis. The methods employed usually depend on both the amount and the type of information an adversary has. A fundamental idea in cryptography is that the adversary knows the system, i.e. an encryption scheme must be secure even if all details about it are known except the secret key [4]. This is called Kerckhoffs's principle after the Dutch cryptographer Auguste Kerckhoffs. The following list describes the attacks available to Eve based on the information she has [3]:

• Ciphertext-only: A set of ciphertexts is available to Eve.

• Known plaintext: Eve has access to a set of plaintext-ciphertext pairs.

• Chosen plaintext: Eve can choose the plaintexts and acquire the corresponding ciphertexts through encryption.

• Chosen ciphertext: Eve can choose the ciphertexts and acquire the corresponding plaintexts through decryption.

2.7.1 Side-Channel Analysis

Side-channel analysis is another type of cryptanalysis, but instead of employing the previously mentioned methods and working mathematically towards the key, an additional source of information is used: the physical dimension. While a cryptographic algorithm may be mathematically secure under the assumption that an adversary can only determine the input and output, it is not necessarily secure if intermediate results can be extracted. As figure 2.2 shows, leakage of sensitive information can be observed from many different sources. The cryptographic system consumes power and the movement of charge carriers gives rise to electromagnetic fields. Another way to infer intermediate values is by carefully examining variations in the delay between input and output. Paul Kocher is one of the pioneers in side-channel analysis and introduced both timing and power analysis, where he and his co-authors demonstrated attacks on asymmetric and symmetric ciphers [5, 6]. More recently, the acoustic side-channel has been used to extract RSA encryption keys [7]. Acoustic attacks exploit sound caused by small vibrations in electrical components.

One of the major advantages of side-channel attacks is that in many situations (differential power analysis in particular) the required knowledge of the attacked system is minimal. Often it is enough to know the algorithm while the device itself is treated as a black box. A distinction is made between invasive and non-invasive side-channel analysis. In the first case the system is modified to allow an adversary to record some property, e.g. by adding current sensing circuitry. Conversely, a non-invasive side-channel attack does not require modification of the target system. Placing a microphone in the vicinity of the target is non-invasive.

[Figure 2.2 diagram: Alice, the encryption E, the ciphertext c and Eve, with the side-channels labelled timing information, power dissipation, electromagnetic radiation, etc.]

Figure 2.2: Intermediate results leak through physical side-channels and can be exploited to extract the secret key.

A closely related area is fault attacks (or fault injection). By causing faults in the internal logic of a processor it is possible to make it behave in a way beneficial to an adversary or even produce key material. Some examples of fault injection include introducing variations in the power supply voltage or glitches in the clock signal, causing the processor to skip instructions or misinterpret data [8]. Fault injection attacks are active as the adversary chooses the input and controls the device behaviour. In contrast, side-channel analysis attacks are generally passive.

3 Symmetric-Key Cryptography

In this chapter, an overview of symmetric-key cryptography is presented. Symmetric algorithms are often fast and easy to implement in both hardware and software and are therefore used in a wide range of applications. Stream ciphers refer to symmetric-key encryption schemes that operate on plaintexts of arbitrary length. This is in contrast to block ciphers that operate on plaintexts of fixed length. Block cipher primitives are therefore often used in combination with different modes of operation.

3.1 Stream Ciphers

As the name implies, stream ciphers operate on streams of plaintexts and keys and produce streams of ciphertexts.

Definition 3.1 (Stream cipher). Let $p = p_1 p_2 \ldots$ where $p_i \in \mathcal{P}$ and $k = k_1 k_2 \ldots$ where $k_i \in \mathcal{K}$. A stream cipher is defined as an encryption scheme such that $E(p, k) = c_1 c_2 \ldots = c$ where $c_i \in \mathcal{C}$ and $D(c, k) = p_1 p_2 \ldots = p$.

Often the key is used as a seed to a PRNG that produces a pseudo-random bit sequence, which in turn is XORed with the plaintext to produce the ciphertext.

3.1.1 One-Time Pad

The one-time pad is a stream cipher where $\mathcal{P} = \mathcal{C} = \mathcal{K} = \{0, 1\}$. Encryption and decryption are defined as the logical exclusive-or (XOR):

E(p,k) = p⊕ k = c

D(c,k) = c⊕ k = p


The security is based on the key stream being completely random. Every bit in the key is either 1 or 0 with probability one half. The one-time pad is special because it provides perfect secrecy [1]. This means that the ciphertext gives no information on the plaintext, i.e. every plaintext is equally likely given only the ciphertext. There are of course some major drawbacks with the one-time pad. First of all, there must be as many bits in the key stream as there are in the plaintext. Managing the secret key quickly becomes troublesome as the message length increases. The second issue is that the key may only be used once, hence the name one-time pad. Suppose Eve intercepts two ciphertexts $c_1$ and $c_2$ encrypted with the same key $k$. If she XORs the ciphertexts she can effectively eliminate the key since $c_1 \oplus c_2 = m_1 \oplus k \oplus m_2 \oplus k = m_1 \oplus m_2$. Since the messages are unlikely to be random, $m_1 \oplus m_2$ may provide a lot of information on the individual messages. For these reasons the one-time pad is rarely used in practice, but many stream ciphers draw inspiration from it.
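The key-reuse problem can be checked with a few lines of code. The sketch below is only an illustration: the two messages are hypothetical, the pad operates on bytes rather than single bits, and the key is drawn from Python's `secrets` module.

```python
import secrets


def xor_bytes(a, b):
    """Bytewise XOR of two equally long byte strings."""
    if len(a) != len(b):
        raise ValueError("the pad must be as long as the message")
    return bytes(x ^ y for x, y in zip(a, b))


m1 = b"attack at dawn"
m2 = b"retreat at six"
key = secrets.token_bytes(len(m1))  # one pad, wrongly used twice

c1 = xor_bytes(m1, key)  # encryption
c2 = xor_bytes(m2, key)

# Decryption is the same operation as encryption.
print(xor_bytes(c1, key) == m1)
# XORing the two ciphertexts cancels the key completely.
print(xor_bytes(c1, c2) == xor_bytes(m1, m2))
```

Eve never learns the key, yet she obtains the XOR of the two plaintexts, which is often enough to recover both when the messages have structure.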

3.2 Block Ciphers

A block cipher is a symmetric-key encryption scheme that operates on fixed-size blocks of data. Today's most popular block ciphers are the Advanced Encryption Standard and its predecessor, the Data Encryption Standard.

Definition 3.2 (Block cipher). A block cipher is defined as an encryption scheme with the plaintext and ciphertext spaces $\mathcal{P} = \mathcal{C} = \{0, 1\}^m$ and the key space $\mathcal{K} = \{0, 1\}^n$, where $m$ is the block size and $n$ is the key length.

Essentially, a block cipher can be seen as a substitution cipher, but instead of mapping single symbols, entire blocks of symbols are substituted. Ideally, the perfect block cipher would be able to output all possible permutations of $\mathcal{C}$. There are $(2^m)!$ elements in the set of all permutations of $\{0, 1\}^m$. In order to be able to generate every element the key must be $\log_2\left((2^m)!\right) \approx (m - 1.44)\,2^m$ bits long [3]. This number is huge and there is no way to use keys of that size in practice. Instead, block cipher designers try to approximate the ideal behaviour.
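The approximation of the ideal key length can be made plausible with Stirling's formula. The text cites [3] for the result and does not spell out a derivation, so the following is one possible route rather than necessarily the one used there. With $N = 2^m$ and $\ln N! \approx N \ln N - N$,

$$\log_2 N! \approx N \log_2 N - N \log_2 e = m\,2^m - 2^m \log_2 e \approx (m - 1.44)\,2^m,$$

since $\log_2 e \approx 1.4427$. For a block size of $m = 128$ this gives roughly $126.56 \cdot 2^{128} \approx 2^{135}$ key bits, which illustrates why the ideal block cipher cannot be realised.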

3.2.1 Design Criteria

There are two important design criteria for block ciphers, namely diffusion and confusion. Diffusion means that a small change in the plaintext should cause the ciphertext to change significantly. This is sometimes referred to as the avalanche effect. Another way to define diffusion is through the strict avalanche criterion. Diffusion is required to force an adversary to use full block statistics rather than single letter statistics.

Definition 3.3 (Strict avalanche criterion). If a single bit in the plaintext is flipped, every bit in the ciphertext should flip with probability $\frac{1}{2}$.

Confusion refers to the property that every bit in the ciphertext should depend on multiple bits of the key. The goal is to make the relation between ciphertext and key as complex as possible, which can be accomplished by using non-linear transformations.

3.2.2 Modes of Operation

By design, stream ciphers can handle arbitrarily sized data, but block cipher primitives are limited to messages with a length equal to the block size. To deal with this limitation a so-called mode of operation is implemented on top of the block cipher. The mode of operation specifies how to securely reuse the same block cipher with the same key over multiple blocks of data. While providing confidentiality is the primary objective, some modes of operation are designed to incorporate message authentication as well. The objective of this thesis is mainly to study attacks against the AES primitive, so this section only briefly covers some of the most common modes of operation. Two modes, Electronic Codebook (ECB) and Cipher Block Chaining (CBC), are presented in the following sections to illustrate the impact of choosing a suitable mode. Other examples are Cipher Feedback (CFB), Output Feedback (OFB) and Counter (CTR). These other modes are effectively stream ciphers that use the block cipher as a PRNG.

Electronic Codebook

The ECB mode of operation constitutes the most straightforward method to encrypt messages of any size. The message is divided into chunks of the same size as the block size of the underlying block cipher, and the chunks are then encrypted individually. Decryption is performed in the same way. ECB has a serious weakness in its inability to hide data patterns. Identical plaintext blocks will always encrypt to the same ciphertext block, which enables Eve to construct a codebook by observing messages sent between Alice and Bob. Since most messages contain some structure, Eve may be able to determine the message's context or even modify the message. The ECB mode of operation should generally not be used for arbitrary data.

Cipher Block Chaining

CBC is intended to reduce some of the problems encountered in ECB by making the next ciphertext output depend on the previous one. Similarly to ECB the message is divided into blocks, but before a block is encrypted it is XORed with the previous ciphertext block. This removes the data pattern problem since two blocks with the same data will result in two different ciphertexts. As there is no previous ciphertext for the first block, something called an initialization vector (IV) is supplied. The IV is a random number sent in the clear with the encrypted message. It is important that the IV is random, otherwise the adversary can detect when two identical messages are encrypted.
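The difference between the two modes can be made concrete with a small experiment. The sketch below does not use AES: `toy_encrypt_block` is a hypothetical stand-in, a four-round Feistel network built on SHA-256, chosen only so that the example is self-contained. It is not a secure cipher, and the key and IV values are arbitrary.

```python
import hashlib

BLOCK = 16  # block size in bytes


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block, key):
    """Four-round Feistel network; a stand-in permutation, NOT a secure cipher."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:BLOCK // 2]
        left, right = right, xor_bytes(left, f)
    return left + right


def split_blocks(msg):
    if len(msg) % BLOCK:
        raise ValueError("message length must be a multiple of the block size")
    return [msg[i:i + BLOCK] for i in range(0, len(msg), BLOCK)]


def ecb_encrypt(msg, key):
    # Every block is encrypted on its own.
    return [toy_encrypt_block(b, key) for b in split_blocks(msg)]


def cbc_encrypt(msg, key, iv):
    # Every block is XORed with the previous ciphertext block (or the IV).
    out, prev = [], iv
    for b in split_blocks(msg):
        prev = toy_encrypt_block(xor_bytes(b, prev), key)
        out.append(prev)
    return out


key = b"k" * 16
iv = bytes(range(16))   # fixed here for repeatability; must be random in practice
msg = b"A" * 32         # two identical plaintext blocks

ecb = ecb_encrypt(msg, key)
cbc = cbc_encrypt(msg, key, iv)
print(ecb[0] == ecb[1])  # identical blocks leak through ECB
print(cbc[0] == cbc[1])  # chaining hides the repetition
```

Any block cipher could take the place of the toy permutation; the pattern leakage is a property of the mode, not of the primitive.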

3.3 The Advanced Encryption Standard

The Advanced Encryption Standard (sometimes known as Rijndael after its creators Vincent Rijmen and Joan Daemen) is an encryption standard ratified by the U.S. National Institute of Standards and Technology (NIST) in 2001 [9]. In 1997 NIST announced that they were looking to replace the old Data Encryption Standard (DES) and invited the cryptologic community to take part in the process. In addition to analysing the security aspects of the algorithm, participants were asked to take the implementation costs in both software and hardware into consideration. Fifteen algorithms were evaluated and out of five finalists Rijndael was chosen as the winner.

3.3.1 Notation

Some notation must be introduced before describing the algorithm. In AES the smallest unit is the byte. Without going into details, every byte corresponds to an element in a finite field (or Galois field) denoted $\mathbb{F}_{2^8}$, and all arithmetic is performed in this finite field. A group of four bytes is called a word. A word $x$ consisting of the bytes $a$, $b$, $c$ and $d$ is written as $x = \{a, b, c, d\}$. Let $N_b$ denote the block size in words, $N_r$ the number of rounds and $N_k$ the key length, also in words. Every round $n$ is associated with a 16-byte round key denoted $r_n$. The current progress of the algorithm is stored in an array of 16 bytes called the state. The state array can be viewed as a $4 \times 4$ column-major matrix and is denoted by $S$. Finally, whenever the state is updated this is indicated by a prime, e.g. $S'$. The chapters on power analysis often refer to something called a sub-key. The $i$th sub-key of $k$ is written as $k_i$ and corresponds to the $i$th byte of $k$. A 16-byte key therefore consists of 16 sub-keys.

3.3.2 Algorithm Structure

The AES algorithm is a block cipher working on blocks of 128 bits. Rijndael supports different block sizes but in the AES specification it is fixed to four words, i.e. $N_b = 4$. Supported key lengths are 128 bits, 192 bits and 256 bits, and the number of rounds, $N_r$, is 10, 12 and 14, respectively. Every round consists of a number of transformations operating on the state. Figure 3.1 presents the algorithm as a block diagram. At the start of the algorithm, the input is copied into the state. Let $b$ hold the input bytes $b_0, b_1, \ldots, b_{15}$. The state is written as:

$$S = \begin{pmatrix} b_0 & b_4 & b_8 & b_{12} \\ b_1 & b_5 & b_9 & b_{13} \\ b_2 & b_6 & b_{10} & b_{14} \\ b_3 & b_7 & b_{11} & b_{15} \end{pmatrix} = \begin{pmatrix} s_{0,0} & s_{0,1} & s_{0,2} & s_{0,3} \\ s_{1,0} & s_{1,1} & s_{1,2} & s_{1,3} \\ s_{2,0} & s_{2,1} & s_{2,2} & s_{2,3} \\ s_{3,0} & s_{3,1} & s_{3,2} & s_{3,3} \end{pmatrix}$$

AddRoundKey

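In AddRoundKey the state is modified by XORing it bytewise with the current round key $r_n$. How the round keys are derived from the secret key is explained in section 3.3.3. With $r_{n,i}$ denoting the $i$th byte of $r_n$, the updated state is

$$S' = \begin{pmatrix} s_{0,0} \oplus r_{n,0} & s_{0,1} \oplus r_{n,4} & s_{0,2} \oplus r_{n,8} & s_{0,3} \oplus r_{n,12} \\ s_{1,0} \oplus r_{n,1} & s_{1,1} \oplus r_{n,5} & s_{1,2} \oplus r_{n,9} & s_{1,3} \oplus r_{n,13} \\ s_{2,0} \oplus r_{n,2} & s_{2,1} \oplus r_{n,6} & s_{2,2} \oplus r_{n,10} & s_{2,3} \oplus r_{n,14} \\ s_{3,0} \oplus r_{n,3} & s_{3,1} \oplus r_{n,7} & s_{3,2} \oplus r_{n,11} & s_{3,3} \oplus r_{n,15} \end{pmatrix}$$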


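[Figure 3.1 block diagram: the input passes through AddRoundKey (key whitening), then $N_r - 1$ rounds of SubBytes, ShiftRows, MixColumns and AddRoundKey, and a final round of SubBytes, ShiftRows and AddRoundKey, which produces the output.]

Figure 3.1: Structure of the AES algorithm. $N_r$ is the number of rounds.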

SubBytes

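SubBytes substitutes the state by applying a non-linear transformation independently to every byte. The substitution function $S$, called the S-box, is constructed by combining two invertible functions, $g$ and $h$, so that $S(x) = h(g(x))$. The two functions are defined as

$$g : \mathbb{F}_{2^8} \to \mathbb{F}_{2^8}, \quad x \mapsto x^{-1}$$

$$h : \mathbb{F}_{2^8} \to \mathbb{F}_{2^8}, \quad x \mapsto Ax + b$$

where $h$ is an affine transformation operating on the bits of $x$ and $A$ and $b$ are

$$A = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}.$$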


Zero has no inverse in $\mathbb{F}_{2^8}$. This is solved by mapping zero to itself, i.e. $g(0) = 0$. Finally, the updated state becomes:

$$S' = \begin{pmatrix} S(s_{0,0}) & S(s_{0,1}) & S(s_{0,2}) & S(s_{0,3}) \\ S(s_{1,0}) & S(s_{1,1}) & S(s_{1,2}) & S(s_{1,3}) \\ S(s_{2,0}) & S(s_{2,1}) & S(s_{2,2}) & S(s_{2,3}) \\ S(s_{3,0}) & S(s_{3,1}) & S(s_{3,2}) & S(s_{3,3}) \end{pmatrix}$$

SubBytes is generally implemented efficiently as a lookup table, as there are only 256 elements in $\mathbb{F}_{2^8}$.
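The lookup table can be generated directly from the definitions of $g$ and $h$. The sketch below assumes the reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (0x11B) from the AES specification, which is not stated in the text above, and reads the first row of $A$ and the first entry of $b$ as acting on the least significant bit. The function names are chosen for this example.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_{2^8} modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:       # reduce as soon as the degree reaches 8
            a ^= 0x11B
        b >>= 1
    return product


def g(x):
    """Multiplicative inverse, with zero mapped to itself."""
    if x == 0:
        return 0
    result = 1
    for _ in range(254):    # x^254 = x^(-1) for every non-zero x
        result = gf_mul(result, x)
    return result


def h(x):
    """Affine transformation Ax + b on the bits of x (bit 0 is the LSB)."""
    out = 0
    for i in range(8):
        bit = (x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8)) \
            ^ (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8)) ^ (0x63 >> i)
        out |= (bit & 1) << i
    return out


SBOX = [h(g(x)) for x in range(256)]

print(hex(SBOX[0x00]), hex(SBOX[0x01]), hex(SBOX[0x53]))
```

Computing the inverse by repeated multiplication is slow but only has to be done once; an implementation would store the resulting 256 bytes as a constant table.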

ShiftRows

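The state is modified by rotating each row a fixed number of bytes to the left. The $i$th row of $S$ is rotated $i$ bytes, which means that row 0 is left unchanged.

$$S = \begin{pmatrix} s_{0,0} & s_{0,1} & s_{0,2} & s_{0,3} \\ s_{1,0} & s_{1,1} & s_{1,2} & s_{1,3} \\ s_{2,0} & s_{2,1} & s_{2,2} & s_{2,3} \\ s_{3,0} & s_{3,1} & s_{3,2} & s_{3,3} \end{pmatrix} \Rightarrow S' = \begin{pmatrix} s_{0,0} & s_{0,1} & s_{0,2} & s_{0,3} \\ s_{1,1} & s_{1,2} & s_{1,3} & s_{1,0} \\ s_{2,2} & s_{2,3} & s_{2,0} & s_{2,1} \\ s_{3,3} & s_{3,0} & s_{3,1} & s_{3,2} \end{pmatrix}$$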
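On the 16-byte column-major state array, the row rotation becomes a fixed permutation of indices. A minimal sketch, with the function name `shift_rows` chosen for this example:

```python
def shift_rows(state):
    """Rotate row r of a column-major 4x4 state r bytes to the left.

    Byte s[r][c] is stored at index r + 4*c, so the new byte at row r,
    column c is taken from row r, column (c + r) mod 4.
    """
    if len(state) != 16:
        raise ValueError("the state must hold exactly 16 bytes")
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


# Using the index of each byte as its value exposes the permutation.
print(shift_rows(list(range(16))))
```

Since the permutation is fixed, an implementation can hard-code the index order instead of computing it.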


MixColumns

MixColumns transforms the state by treating every column in $S$ as a polynomial over $\mathbb{F}_{2^8}$. The polynomials are multiplied by the fixed polynomial $a(x) = 3x^3 + x^2 + x + 2 \pmod{x^4 + 1}$. This can be seen as a matrix multiplication so that the new state becomes:

$$\begin{pmatrix} s'_{0,j} \\ s'_{1,j} \\ s'_{2,j} \\ s'_{3,j} \end{pmatrix} = \begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} s_{0,j} \\ s_{1,j} \\ s_{2,j} \\ s_{3,j} \end{pmatrix}$$

Similarly to the S-box, multiplication by two and three in $\mathbb{F}_{2^8}$ can be precomputed for all possible byte values and implemented as lookup tables.
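The matrix form of MixColumns can be sketched for a single column as follows. The reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (0x11B) is taken from the AES specification and is an addition to the text above; the helper names `xtime` and `mul3` are conventional but chosen here for illustration. Multiplication is computed on the fly rather than by table lookup.

```python
def xtime(a):
    """Multiply a byte by 2 in F_{2^8} modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def mul3(a):
    """Multiply a byte by 3, i.e. (2 * a) + a with addition being XOR."""
    return xtime(a) ^ a


def mix_column(col):
    """Apply the MixColumns matrix to one column [s0, s1, s2, s3]."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ mul3(s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ mul3(s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ mul3(s3),
        mul3(s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


print([hex(v) for v in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

A column of four equal bytes is mapped to itself because the coefficients in every row add up to one in $\mathbb{F}_{2^8}$, which makes for a convenient sanity check.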

3.3.3 Key Schedule

The key is mixed with the plaintext in the form of so-called round keys. A new round key is required for every invocation of AddRoundKey, which means that a total of $N_r + 1$ round keys must be generated. The AES key schedule generates a string of $N_b \times (N_r + 1)$ words called the expanded key, denoted $w$. This can be written as

$$w = r_0 r_1 \ldots r_{N_r}$$

where $r_i$ is the $i$th round key. Let $w_i$, $0 \le i < N_b \times (N_r + 1)$, denote the $i$th word of the expanded key. Two additional functions are defined for the key schedule: SubWord and RotWord. These transformations take words as input. SubWord simply applies the S-box to every byte in the given word while RotWord performs a circular shift one byte to the left. Additionally, a word array of round constants, often called Rcon and denoted here as $c$, is defined such that $c_i = \{2^{i-1}, 0, 0, 0\}$ for $i \ge 1$. As with all arithmetic in the algorithm, $2^{i-1}$ is a power of 2 in $\mathbb{F}_{2^8}$.

The first $N_k$ words of $w$ are set to the secret key. The rest of the words are calculated so that $w_i$ depends on $w_{i-1}$ and $w_{i-N_k}$. Algorithm 1 presents the entire key schedule.

Algorithm 1: Generate expanded key

    Input: Secret key k (4 × Nk bytes)
    Output: Expanded key w (Nb × (Nr + 1) words)
    for i ← 0 to Nk − 1 do
        w_i ← {k_4i, k_4i+1, k_4i+2, k_4i+3}
    end
    for i ← Nk to Nb × (Nr + 1) − 1 do
        t ← w_(i−1)
        if i mod Nk = 0 then
            t ← SubWord(RotWord(t)) ⊕ c_(i/Nk)
        else if Nk > 6 and i mod Nk = 4 then
            t ← SubWord(t)
        end
        w_i ← w_(i−Nk) ⊕ t
    end
    return w

In a resource-constrained environment the presented algorithm might not be ideal, as the entire expanded key occupies a lot of memory. Any round key can be derived from the previous round key, and a common approach is to perform the key schedule on the fly.
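Algorithm 1 can be turned into a short runnable sketch. As assumptions beyond the text, the reduction polynomial 0x11B is taken from the AES specification, the S-box is regenerated from its definition in section 3.3.2 to keep the example self-contained, and $N_r = N_k + 6$ is used as a compact way of expressing the 10, 12 and 14 rounds. All function names are chosen for this example.

```python
def gf_mul(a, b):
    """Multiply in F_{2^8} modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def sbox_entry(x):
    """S(x) = h(g(x)): field inversion followed by the affine map."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)
    out = 0
    for i in range(8):
        bit = (inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8)) \
            ^ (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8)) ^ (0x63 >> i)
        out |= (bit & 1) << i
    return out


SBOX = [sbox_entry(x) for x in range(256)]


def sub_word(word):
    return bytes(SBOX[b] for b in word)


def rot_word(word):
    return word[1:] + word[:1]


def expand_key(key):
    """Return the expanded key as a list of Nb * (Nr + 1) four-byte words."""
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 16, 24 or 32 bytes long")
    nk = len(key) // 4
    nb, nr = 4, nk + 6
    w = [key[4 * i:4 * i + 4] for i in range(nk)]
    rcon = 1                                  # 2^(i-1) in F_{2^8}, starting at i = 1
    for i in range(nk, nb * (nr + 1)):
        t = w[i - 1]
        if i % nk == 0:
            t = sub_word(rot_word(t))
            t = bytes([t[0] ^ rcon]) + t[1:]  # XOR with {2^(i/Nk - 1), 0, 0, 0}
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            t = sub_word(t)
        w.append(bytes(a ^ b for a, b in zip(w[i - nk], t)))
    return w


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
words = expand_key(key)
print(len(words), words[4].hex(), words[43].hex())
```

The key used here is the 128-bit example key from the AES specification. An on-the-fly variant would keep only the most recent $N_k$ words instead of the whole list.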

3.3.4 Decryption

A straightforward method of decryption is to invert all the transformations and perform them in reverse order. With a slight modification to the key schedule it is possible to keep the algorithm structure unchanged, so that it is sufficient to replace the transformations with their inverses: InvSubBytes, InvShiftRows and InvMixColumns. AddRoundKey is its own inverse and can be reused directly. InvShiftRows is simply ShiftRows with the rotations being to the right instead of the left. In InvSubBytes the same operations are performed but with the inverse of the S-box: $S^{-1}(x) = g^{-1}(h^{-1}(x)) = g(h^{-1}(x))$. Similarly, InvMixColumns uses the inverse of the polynomial $a(x)$, i.e. $a^{-1}(x) = 11x^3 + 13x^2 + 9x + 14$, but is otherwise unchanged.

The key schedule is modified by applying InvMixColumns to all round keys except the first and the last ones. The round keys are then applied in the opposite order; the first round of decryption uses the last round key of the modified key schedule.

4 Power Consumption

This chapter discusses the power consumption of a microcontroller and how it can be used in power analysis. The chapter begins with a quick overview of an inverter and its power consumption, followed by a structural description of a microcontroller. In a power analysis attack the adversary must model the power consumption, and this is further discussed in sections 4.4 and 4.5.

4.1 The Inverter

Since the 1970s the semiconductor industry has been dominated by complementary metal-oxide-semiconductor (CMOS) technology. Ease of scaling, high noise resistance and low static power consumption are some of the reasons why CMOS is so popular. A basic understanding of the power characteristics of CMOS devices is therefore of interest. The power dissipation of a CMOS circuit can be divided into three parts: the dynamic power consumption, the static power consumption and the power dissipation caused by short circuits [10].

4.1.1 Dynamic Power

Figure 4.1a shows a typical CMOS inverter where $C_{\mathrm{load}}$ is the load capacitance on the inverter's output. Dynamic power consumption is due to charging and discharging $C_{\mathrm{load}}$. When $V_{\mathrm{in}}$ is high, the PMOS transistor is turned off while the NMOS transistor is conducting, effectively shorting $V_{\mathrm{out}}$ to ground and causing $C_{\mathrm{load}}$ to discharge. When $V_{\mathrm{in}}$ goes low the opposite occurs and the load capacitance is charged from zero to $V_{dd}$. This means that power is only drawn from the supply when the input goes from high to low. The latter scenario is illustrated in figure 4.1b and equation (4.1) derives the energy consumed during the charging of $C_{\mathrm{load}}$.

[Figure 4.1, two panels: (a) A CMOS inverter. (b) Charging of the load capacitance when the input goes from high to low.]

Figure 4.1: Illustration of an inverter's power consumption.

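$$E = \int_0^{\infty} V_{dd}\, i(t)\, dt = \int_0^{\infty} V_{dd}\, C_{\mathrm{load}} \frac{dv_{\mathrm{out}}}{dt}\, dt = \int_0^{V_{dd}} C_{\mathrm{load}} V_{dd}\, dv_{\mathrm{out}} = C_{\mathrm{load}} V_{dd}^2 \tag{4.1}$$

In order to calculate the dynamic power consumption it is necessary to determine how often the inverter toggles, or rather, how often the output goes from low to high.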

Definition 4.1 (Switch activity). The switch activity is the probability that theoutput of a node goes from logic zero to logic one during one clock period.

Equation (4.2) presents the dynamic power consumption, where $\alpha$ is the switch activity of the CMOS inverter and $f$ is the circuit's clock frequency.

$$P_d = \alpha f E = \alpha f C_{\mathrm{load}} V_{dd}^2 \tag{4.2}$$

A direct consequence of equation (4.2) is that the dynamic power consumption is dependent on the input of the circuit, as that is what determines the switch activity. Thus, data variations can be related to variations in the power consumption, which is the focus of power analysis.
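Equation (4.2) is simple enough to evaluate directly. The sketch below does so for one node; the component values (a 10 pF load, a 5 V supply and a 16 MHz clock) are hypothetical figures picked for illustration and are not measurements from this thesis.

```python
def dynamic_power(alpha, f, c_load, v_dd):
    """Dynamic power P_d = alpha * f * C_load * V_dd^2 of one CMOS node.

    alpha  -- switch activity (probability of a 0 -> 1 output transition)
    f      -- clock frequency in hertz
    c_load -- load capacitance in farads
    v_dd   -- supply voltage in volts
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("switch activity is a probability")
    return alpha * f * c_load * v_dd ** 2


# Hypothetical node: 10 pF load, 5 V supply, 16 MHz clock.
for alpha in (0.0, 0.25, 0.5):
    print(alpha, dynamic_power(alpha, 16e6, 10e-12, 5.0))
```

Only the switch activity depends on the processed data, which is why the linear dependence on $\alpha$ is the part of the equation that matters for power analysis.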

4.1.2 Static Power

One of the main benefits of CMOS technology is the low static power consumption. Between the triggering clock edges, the inverter's input remains constant and one of the transistors will be off; thus, no current can flow from the power supply to ground. Unfortunately, the transistors are not ideal, and there will always be some leakage current. The static power consumption is independent of the input data and generally very small compared to the dynamic power consumption. Therefore, it is of little interest in the context of power analysis.


4.1.3 Short Circuit Power

Short circuit power dissipation is caused by the rise and fall times of the digital input signal. At some point, both transistors will conduct at the same time and there will be a direct path between the power supply and ground. The resulting current is called the short circuit current. Similarly to the static case, the short circuit power consumption is generally a lot smaller than the dynamic power consumption and is not further regarded in this thesis.

4.2 The Microcontroller

A microcontroller is an integrated circuit that contains a processor, some memory and often peripherals such as serial communication buses, data converters etc. The processing power of a microcontroller is usually significantly lower than that of a microprocessor used in a personal computer, but microcontrollers are a lot cheaper, consume less power and produce less heat. This, coupled with the previously mentioned integration of memory and peripherals, makes microcontrollers suitable for embedded applications.

4.2.1 Structure

Arguably, the most important component in a microcontroller is the central processing unit (CPU). Its main responsibility lies in fetching instructions from memory, decoding them and finally executing them. The CPU itself consists of a control unit, an arithmetic logic unit (ALU) and a number of registers. The CPU is connected with the microcontroller's main memory and peripherals through a bus, as depicted in figure 4.2. In an 8-bit microcontroller the bus is eight bits wide.

[Figure 4.2 block diagram: CPU, memory and peripherals connected through a bus.]

Figure 4.2: Block structure of a microcontroller.

When reading from (or writing to) memory, the eight bus wires are charged to one or discharged to zero depending on the bit that is sent. Generally, bus wires are long and have high capacitive loads. Thus, the power consumption of transferring data is high and constitutes a large part of the overall power consumption.

4.2.2 Operation

While a thorough discussion of the inner workings of the targeted microcontroller would prove helpful in some power analysis attacks, particularly if reverse engineering the firmware is of interest, the attacks discussed in this thesis do not directly benefit from this knowledge.


4.3 Measuring Power Consumption

There are numerous ways of measuring power. One of the simplest methods is to measure the voltage over a resistor connected in series between the power supply and the microcontroller. Alternatively, a current probe can be used. In some situations inserting a resistor is not possible, as any modification would make it apparent that the device has been tampered with. Additionally, some microcontrollers may be powered internally from small batteries, making it difficult to access the power wires. Alterations to the device can be avoided by measuring the electromagnetic field around the microcontroller. In power analysis the attacker is generally not interested in exact measurements of the power consumption but rather in values proportional to it. This section shows two methods that can achieve this.

4.3.1 Shunt

The microcontroller's power consumption can be determined by measuring the voltage drop over a small resistor, known as a shunt, inserted between the $V_{dd}$ pin on the microcontroller and the power supply. This is called high-side sensing and is a technique for sensing currents. Figure 4.3 illustrates this scenario, where the microcontroller is represented by a generic load with a resistance $R_{\mathrm{load}}$. By measuring the voltage $V_{\mathrm{load}}$ the power consumption can be calculated. The instantaneous power consumption of the microcontroller is given by equation (4.3).

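$$P = R_{\mathrm{load}} I_{\mathrm{load}}^2 \tag{4.3}$$

The current $I_{\mathrm{load}}$ can be calculated by applying Ohm's law at the shunt resistor, as specified in equation (4.4).

$$I_{\mathrm{load}} = \frac{V_{\mathrm{shunt}}}{R} = \frac{V_0 - V_{\mathrm{load}}}{R} \tag{4.4}$$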

The power consumption is proportional to the square of the current and $V_{\mathrm{load}}$ is always lower than $V_0$; therefore $P$ is a strictly decreasing function with respect to $V_{\mathrm{load}}$. That is, the lower the voltage is at $V_{\mathrm{load}}$, the higher the power consumption is. It is similarly possible to insert a resistor between ground and the microcontroller's $V_{ss}$ pin, known as low-side sensing. The difference is that a high measured voltage would indicate a high power consumption. In a power analysis attack, the value of interest is $V_{\mathrm{load}}$ while $P$ and $I_{\mathrm{load}}$ are never calculated. Thus, $R_{\mathrm{load}}$ does not have to be known.

[Figure 4.3 circuit diagram: the supply voltage $V_0$, the shunt resistor $R$ with the voltage $V_{\mathrm{shunt}}$ across it, the current $I_{\mathrm{load}}$, and the load $R_{\mathrm{load}}$ with the voltage $V_{\mathrm{load}}$ across it.]

Figure 4.3: High-side sensing with shunt. $V_{\mathrm{load}}$ is the instantaneous voltage over the microcontroller.
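The relations in equations (4.3) and (4.4) can be sketched in a few lines. The supply voltage, shunt value and sampled voltages below are hypothetical numbers chosen for illustration, and the function names are not taken from any measurement library.

```python
def load_current(v_supply, v_load, r_shunt):
    """Equation (4.4): current through the shunt, I = (V0 - Vload) / R."""
    if r_shunt <= 0:
        raise ValueError("the shunt resistance must be positive")
    return (v_supply - v_load) / r_shunt


def high_side_trace(v_load_samples):
    """Turn high-side voltage samples into values that grow with the current.

    With a high-side shunt a lower V_load means a larger current, so
    negating the samples is enough when only relative values matter.
    """
    return [-v for v in v_load_samples]


# Hypothetical set-up: 5 V supply, 10 ohm shunt, three sampled voltages.
samples = [4.95, 4.90, 4.97]
print([load_current(5.0, v, 10.0) for v in samples])
print(high_side_trace(samples))
```

The second function reflects the point made above: the attack never needs $P$ or $I_{\mathrm{load}}$ themselves, only a quantity that varies monotonically with them.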

A problem with high-side sensing is that it introduces a high common-mode voltage as, generally, $V_{\mathrm{load}}$ is very close to $V_0$. This puts additional demands on the measurement circuitry. Specifically, the voltage range must be wide enough to include the entire power consumption signal while still maintaining a high enough resolution to provide acceptable results. Low-side sensing, on the other hand, removes the direct path to ground, which might cause behavioural issues. However, in the case of a microcontroller this should not be a problem. By using a differential probe instead of a single-ended one, both of these issues can be alleviated.

4.3.2 Probing the Electromagnetic Field

Another observable quantity is the electromagnetic radiation emitted by the microcontroller. The movement of charge carriers generates a magnetic field that varies in a data-dependent manner. An H-field probe is a conducting wire (such as a coaxial cable) with a coil at the end. The probe is placed in the vicinity of the target so that the magnetic field induces a voltage in it. This voltage can then be measured using, for example, an oscilloscope.

Equation (4.5) describes how the magnetic field $\mathbf{B}$ is produced by a static electric current $I$ in a path $C$, where $\mu_0$ is the magnetic constant and $\mathbf{r}$ is the vector from the wire element $d\mathbf{l}$ to the point in space where the magnetic field is measured. This is known as the Biot-Savart law [11].

$$\mathbf{B} = \frac{\mu_0 I}{4\pi} \int_C \frac{d\mathbf{l} \times \mathbf{r}}{|\mathbf{r}|^3} \tag{4.5}$$

Variations in the magnetic field will induce an electromotive force $\mathrm{emf}$ (i.e. a voltage) in the probe according to Faraday's law of induction, as shown in (4.6), where $S$ is the surface enclosed by the probe loop [12].

$$\mathrm{emf} = -\frac{d}{dt} \int_S \mathbf{B} \cdot d\mathbf{s} \tag{4.6}$$

Again, the actual power consumption is never calculated. For the purpose of power analysis it is enough to know that the voltage measured in the H-probe is related to the microcontroller's current.

4.4 Modelling Power Consumption

To execute a power analysis attack it is necessary to model the power consumption of the device under attack. More specifically, assumptions have to be made that relate the information leakage to the power consumption. One of the more common models is to assume that a device leaks the Hamming weight of the data it manipulates, which is often the case for microcontrollers [13]. While it is possible to use full-scale circuit simulations, it is often not realistic, as an attacker rarely has enough knowledge about the device's physical implementation to build such a model. Even with such detailed knowledge, the computational complexity of circuit simulations (through tools such as SPICE) would make an attack too time-consuming.

4.4.1 Binary Models

Binary models work on the assumption that if some predicate is true, the device consumes more power than when the predicate is false. For example, if the least significant bit (LSB) of some byte is set the modelled power consumption is one, otherwise zero. This model may seem crude, as a single bit's contribution to the overall power consumption appears fairly small at first glance, but in an 8-bit microcontroller every bit corresponds to one eighth of the data-dependent power consumption. However, and this is important to note, the actual contribution is not important as long as the difference can be detected. This holds true even for devices with wider words, such as 64-bit processors. The difference is harder to detect, but as long as it is possible the model serves its purpose.

4.4.2 Hamming Weight Model

In the Hamming weight model the power consumption is modelled to be proportional (or inversely proportional) to the Hamming weight of a binary number.

Definition 4.2 (Hamming weight). The Hamming weight of an $n$-bit binary number $B = b_{n-1} \ldots b_1 b_0$ is defined as

$$\mathrm{HW}(B) = \sum_{i=0}^{n-1} b_i.$$

The Hamming weight effectively corresponds to the number of ones within the binary number. The model is motivated by the observation that power is only consumed in a CMOS device during switching, i.e. when data changes. A large part of the total power consumption is due to bus activity, such as when data is transferred between the microcontroller's memory and its registers. A basic assumption when applying the Hamming weight model is that all data transfers are pre-charged. This means that before the data is put on the bus, the bus wires are set to some predetermined value such as logic zero or one. If they are set to zero prior to the data transfer, the number of transitions will be exactly the same as the number of ones within the data. Thus, in the Hamming weight model the instantaneous power consumption of the 8-bit microcontroller is modelled as a value between zero and eight.
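As a minimal sketch, the Hamming weight model for an 8-bit device can be expressed as follows. The function names are chosen for this example, and the returned value is a unitless prediction rather than a power in watts.

```python
def hamming_weight(value):
    """Number of ones in the binary representation of a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return bin(value).count("1")


def hw_power_model(byte):
    """Predicted (unitless) consumption for one byte on a pre-charged 8-bit bus."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return hamming_weight(byte)


for byte in (0x00, 0x01, 0xA5, 0xFF):
    print(f"{byte:#04x} -> {hw_power_model(byte)}")
```

Because many different bytes share the same Hamming weight, a single prediction does not identify the data; the model only becomes useful when predictions are compared against many measurements.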

4.4.3 Hamming Distance Model

An extension to the Hamming weight model is to use the Hamming distance to improve on some shortcomings of the previous model.

Definition 4.3 (Hamming distance). The Hamming distance between two n-bit binary numbers $A = a_{n-1} \ldots a_0$ and $B = b_{n-1} \ldots b_0$ is defined as

\[ \mathrm{HD}(A,B) = \sum_{i=0}^{n-1} |a_i - b_i| = \mathrm{HW}(A \oplus B). \]

The Hamming distance effectively measures the number of bits where two binary numbers differ and is particularly suitable for CMOS devices. Consider a generic CMOS device where some value is read from a register, processed and then stored in the same register. The number of bits that toggle is equal to the Hamming distance between the old and the new value. In the microcontroller’s case, suppose that the bus wires are not pre-charged to all ones or all zeros but to some other, unknown constant. In this case the Hamming weight model does not correctly predict the power consumption. If it is possible to determine this constant, the Hamming distance model can be used instead.
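As a small illustration of Definition 4.3, the following Python sketch computes the Hamming distance both bit by bit and as the Hamming weight of the XOR. The pre-charge value `0x5A` is a made-up constant used only to show how the model differs from the Hamming weight model.

```python
def hamming_weight(value: int) -> int:
    return bin(value).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Hamming distance via the identity HD(A, B) = HW(A xor B)."""
    return hamming_weight(a ^ b)


def hamming_distance_bitwise(a: int, b: int, n: int = 8) -> int:
    """Hamming distance as the sum of |a_i - b_i| over the n bit positions."""
    return sum(abs(((a >> i) & 1) - ((b >> i) & 1)) for i in range(n))


PRECHARGE = 0x5A  # hypothetical bus pre-charge constant, for illustration only
data = 0x3C
print(hamming_distance(0x00, data))       # equals HW(data) when the bus is pre-charged to zero
print(hamming_distance(PRECHARGE, data))  # modelled toggles for the non-zero pre-charge value
```

With a pre-charge value of zero the two models coincide, which is why the Hamming weight model can be seen as a special case.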

4.5 Power Consumption Components

Studying the instantaneous power consumption is of great interest in the context of power analysis. The power consumption is divided into two components:

• The leakage signal: The part of the power consumption that provides exploitable information.

• The noise signal: Everything that cannot be exploited.

The leakage signal is denoted $P_S$ while the noise signal is written as $P_N$. The total power consumption is presented in (4.7).

\[ P = P_S + P_N \tag{4.7} \]

$P$, $P_S$ and $P_N$ are modelled as random variables. The components may refer to very different things depending on what is being exploited. In general, $P_S$ depends on what the microcontroller is doing and the data that is being manipulated. Furthermore, it depends on which power model is being used. In a binary power model the leakage signal consists of the contribution of one bit, while the other bits belong to the noise signal.

4.5.1 Noise

The noise component contains all contributions to the power consumption that are not of interest during power analysis. This section lists some sources of noise that are independent of the choice of power model.

Electronic Noise

In this thesis, electronic noise refers to all noise components deriving from electronic components. This includes thermal noise, which is inherent to all electronic devices, and other device-specific sources such as shot noise. Electronic noise is caused by all components involved, such as the transistors in the microcontroller, wiring on the printed circuit board (PCB), measurement equipment, decoupling capacitors et cetera. As mentioned earlier, detailed models of the microcontroller are difficult to acquire, especially from an attacker’s point of view, and often unnecessary. A detailed analysis of the noise is therefore similarly outside the scope of this thesis. The electronic noise is simply assumed to be normally distributed with zero mean.

Quantization Noise

Converting a continuous, analogue signal to a number of discrete digital values (sometimes referred to as code centres) is called quantization. Since the possible output values are limited, the real signal amplitude is rounded to the nearest code centre. The quantization noise depends on the number of bits of the quantizer, and in this thesis the noise contribution of the quantizer is assumed to be small.

4.6 Signal-to-Noise Ratio

The signal-to-noise ratio (SNR) is a relative measurement used to quantify the quality of a noisy signal. The higher the SNR, the stronger the signal.

Definition 4.4 (Signal-to-noise ratio). The SNR between a signal with power $P_S$ and the noise with power $P_N$ is defined as

\[ \mathrm{SNR} = \frac{P_S}{P_N} \]

or alternatively

\[ \mathrm{SNR} = \frac{\sigma_S^2}{\sigma_N^2} \]

where $\sigma_S$ and $\sigma_N$ are the standard deviations of the signal and noise, respectively, assuming the noise has zero mean.

In the context of power analysis the signal corresponds to the exploitable information available in a power measurement (i.e. the leakage signal). Knowing the SNR is important and can help in determining the complexity of a power analysis attack and how likely it is to succeed.

4.6.1 Calculating the Signal-to-Noise Ratio

The SNR can be determined experimentally or through careful simulations. In this thesis the premise is that the knowledge about the microcontroller’s physical implementation is low and the SNR must be estimated experimentally.

First, some microcontroller instruction is selected. It is the SNR of the leakage from this instruction that is calculated. Second, the electronic noise variance is calculated by measuring the power consumption while repeatedly executing the instruction with a constant input, i.e. the only variations in the measurements are caused by noise. Finally, the variance of the leakage signal is estimated by varying the input to the instruction. The result depends on the power model, as explained below.

Hamming Weight Power Model

The variance of the leakage signal is estimated by executing the instruction many times using random inputs with the same Hamming weight and calculating the mean of the power consumption. This is repeated for all Hamming weights. These mean power consumption values can be seen as measurements of the leakage signal $P_S$. The sample variance of these mean values gives an estimate of $\sigma_S^2$.
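The estimation procedure can be sketched in Python against a simulated device. Everything about the simulation is an assumption made for illustration: the hypothetical `measure` function leaks exactly the Hamming weight of its input plus zero-mean Gaussian noise, and the number of runs and the constant input `0x3C` are arbitrary.

```python
import random
import statistics


def hamming_weight(value: int) -> int:
    return bin(value).count("1")


def measure(value: int, rng: random.Random, noise_sigma: float) -> float:
    """Hypothetical device: leakage HW(value) plus Gaussian electronic noise."""
    return hamming_weight(value) + rng.gauss(0.0, noise_sigma)


def estimate_snr(rng: random.Random, noise_sigma: float, runs: int = 2000):
    # Noise variance: repeat the instruction with a constant input.
    constant = [measure(0x3C, rng, noise_sigma) for _ in range(runs)]
    noise_var = statistics.variance(constant)

    # Signal variance: one mean per Hamming weight class, then the
    # sample variance of the nine class means.
    by_weight = {w: [v for v in range(256) if hamming_weight(v) == w] for w in range(9)}
    class_means = []
    for w in range(9):
        samples = [measure(rng.choice(by_weight[w]), rng, noise_sigma) for _ in range(runs)]
        class_means.append(statistics.mean(samples))
    signal_var = statistics.variance(class_means)
    return signal_var, noise_var, signal_var / noise_var


signal_var, noise_var, snr = estimate_snr(random.Random(1), noise_sigma=2.0)
print(f"signal variance {signal_var:.2f}, noise variance {noise_var:.2f}, SNR {snr:.2f}")
```

On a real target `measure` would be replaced by an oscilloscope capture at the sample point of interest; the rest of the procedure stays the same.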

Binary Power Model

In a binary power model only one bit per byte contributes to the leakage. This means that the power consumption caused by the other seven bits is considered as noise. The variance of the leakage signal is estimated similarly to the Hamming weight case by measuring the power consumption while the instruction manipulates random data. Assuming that the LSB model is used, the mean of all measurements where the LSB is one is calculated, and likewise for the case where the LSB is zero. The sample variance of the two means gives an estimate of $\sigma_S^2$.

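A crude approximation of the total noise variance can be obtained by assuming that all bits contribute equally to the power consumption, so that each of the seven remaining bits adds $\sigma_S^2$ to the noise. The SNR can therefore be calculated as:

\[ \mathrm{SNR} = \frac{\sigma_S^2}{\sigma_N^2 + 7\sigma_S^2} \tag{4.8} \]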

5 Power Analysis

Power analysis is the category of side-channel analysis where the power consumption of a cryptographic system is studied with the goal of extracting sensitive information such as encryption keys. The first two types of power analysis presented are simple power analysis (SPA) and differential power analysis (DPA). The first article on power analysis attacks was published by Kocher et al. in 1999, where an attack on the DES block cipher was presented [6].

SPA consists of visual inspection of the power consumption, while DPA attacks focus on processing large numbers of measurements. DPA attacks are so-called non-profiling attacks employed by weak adversaries, “weak” in the sense that the adversary has very little knowledge about the system. A strong adversary has the option of using profiling attacks, where the target system is carefully analysed before the attack is executed [24]. One such attack is the template attack presented at the end of this chapter.

5.1 Power Traces

The power trace is the basic building block of all power analysis attacks. A power trace is a vector of measurements proportional to the power consumption of the cryptographic device, in this case a microcontroller. The microcontroller leaks information on its internal state through its power consumption, and the purpose of the power trace is to capture this leakage. As stated in chapter 4, the actual power consumption is generally not measured, but simply some physical quantity that is proportional to it.

5.1.1 Number of Sample Points

An important aspect of the power trace is its length. The length depends on the measuring equipment’s sample rate as well as how much time the power trace should cover. There is at least one instruction during an AES encryption that leaks sensitive information. This instruction is executed during one or more clock cycles. Therefore, it is necessary to sample the power consumption at least once per clock cycle. Expressed in another way: the sampling frequency should be higher than the microcontroller’s clock frequency. In the case of a digital sampling oscilloscope the sampling rate is often many times larger than required, which means a lot of redundancy is included in every power trace. While not harmful to the results, this redundancy adds to the computational complexity of an attack.

5.2 Simple Power Analysis

In SPA the attacker studies one single power trace, or possibly an average of multiple power traces of the same task. The goal is to visually inspect the power trace and find patterns that leak information about the device’s operation. Often, operations such as branching and loops can be detected through SPA. In the case of AES the information available in an SPA trace is limited, and in general DPA is preferred. SPA is still an important tool, as sometimes it is simply not possible to acquire the number of power traces required to execute a DPA attack. For instance, the microcontroller may limit the number of encryptions per key. SPA is often used as a tool to determine properties of the target that may prove helpful in a DPA attack.

5.2.1 Attacking RSA

To illustrate the principles of SPA, an attack against the RSA public-key encryption scheme is presented. In RSA both encryption and decryption use modular exponentiation. Depending on how the modular exponentiation is implemented, it may be susceptible to SPA attacks [14]. A straightforward software implementation of the operation $a^k \pmod{N}$ is shown in algorithm 2, where $a$ and $N$ are integers and $k$ is an n-bit binary number such that $k = k_{n-1}2^{n-1} + \ldots + k_0 2^0$. The important part is the if-statement that switches on the bits of the exponent. This exponent constitutes the private key during decryption. The power trace will reveal a pattern of similar-looking operations corresponding to each iteration of the algorithm. Some iterations will be slightly longer than others due to the extra multiplication, indicating that the key bit is one. This way, the entire exponent can be read directly from the power trace.
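The leak can be made concrete with a small Python sketch of left-to-right square-and-multiply. The recorded operation string stands in for what an attacker would read off a power trace; the names `mod_exp_left_to_right` and `recover_exponent` are illustrative, the accumulator starts at 1, and the exponent is assumed to be at least 1.

```python
def mod_exp_left_to_right(a: int, k: int, N: int):
    """Square-and-multiply; also returns the sequence of operations performed."""
    p = 1
    ops = []
    for i in range(k.bit_length() - 1, -1, -1):
        p = (p * p) % N          # always executed: square
        ops.append("S")
        if (k >> i) & 1:         # key-dependent branch
            p = (p * a) % N      # only executed when the key bit is one
            ops.append("M")
    return p, "".join(ops)


def recover_exponent(ops: str) -> int:
    """Read the exponent back from the operation pattern (SM = 1, S = 0)."""
    bits = ops.replace("SM", "1").replace("S", "0")
    return int(bits, 2)


result, ops = mod_exp_left_to_right(7, 0b1011, 1009)
print(result, ops, bin(recover_exponent(ops)))
```

A square followed by a multiplication marks a one bit and a lone square marks a zero bit, so the operation pattern alone determines the exponent.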

5.2.2 Attacking AES

Unfortunately, AES and similar block ciphers are not immediately susceptible to SPA as there are no key-dependent operations or branches. That being said, while visual inspection may not yield the secret key, other important information could be available. If the cipher is unknown, the power trace can be used to identify the number of rounds and the execution time of the algorithm. This can help in determining e.g. the key length.

Algorithm 2: Left-to-right binary modular exponentiation.

Input: Integers $a$, $N$ and bits $k_0, \ldots, k_{n-1}$ such that $k = k_{n-1}2^{n-1} + \ldots + k_0 2^0$
Output: $a^k \pmod{N}$

p ← 1
for i ← n − 1 down to 0 do
    p ← p^2 (mod N)
    if k_i = 1 then
        p ← p × a (mod N)
    end
end
return p

One possible usage of SPA in relation to AES is to reduce the complexity of a brute force attack by determining the Hamming weights of the sub-keys (sub-keys are mentioned in section 3.3.1). The power consumption of all manipulations of a sub-key $k_i$ will depend on $\mathrm{HW}(k_i)$, and if this value can be identified for all sub-keys the search space can be significantly reduced. The search space of a regular brute force attack contains $2^{nk}$ elements for a secret key consisting of $k$ sub-keys, where each sub-key is $n$ bits long. Thus, for AES with 128-bit keys there are $2^{128}$ possible values. The average number of elements $N_{avg}$ in the search space for a brute force attack on a key where the Hamming weight of each sub-key is known is presented in equation (5.1) [15].

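\[ N_{avg} = \left( \sum_{m=0}^{n} \binom{n}{m}^{2} \Big/ 2^{n} \right)^{k} = \left\{ n = 8,\ k = 16 \right\} = \left( \sum_{m=0}^{8} \binom{8}{m}^{2} \Big/ 256 \right)^{16} = \left( \binom{16}{8} \Big/ 256 \right)^{16} \tag{5.1} \]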
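Equation (5.1) is easy to evaluate numerically. The following Python sketch does so for AES-128; the function name `average_search_space` is chosen here for illustration.

```python
import math


def average_search_space(n: int, k: int) -> float:
    """Average search space when the Hamming weight of each n-bit sub-key is known."""
    per_subkey = sum(math.comb(n, m) ** 2 for m in range(n + 1)) / 2 ** n
    return per_subkey ** k


n_avg = average_search_space(n=8, k=16)
print(f"N_avg = 2^{math.log2(n_avg):.1f} (versus 2^128 without Hamming weight knowledge)")
```

For AES-128 the result is on the order of $2^{90}$, a substantial reduction from $2^{128}$ but still far from a practical brute force attack.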

A major issue with SPA is that it is highly sensitive to noise, especially in the case where few power traces are available. Determining the Hamming weight of a value through visual inspection can therefore be hard, if not impossible.

5.3 Differential Power Analysis

DPA refers to a family of power analysis attacks that employ statistical methods to analyse large sets of power traces. Different variants use different statistical methods. Furthermore, the choice of power model may limit the available methods. In contrast to SPA, where noise is a matter of great concern, in a DPA attack the noise is averaged out by the large number of measurements. In theory, as long as there is some information leakage it does not matter how noisy the signal is. It is simply a matter of recording enough power traces. In section 5.3.1 a summary of a DPA attack is presented and some notation is introduced. The rest of this section discusses the various statistical methods employed, with a focus on the correlation coefficient attack.

5.3.1 General Approach

The goal of any DPA attack is to establish a relationship between the power consumption and the encryption key. To do this the power consumption is hypothesised (or modelled) with respect to a key guess. While the methods to select likely key candidates differ, the overall approach of a DPA attack is the same. The steps involved are listed below:

1. Measure the power consumption when encrypting a (large) number of random plaintexts.

2. Choose an intermediate value to attack and compute hypothetical intermediate values.

3. Compute the hypothetical power consumption with the chosen power model.

4. Compare the power traces with the hypothetical power consumption values using some statistical method.

The rest of this section describes each step in detail. The attacked algorithm is AES-128, but only small modifications are required in order to attack larger keys or even other ciphers, see section 5.3.6. Additionally, the process of attacking decryption rather than encryption is effectively the same.

Capture Power Traces

The power traces are collected while encrypting $N$ plaintexts. The plaintext bits should be random and uniformly distributed. This makes it possible to study the power consumption of every bit (or byte) individually. All bits other than the ones of interest are regarded as noise, as explained in section 4.6.1. The plaintexts are saved in the plaintext matrix $P \in M_{N \times 16}$ where each row represents a 16-byte plaintext. A power trace is then captured for each plaintext and stored in the power consumption matrix, denoted as $T \in M_{N \times S}$ where $S$ is the number of samples per trace. Capturing the entire encryption is not necessary; it is sufficient to sample when the intermediate value leaks. However, this requires knowledge about the implementation. Let $\tau$ denote the point in time (or sample point) where the intermediate value leaks. For a DPA attack to be successful the power traces must satisfy $0 \le \tau < S$.

Intermediate Value

The point of a power analysis attack is to extract some information about the internal state of an algorithm. In the case of AES the goal is to determine the secret key. Therefore, the intermediate value should be directly related to the key. Furthermore, the other contributions to the intermediate value must be known, or must be easily calculable from some known value. More generally, the intermediate result $z$ can be expressed as $z = f(x, y)$ where $y$ is a part of the key and $x$ is the known value. Recall from section 3.3.2 that both AddRoundKey and SubBytes are performed bytewise. Conveniently, this means that each byte in the AES state can be studied individually, with the consequence that every byte in the key can be broken independently of the others. Let $k = [\,k_0\ k_1\ \ldots\ k_{15}\,]$ denote the secret key where $k_i$ is the ith sub-key. Additionally, let $p = [\,p_0\ p_1\ \ldots\ p_{15}\,]$ denote the plaintext input. Thus, the intermediate value $z_i$ corresponding to the ith sub-key is calculated as $z_i = f(p_i, k_i)$.

A common selection when attacking AES is the output of the first SubBytes transformation, because its input is the raw key XORed with the plaintext. The calculation of the intermediate value is shown in equation (5.2) where $S$ is the AES S-box.

\[ z_i = f(p_i, k_i) = S(p_i \oplus k_i) \tag{5.2} \]

There is a reason for choosing the output of SubBytes rather than the input: the S-box actually improves the results due to its non-linearity, which helps in removing false positives.

Hypothetical Intermediate Values

The first step is to select a sub-key and make a guess on the correct value, i.e. fix $k_i$ to some value $k$ where $0 \le k < 256$. The next step is to evaluate $f$ for all plaintexts, or more specifically the column $P_{\bullet i}$. This results in a vector with $N$ elements. This process is repeated for all possible values of $k$, finally yielding a matrix $Z \in M_{N \times 256}$. The calculation of $Z$ is shown in equation (5.3) where $i$ is the index of the sub-key, $n$ is the index of the plaintext and $k$ is the key guess.

\[ z_{n,k} = S(p_{n,i} \oplus k) \tag{5.3} \]

Note that $Z$ only holds hypothetical intermediate values corresponding to the sub-key $k_i$. Hence, this step and all subsequent steps must be repeated for every sub-key, i.e. 16 times.

Hypothetical Power Consumption

Before a statistical comparison can be made, the hypothetical intermediate values must be mapped to hypothetical power consumption values. This is done by applying a power model $g$ to $Z$, resulting in another matrix $H$ as shown in (5.4). $H$ is called the hypothetical power consumption matrix.

\[ h_{n,k} = g(z_{n,k}) \tag{5.4} \]

For example, if the Hamming weight model is used then $h_{n,k} = \mathrm{HW}(z_{n,k})$. It is important to realise that there is no concept of time in the hypothetical power consumption matrix. The columns in $H$ correspond to key guesses, i.e. $H_{\bullet k}$ models the power consumption for a single point in time $\tau$ based on the assumption that $k_i = k$.

Statistical Comparison

The power traces are compared with the hypothetical power consumption through a statistical method. While the methods vary, they always produce results in the same form. In general, for each sample point all of the power traces are compared with the hypothetical power consumption values for the different key hypotheses.

To clarify, every column in $H$ is compared with every column in $T$. Each comparison yields a single value that corresponds to an element in the result matrix. The result matrix is denoted $R \in M_{256 \times S}$. If the comparison function is denoted as Comp, then $R$ is constructed by calculating

\[ r_{k,l} = \mathrm{Comp}(H_{\bullet k}, T_{\bullet l}) \]

for all $k$ and $l$ such that $0 \le k < 256$, $0 \le l < S$. The final step is to identify the maximum absolute value in $R$. The column that holds this value represents the point in time where the studied intermediate value leaks. The row corresponds to the best key hypothesis. Denote the correct sub-key as $\kappa$. If the attack is successful then $\max|R| = |r_{\kappa,\tau}|$.

The following made-up example illustrates how the result matrix is constructed from a power consumption matrix and a plaintext matrix. For brevity, only one key guess is made for one sub-key. In general, the power consumption matrix is much larger.

Example 5.1: Differential power analysis

$T$ contains measurements of the power consumption during the first SubBytes transform of the first sub-key $k_0$. In total, five encryptions are performed and $P_{\bullet 0}$ holds the first byte of each plaintext.

\[ P_{\bullet 0} = \begin{bmatrix} 19 \\ 131 \\ 4 \\ 18 \\ 206 \end{bmatrix}, \quad T = \begin{bmatrix} -0.10 & -0.96 & 0.85 & 0.13 \\ -0.84 & 0.11 & 0.02 & 0.10 \\ 0.03 & 0.34 & 0.44 & 0.73 \\ 0.17 & 0.03 & -0.18 & 0.74 \\ -0.53 & 0.48 & 0.56 & 0.69 \end{bmatrix} \]

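A guess is made, hypothesising that $k_0 = 74$. Thus, the 74th column in $Z$ is calculated by evaluating $S(p_{n,0} \oplus 74)$ for all $0 \le n < 5$. By applying the Hamming weight power model the 74th column in $H$ is calculated.

\[ Z_{\bullet 74} = \begin{bmatrix} S(19 \oplus 74) \\ S(131 \oplus 74) \\ S(4 \oplus 74) \\ S(18 \oplus 74) \\ S(206 \oplus 74) \end{bmatrix} = \begin{bmatrix} 203 \\ 221 \\ 47 \\ 106 \\ 95 \end{bmatrix} \Rightarrow H_{\bullet 74} = \begin{bmatrix} \mathrm{HW}(203) \\ \mathrm{HW}(221) \\ \mathrm{HW}(47) \\ \mathrm{HW}(106) \\ \mathrm{HW}(95) \end{bmatrix} = \begin{bmatrix} 5 \\ 6 \\ 5 \\ 4 \\ 6 \end{bmatrix} \]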

In the next step the first column in $T$ is compared with $H_{\bullet 74}$ using some statistical method Comp such that $r_{74,0} = \mathrm{Comp}(H_{\bullet 74}, T_{\bullet 0})$. This is then repeated for every column in $T$. Row $R_{74 \bullet}$ now contains the result from a DPA attack on the first sub-key using the key guess 74.
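The hypothetical matrices of Example 5.1 can be reproduced with a short Python sketch. To keep the block self-contained the AES S-box is generated from its definition (multiplicative inverse in GF(2^8) followed by the affine map) rather than typed in as a table; the helper names `build_sbox` and `hypothetical_matrices` are illustrative and not taken from the thesis.

```python
def build_sbox():
    """Generate the AES S-box: inverse in GF(2^8), then the affine transformation."""
    exp, log = [0] * 256, [0] * 256
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        doubled = (x << 1) ^ (0x11B if x & 0x80 else 0)  # x * 2 in GF(2^8)
        x ^= doubled                                      # x * 3, a generator of the field
    sbox = []
    for a in range(256):
        inv = 0 if a == 0 else exp[(255 - log[a]) % 255]
        s = inv
        for shift in range(1, 5):                         # XOR with four left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def hamming_weight(value: int) -> int:
    return bin(value).count("1")


def hypothetical_matrices(plaintext_column):
    """Return Z (N x 256) and H (N x 256) for one sub-key, per (5.3) and (5.4)."""
    Z = [[SBOX[p ^ k] for k in range(256)] for p in plaintext_column]
    H = [[hamming_weight(z) for z in row] for row in Z]
    return Z, H


Z, H = hypothetical_matrices([19, 131, 4, 18, 206])
print([row[74] for row in Z])  # column for key guess 74
print([row[74] for row in H])
```

Column 74 of the two matrices matches the vectors given in the example; the remaining 255 columns correspond to the other key guesses.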

5.3.2 Difference of Means

The original DPA attack uses a simple test to determine whether there are any significant differences between the means of two distributions [6]. By applying a binary power model the power traces can be divided into two sets. The difference between the means of these sets is then calculated, which yields something called a DPA bias signal. The following steps describe the process:

1. Select a key hypothesis, $k$, i.e. choose a column $H_{\bullet k}$.

2. Put trace $T_{n \bullet}$ in the first set if $h_{n,k} = 1$.

3. Otherwise, put the trace in the second set.

4. Calculate the difference between the mean of the traces in the first set and the mean of the traces in the second set.

5. The result, called the DPA bias signal, corresponds to row $R_{k \bullet}$.

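Alternatively, this can be written as in equation (5.5) where $k$ is the key hypothesis and $l$ is the sample point.

\[ r_{k,l} = \frac{\sum_{n=0}^{N-1} h_{n,k}\, t_{n,l}}{\sum_{n=0}^{N-1} h_{n,k}} - \frac{\sum_{n=0}^{N-1} (1 - h_{n,k})\, t_{n,l}}{\sum_{n=0}^{N-1} (1 - h_{n,k})} \tag{5.5} \]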

For example, if the LSB of the intermediate value is used as the power model, then when this bit is manipulated by the microcontroller the power consumption is different depending on whether the LSB is one or zero. Consequently, there should be a difference between the means of the two sets of power traces if they have been divided correctly, i.e. the key guess is right. On the other hand, if the key guess is wrong then the probability of placing any given power trace in the first set is 1/2. In this case, there is no discernible difference between the means of the two sets.
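A minimal Python sketch of equation (5.5) follows. The function name `difference_of_means` and the toy data are illustrative; `h_column` holds the binary hypothetical power values for one key guess and `traces` holds one power trace per row.

```python
def difference_of_means(h_column, traces):
    """DPA bias signal: mean of traces with h = 1 minus mean of traces with h = 0."""
    ones = [t for h, t in zip(h_column, traces) if h == 1]
    zeros = [t for h, t in zip(h_column, traces) if h == 0]
    if not ones or not zeros:
        raise ValueError("both sets must contain at least one trace")
    bias = []
    for l in range(len(traces[0])):
        mean_ones = sum(t[l] for t in ones) / len(ones)
        mean_zeros = sum(t[l] for t in zeros) / len(zeros)
        bias.append(mean_ones - mean_zeros)
    return bias


# Toy data: the first sample point depends on the bit, the second does not.
h_column = [1, 0, 1, 0]
traces = [[1.0, 5.0], [0.0, 5.0], [3.0, 5.0], [0.0, 5.0]]
print(difference_of_means(h_column, traces))
```

Since only the maximum absolute value in the result matrix is of interest, the order in which the two means are subtracted does not affect the outcome of the attack.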

5.3.3 Distance of Means

Similarly to the difference of means test, the distance of means method applies a binary power model to divide the power traces into two sets, but now the variances are also taken into account. The distance of means is very similar to the hypothesis test used to check if there is a difference between the means of two distributions [13]. Equation (5.6) shows how each element in $R$ is calculated.

\[ r_{k,l} = \frac{\dfrac{\sum_{n=0}^{N-1} h_{n,k}\, t_{n,l}}{\sum_{n=0}^{N-1} h_{n,k}} - \dfrac{\sum_{n=0}^{N-1} (1 - h_{n,k})\, t_{n,l}}{\sum_{n=0}^{N-1} (1 - h_{n,k})}}{s_{k,l}} \tag{5.6} \]

At any point in time $l$ the measurements can be seen as observations of two distributions. $s_{k,l}^2$ is an estimate of the pooled variance of these distributions.

5.3.4 Correlation Coefficient

The power models presented in section 4.4 all assume a linear relationship between an intermediate value and the power consumption of the microcontroller. An alternative to the earlier tests is to calculate the correlation between the hypothetical power consumption and the measured values. This is sometimes called correlation power analysis and was presented in 2004 together with the Hamming distance leakage model [16]. The idea is that there is a strong correlation for the correct key, while there is practically no correlation at all when the key guess is wrong. Furthermore, there is no correlation at the points in time where no leakage occurs. The matrix $R$ is constructed by calculating the sample correlation coefficient (see appendix A.1) of column $k$ from $H$ and column $l$ from $T$ according to equation (5.7).

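\[ r_{k,l} = \frac{\sum_{n=0}^{N-1} (h_{n,k} - \bar{H}_{\bullet k})(t_{n,l} - \bar{T}_{\bullet l})}{\sqrt{\sum_{n=0}^{N-1} (h_{n,k} - \bar{H}_{\bullet k})^2}\, \sqrt{\sum_{n=0}^{N-1} (t_{n,l} - \bar{T}_{\bullet l})^2}} \tag{5.7} \]

Here $\bar{H}_{\bullet k}$ and $\bar{T}_{\bullet l}$ denote the means of the respective columns.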

An important observation is that equation (5.7) places no requirements on the power model except that it must be linear. That being said, the choice of power model will of course have an impact on the quality of the results.
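Equation (5.7) can be sketched in plain Python as follows. The data reuse the made-up values of Example 5.1 (key guess 74 with the Hamming weight model); the function names are illustrative, and returning zero for a constant column is a convention chosen here, not something prescribed by the thesis.

```python
import math


def pearson(x, y):
    """Sample correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x)) * math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return num / den if den else 0.0  # assumption: a constant column counts as uncorrelated


def correlation_row(h_column, traces):
    """One row of R: correlate a key hypothesis with every sample point."""
    return [pearson(h_column, [t[l] for t in traces]) for l in range(len(traces[0]))]


H_74 = [5, 6, 5, 4, 6]
T = [
    [-0.10, -0.96, 0.85, 0.13],
    [-0.84, 0.11, 0.02, 0.10],
    [0.03, 0.34, 0.44, 0.73],
    [0.17, 0.03, -0.18, 0.74],
    [-0.53, 0.48, 0.56, 0.69],
]
print([round(r, 3) for r in correlation_row(H_74, T)])
```

Repeating `correlation_row` for all 256 key guesses yields the full result matrix, whose largest absolute entry indicates the best key hypothesis and the leaking sample point. With only five traces, as here, the values carry no statistical weight.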

5.3.5 Number of Traces

Being able to estimate the complexity of an attack is of great interest, both from an attacker’s and from a designer’s perspective. There is no point in collecting more power traces than necessary to break the key, and calculating the required number of measurements beforehand may save a lot of time. Below follows a description of how the SNR of the system can be related to the correlation coefficient and how to map the correlation coefficient to an estimate of how many power traces are needed in order to successfully break the key.

Relating the SNR to the Correlation Coefficient

For a DPA attack based on calculating the correlation, an element $r_{k,l} \in R$ is an estimate of the true correlation coefficient. Let $H$ denote a random variable representing the hypothetical power consumption values for the correct sub-key. Furthermore, let $P_S$ be the leakage signal and $P_N$ the noise signal as described in section 4.5, with standard deviations $\sigma_S$ and $\sigma_N$, respectively. The maximum correlation is therefore $\rho_{max} = \rho_{H,P} = \rho_{H,P_S+P_N}$. Intuitively, the more power traces collected, the closer the estimated correlation coefficient for the correct sub-key is to $\rho_{max}$, while all others will tend towards zero. Through its definition, the correlation coefficient can be rewritten to depend on the SNR of the measured power signal [17]. This is shown in equation (5.8).

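\begin{align}
\rho_{max} = \rho_{H,P_S+P_N} &= \frac{\mathrm{Cov}(H, P_S + P_N)}{\sqrt{\sigma_H^2\, \sigma_{P_S+P_N}^2}} \nonumber \\
&= \frac{E(H(P_S + P_N)) - E(H)E(P_S + P_N)}{\sqrt{\sigma_H^2\, \sigma_{P_S+P_N}^2}} \nonumber \\
&= \frac{E(HP_S) + E(HP_N) - E(H)E(P_S) - E(H)E(P_N)}{\sqrt{\sigma_H^2\, \sigma_S^2 \left(1 + \frac{\sigma_N^2}{\sigma_S^2}\right)}} \nonumber \\
&= \frac{\rho_{H,P_S}}{\sqrt{1 + \frac{1}{\mathrm{SNR}}}} \tag{5.8}
\end{align}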


Equation (5.8) effectively states that the higher the SNR is, the larger the correlation will be. Unfortunately, calculating $\rho_{H,P_S}$ is non-trivial, but it is possible to estimate it given enough knowledge about the target system. A worst-case scenario (from the designer’s perspective) can be simulated by setting $\rho_{H,P_S} = 1$, which is how $\rho_{max}$ is estimated in this thesis.
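The last equality in (5.8) can be spelled out in a little more detail. The sketch below assumes, as is customary but not stated explicitly in the text, that the noise $P_N$ is statistically independent of both $H$ and $P_S$:

```latex
\begin{align*}
E(HP_N) - E(H)E(P_N) &= \mathrm{Cov}(H, P_N) = 0, \\
\sigma_{P_S+P_N}^2 &= \sigma_S^2 + \sigma_N^2
  = \sigma_S^2\left(1 + \frac{\sigma_N^2}{\sigma_S^2}\right)
  = \sigma_S^2\left(1 + \frac{1}{\mathrm{SNR}}\right), \\
\rho_{max} &= \frac{\mathrm{Cov}(H, P_S)}{\sigma_H\, \sigma_S \sqrt{1 + \frac{1}{\mathrm{SNR}}}}
  = \frac{\rho_{H,P_S}}{\sqrt{1 + \frac{1}{\mathrm{SNR}}}}.
\end{align*}
```

Under this assumption the noise only affects the denominator, which is why a lower SNR always reduces the attainable correlation.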

Relating the Correlation Coefficient to the Number of Traces

Mangard notes that in order to attain a lower bound on the number of traces it is sufficient to test whether there is a significant difference between a distribution with $\rho = \rho_{max}$ and a distribution with $\rho = 0$ [17]. This is reasonable because, as mentioned in section 5.3.4, the correlation will be very small for wrong sub-keys. The sampling distributions can be approximated using the Fisher z-transformation described in appendix A.1.3. Call these sampling distributions $R_0$ and $R_1$ such that

\[ R_0 \sim N\left( \frac{1}{2} \ln\left( \frac{1 + \rho_{max}}{1 - \rho_{max}} \right),\ \frac{1}{N - 3} \right) \quad \text{and} \quad R_1 \sim N\left( \frac{1}{2} \ln\left( \frac{1 + 0}{1 - 0} \right),\ \frac{1}{N - 3} \right). \]

The next step is to determine how many samples are required to detect a difference between the two normal distributions $R_0$ and $R_1$. This is described in appendix A.1.2 and the result is shown in (5.9), where $\alpha$ denotes the confidence level corresponding to the quantile $z_\alpha$.

\[ N = 3 + 8 \left( \frac{z_\alpha}{\ln\left( \frac{1 + \rho_{max}}{1 - \rho_{max}} \right)} \right)^{2} \tag{5.9} \]
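Equations (5.8) and (5.9) combine into a simple estimate of the number of traces for a given SNR. The Python sketch below assumes that $z_\alpha$ is the standard normal quantile at the confidence level $\alpha$; the function names and the example SNR values are illustrative.

```python
import math
from statistics import NormalDist


def rho_max(snr: float, rho_h_ps: float = 1.0) -> float:
    """Maximum correlation per (5.8); rho_h_ps = 1 is the worst case for the designer."""
    return rho_h_ps / math.sqrt(1.0 + 1.0 / snr)


def traces_needed(rho: float, confidence: float = 0.9999) -> float:
    """Lower bound on the number of traces per (5.9), valid for 0 < rho < 1."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(confidence)
    return 3 + 8 * (z_alpha / math.log((1 + rho) / (1 - rho))) ** 2


for snr in (1.0, 0.1, 0.01):
    rho = rho_max(snr)
    print(f"SNR {snr}: rho_max {rho:.3f}, traces >= {math.ceil(traces_needed(rho))}")
```

The estimate grows quickly as the SNR drops, which matches the intuition that noisier measurements require more traces.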

5.3.6 Notes on Key Length

The described approach works on AES with 128-bit (16-byte) keys. However, it does not directly apply to larger keys. The reason is that before the first substitution box, the first 16 bytes of the expanded key (described in section 3.3.3) are combined with the first 16 bytes of the input data. In the case of 128-bit keys the first 16 bytes of the expanded key are exactly the key. For 192-bit and 256-bit keys this is not the case; the first 16 bytes of the expanded key are now only part of the entire key. In both cases the entire key still occupies the first 24 and 32 bytes of the expanded key, respectively, so an iterative approach is employed. First, the initial round key is broken through a standard DPA attack. The result is then used to construct hypothetical intermediate values for the second round and the attack is repeated. This also means that the attacker must capture power traces of the first two AES rounds.

Attacking decryption is harder when the key length is longer than 128 bits if the straightforward inverse cipher is used. Recall figure 3.1 and follow the flow backwards. The second round key is mixed in before the first InvMixColumns transformation, which means that every byte in the output from the second InvSubBytes depends on four bytes of the second round key. Hence, at first glance it would seem that $2^{32}$ key guesses have to be made for every sub-key of the second round key rather than $2^{8}$. However, the linearity of InvMixColumns can be exploited to solve this [18]. Call the state after the first round of decryption $S_{14}$ and after the second round of decryption $S_{13}$. Let $c$ denote the ciphertext. Equation (5.10) shows the first round of decryption and illustrates that the round key $r_{14}$ can be broken as usual.

\[ S_{14} = \mathrm{InvSubBytes}(\mathrm{InvShiftRows}(\mathrm{AddRoundKey}(c, r_{14}))) \tag{5.10} \]

In (5.11) the calculation of $S_{13}$ is rewritten so that a temporary 16-byte vector $t$ is attacked instead of the actual round key. This is possible due to the linearity of InvMixColumns and InvShiftRows.

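\begin{align}
S_{13} &= \mathrm{InvSubBytes}(\mathrm{InvMixColumns}(\mathrm{InvShiftRows}(\mathrm{AddRoundKey}(S_{14}, r_{13})))) \nonumber \\
&= \mathrm{InvSubBytes}(\mathrm{InvMixColumns}(\mathrm{InvShiftRows}(S_{14} \oplus r_{13}))) \nonumber \\
&= \mathrm{InvSubBytes}(\mathrm{InvMixColumns}(\mathrm{InvShiftRows}(S_{14})) \oplus \mathrm{InvMixColumns}(\mathrm{InvShiftRows}(r_{13}))) \nonumber \\
&= \mathrm{InvSubBytes}(\mathrm{InvMixColumns}(\mathrm{InvShiftRows}(S_{14})) \oplus t) \tag{5.11}
\end{align}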

The actual round key can now be determined through equation (5.12).

\[ r_{13} = \mathrm{ShiftRows}(\mathrm{MixColumns}(t)) \tag{5.12} \]

The only difference from the standard DPA attack is that the targeted intermediate values are somewhat different: when attacking $r_{14}$, InvShiftRows must be considered, and $S_{14}$ is used to construct the hypothetical intermediate values required to break $t$.
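The linearity that the rewrite relies on can be checked numerically on a single state column. The Python sketch below implements MixColumns and InvMixColumns for one column using the standard AES matrices; InvShiftRows is left out because it only permutes bytes. The column values are arbitrary illustrations, and the names `gmul`, `apply` and `xor` are chosen here.

```python
def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
INV_MIX = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]


def apply(matrix, column):
    """Multiply a 4-byte state column by a 4x4 matrix over GF(2^8)."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gmul(coeff, byte)
        out.append(acc)
    return out


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


state = [0xDB, 0x13, 0x53, 0x45]      # arbitrary state column
round_key = [0x2B, 0x7E, 0x15, 0x16]  # arbitrary round-key column

# Linearity: InvMixColumns(state xor key) = InvMixColumns(state) xor t.
t = apply(INV_MIX, round_key)
print(apply(INV_MIX, xor(state, round_key)) == xor(apply(INV_MIX, state), t))
# The round-key column is recovered from t by applying MixColumns.
print(apply(MIX, t) == round_key)
```

Both checks print `True`, showing on one column why each byte of $t$ can be attacked with $2^8$ guesses and the round key recovered afterwards.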

5.4 Template Attacks

Template attacks constitute a separate class of power analysis attacks and were introduced by Chari et al., who described the attack as “the strongest form of side-channel attack possible in an information theoretic sense” [19]. The concept is as follows:

1. Acquire a microcontroller identical to the one you plan to attack.

2. Measure the power consumption of your dummy target using a known key.

3. Use the acquired power traces to build templates of the real target.

4. Measure the power consumption of the real target and compare with the templates.

The templates effectively constitute fairly accurate power models of the device. That being said, the previous models are not completely discarded, as the adversary still has to decide what to model. Depending on how detailed the templates are, and on how much information the microcontroller leaks, only a small number of power traces are needed. In theory, a single trace is sufficient. The challenge lies in the construction of templates, which is both time-consuming and computationally cumbersome. Other practical problems include acquiring an identical dummy target that you can control to manually set the key, and the fact that the templates cannot be reused for other devices.

Templates are not limited to attacking encryption algorithms. A possible scenario could be to create a template for every instruction implemented in the microcontroller. The templates can then be used to reverse engineer the program running on the microcontroller without accessing the program memory. This thesis focuses on building templates for the sole purpose of attacking AES.

5.4.1 Multivariate Gaussian Model

Again, it is assumed that the microcontroller’s instantaneous power consumption is normally distributed. An alternative to studying single points in time, as in DPA, is to look at multiple points, i.e. the entire power trace. Consider the matrix of power measurements $T$ and let the columns represent random variables of the power consumption at different times, while the rows correspond to observations of these random variables. A template is fully described by the mean vector $\mu$ and the covariance matrix $\Sigma$ of $T$ (see appendix A.1.4) and is denoted as the pair $(\mu, \Sigma)$ [19].

5.4.2 Template Building Phase

The first step in the template building phase is to decide which operations to model. In this case “operations” refers to whatever the attacker is interested in. It could for instance be processor instructions such as an addition or a load from memory, but it could just as well be data rather than instructions. The latter is more interesting in our case, as being able to identify different instructions does not really help when attacking AES. Therefore, the goal is still to determine some intermediate value, specifically the output of the first SubBytes. The obvious way would be to select 256 operations, one for every possible byte value, and build 256 templates. This is quite a lot of work and does not make much sense if the microcontroller only leaks the Hamming weight [20]. Furthermore, this has to be repeated for every sub-key, which increases the number of templates by a factor of 16. From here on, consider only the sub-key $k_i$, where the targeted intermediate value is, as before, $z_i = S(p_i \oplus k_i)$. Now, nine operations are chosen, one for each Hamming weight of the intermediate value. To clarify, the first template models the power consumption of the microcontroller when $\mathrm{HW}(z_i) = 0$, the second models $\mathrm{HW}(z_i) = 1$ and so on. The templates are then constructed as follows:

1. Collect a large number of power traces of each operation, resulting in power measurement matrices $T_1, \dots, T_9$.

2. Optionally select a subset of samples from each trace to reduce attack complexity.

3. Calculate the mean of the power traces for each operation, resulting in the vectors $\mu_1, \dots, \mu_9$.

4. Calculate the covariance matrix $\Sigma_i$ for every power consumption matrix $T_i$.

The $i$th template is denoted as $(\mu_i, \Sigma_i)$.
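The construction steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the thesis: the names `build_template` and `build_templates` are hypothetical, the traces are assumed to be already grouped by Hamming weight, and the unbiased estimator (division by the number of traces minus one) is an assumption about how the covariance is defined.

```python
def build_template(traces):
    """Return (mean vector, covariance matrix) for one set of power traces.

    `traces` is a list of equally long lists: rows are observations and
    columns are points in time (the matrix T_i in the text).
    """
    n = len(traces)
    m = len(traces[0])
    mean = [sum(row[c] for row in traces) / n for c in range(m)]
    cov = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(a, m):
            s = sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in traces)
            # Assumption: unbiased sample covariance (divide by n - 1).
            cov[a][b] = cov[b][a] = s / (n - 1)
    return mean, cov


def build_templates(traces_by_hw):
    """Build one template per Hamming weight class (keys 0..8)."""
    return {hw: build_template(rows) for hw, rows in traces_by_hw.items()}
```

The covariance matrix has one row and one column per retained sample point, which is why step 2 (reducing the number of points) matters in practice.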

5.4.3 Attack Phase

The attack starts out similarly to a normal DPA attack by collecting a number of power traces of encryptions using random and known plaintexts. Assume one power trace $t$ is collected. The problem now lies in trying to determine which one of the 256 possible sub-keys would cause a power consumption signature that looks like $t$. The sub-key can be viewed as a random variable $K$ and the goal is to calculate the conditional probability $\Pr(K = x \mid t)$ for all $x \in [0, 255]$. This can be rewritten using Bayes' theorem and is presented in (5.13) [20].

\[
\Pr(K = x \mid t) = \frac{\Pr(K = x)\,\Pr(t \mid K = x)}{\sum_{l=0}^{255} \Pr(K = l)\,\Pr(t \mid K = l)} \tag{5.13}
\]

Extending (5.13) to multiple power traces is straightforward: since the traces are independent, the probabilities are simply multiplied together. The result is shown in equation (5.14).

\[
\Pr(K = x \mid T) = \frac{\Pr(K = x) \prod_{i=0}^{N-1} \Pr(T_{i\bullet} \mid K = x)}{\sum_{l=0}^{255} \Pr(K = l) \prod_{i=0}^{N-1} \Pr(T_{i\bullet} \mid K = l)} \tag{5.14}
\]

The probability $\Pr(K = x)$ is easy to estimate as generally the adversary does not have any information about the key. Therefore, every sub-key is assumed to be equally likely and in the case of AES it is simply $\frac{1}{256}$ for all $x \in [0, 255]$. The templates are used to determine the conditional probabilities $\Pr(t \mid K = x)$ for all possible sub-keys and for all power traces. Template selection is based on hypothetical intermediate values. If $p$ is the plaintext corresponding to power trace $t$ then for the hypothesis that $K = x$ the template $(\mu_i, \Sigma_i)$ is chosen where $i = HW(S(p_0 \oplus x))$. The probability is calculated by inserting $t$ and $(\mu_i, \Sigma_i)$ into (A.13), resulting in equation (5.15).

\[
\Pr(t \mid K = x) = \frac{1}{\sqrt{(2\pi)^N\,|\Sigma_i|}} \exp\left(-\frac{1}{2}\,(t - \mu_i)^T \Sigma_i^{-1} (t - \mu_i)\right) \tag{5.15}
\]

The $x$ that yields the largest probability in (5.14) is the most likely value of the sub-key.
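As a sketch of how (5.14) might be evaluated numerically, the snippet below works with logarithms, since a product of many small probabilities quickly underflows. It assumes that the per-trace values $\log \Pr(T_{i\bullet} \mid K = x)$ have already been computed from the templates and summed for every candidate $x$; the function names are hypothetical.

```python
import math


def key_posterior(log_likelihoods):
    """Evaluate (5.14) for every sub-key candidate.

    log_likelihoods[x] is assumed to hold the sum over all traces of
    log Pr(T_i | K = x). The uniform prior Pr(K = x) = 1/256 appears in
    both numerator and denominator and therefore cancels.
    """
    peak = max(log_likelihoods)
    # Subtracting the peak keeps exp() away from underflow.
    weights = [math.exp(v - peak) for v in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


def best_key(log_likelihoods):
    """Return the candidate with the largest posterior probability."""
    posterior = key_posterior(log_likelihoods)
    return max(range(len(posterior)), key=posterior.__getitem__)
```

Because the denominator of (5.14) is the same for all candidates, ranking the summed log-likelihoods directly gives the same most likely sub-key; the normalisation is only needed when actual probabilities are wanted.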

5.4.4 Points of Interest

A problem with the template building is that the covariance matrix grows quadratically in relation to the number of points in the power trace and it will quickly become too large to handle efficiently. Furthermore, equation (5.15) includes an inversion of the covariance matrix and there is a risk of running into numerical issues as the eigenvalues of the covariance matrix are often close to zero. One proposed method is to let the identity matrix represent the covariance matrix. This is called a reduced template attack [20].

Selecting points of interest remains an open problem and there is no optimal solution that fits all attack scenarios. The commonly suggested approach is to calculate the sum of the difference between each mean vector pair, i.e.

\[
\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (m_i - m_j)
\]

and choose some arbitrary number of points where large differences occur [19][21].
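The selection could look as follows in Python. Note one deliberate deviation, which is an assumption of this sketch rather than something stated above: the absolute value of each pairwise difference is summed, so that differences of opposite sign do not cancel. The function name and the `count` parameter are hypothetical.

```python
def points_of_interest(means, count):
    """Pick the `count` sample indices where the mean vectors differ most.

    `means` is a list of template mean vectors of equal length.
    Assumption: absolute pairwise differences are accumulated.
    """
    length = len(means[0])
    score = [0.0] * length
    for i in range(len(means) - 1):
        for j in range(i + 1, len(means)):
            for c in range(length):
                score[c] += abs(means[i][c] - means[j][c])
    ranked = sorted(range(length), key=lambda c: score[c], reverse=True)
    return sorted(ranked[:count])
```

The returned indices can then be used to cut every trace down to the selected points before the templates are built.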

6 Countermeasures

Various countermeasures have been suggested to prevent both simple and differential power analysis. They mainly fall into two categories: hiding and masking, discussed in sections 6.1 and 6.2, respectively.

6.1 Hiding

Hiding countermeasures attempt to make an attack harder by hiding the leakage signal from the adversary. In general, there are two types of hiding: one type affects the amplitude dimension while the other affects the time dimension [13]. Ideally, the power consumption should be constant or, alternatively, completely random. This is, of course, very hard to achieve in practice.

6.1.1 Amplitude Hiding

The point of amplitude hiding is to make the SNR as low as possible, either by making the leakage signal smaller or by increasing electronic noise. This is generally hard to achieve through software modifications and often requires some form of hardware solution. For instance, one could add hardware noise generators, or extra circuitry such as another microcontroller hooked up to the same power supply that performs other, unrelated computations in parallel to the encryption. The SNR will be significantly worse, but given enough power traces the noise can be averaged out. Another way of hiding the intermediate values is to use special logic styles (called dual-rail pre-charge logic) when designing the cryptographic device. However, this requires designing the microcontroller from scratch and does not apply to the scenario in this thesis.


Detached Power Supplies

Shamir suggests a method of detaching the power supply from the power consumption of the chip on a smart card by using two capacitors that alternate between being connected to the power supply and the chip [22]. As long as an adversary is unable to remove or bypass the capacitors, the power drawn from the external supply will be completely uncorrelated with the power consumed by the microcontroller.

6.1.2 Time Dimension Hiding

The second kind of hiding is more suitable for software implementations. Power analysis attacks, SPA in particular, require some knowledge about when the leakage occurs. For DPA attacks it is enough to know that the leakage happens somewhere within the power trace. Recall that in section 5.3 the intermediate value is assumed to leak at time $\tau$. Furthermore, it is assumed that the power traces are aligned so that $\tau$ is the same in all traces. Time dimension hiding introduces randomness in the execution path, effectively making this assumption untrue. Two methods, random delays and shuffling, are presented in sections 6.1.3 and 6.1.4, respectively.

6.1.3 Random Delays

Inserting random delays before, during and after the algorithm causes an effect similar to unaligned power traces, with the exception that the distance between two instructions may change. The NOP instruction causes the microcontroller to stall for one clock cycle and may be used to implement a delay. The drawback of using NOP instructions is that it is generally very obvious where they are located from looking at the power traces. This is because no data is manipulated and many parts of the microcontroller are inactive during a NOP instruction, causing the power consumption to be lower than usual. Hence, these delays could be eliminated by some form of pre-processing on the power traces. Another important aspect is that the total number of delays should be equal every time the algorithm is run, otherwise the number of delays can be inferred from measuring the algorithm's execution time [13]. An alternative to using NOP instructions is to insert dummy operations or even entire AES rounds operating on a dummy state [23].

6.1.4 Shuffling

Recall from section 3.3 that some of the transformations in AES treat each state byte independently from the others. In particular, AddRoundKey and SubBytes are good candidates for shuffling. Randomizing the access pattern of the state bytes complicates a DPA attack. For instance, the first S-box lookup can now occur in any of 16 possible time slots. Mainly two variants have been suggested for shuffled implementations: random start index and random permutation [24]. The first randomizes the starting index when accessing the state array and then continues by incrementing the index by one. This is cheap to implement as only one random number has to be generated. However, there are only 16 possible outcomes for each run of the algorithm; two encryptions with the same start index will have identical power traces (save noise). The other approach is to completely randomize the access pattern. This can be implemented by shuffling an array of indices at the start of each encryption and then using this array to access the state. This results in 16! possible permutations.
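The random start index variant can be illustrated with a short Python sketch. It only generates the order in which the 16 state bytes would be visited; the function name is hypothetical and Python's `secrets` module stands in for whatever random source the target provides.

```python
import secrets


def random_start_order(n=16):
    """Visit order for the 'random start index' variant.

    One random number selects the start; the index is then incremented
    by one and wraps around, so only n different orders are possible.
    """
    start = secrets.randbelow(n)
    return [(start + k) % n for k in range(n)]
```

A SubBytes loop would then process `state[k]` for each `k` in the returned order instead of counting from 0 to 15.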

Shuffling ShiftRows and MixColumns

Implementing ShiftRows and MixColumns in a shuffled manner is not as straightforward as for the other transformations. In ShiftRows every row is independent of the others while in MixColumns the same applies to the columns. This means that there are four independent operations in both ShiftRows and MixColumns that can be shuffled. However, it is possible to fully shuffle each state byte calculation by copying the entire state beforehand.

6.1.5 Attacking Shuffling

Consider a standard DPA attack on the output of the first SubBytes transform where the bytes are completely shuffled. There are 16 possible timeslots, $\tau_0, \dots, \tau_{15}$, where the intermediate value $z_i$ may appear. The probability that $z_i$ appears at time $\tau_j$ is $1/16$. In practice, this means that in approximately $N/16$ power traces $z_i$ will appear in the first timeslot. For all other power traces the value of the first timeslot will effectively be noise (with respect to $z_i$). The same applies to the other timeslots and for all other key bytes. Hence, the correlation has not been eliminated. It has merely been diluted. Instead of one large peak at $r_{\kappa,\tau}$ approximately equal to $\rho_{\max}$ there will be multiple peaks in $R_{\kappa\bullet}$ with expected values of $\rho_{\max}/16$ [25].

6.2 Masking

While hiding attempts to disconnect the power consumption from the intermediate values, masking countermeasures replace all intermediate values with random numbers. This can be achieved using schemes similar to secret splitting as presented in section 2.6.1. In general, a sensitive variable $x$ can be divided into $d$ shares $r_1, \dots, r_d$ by generating the random numbers $r_1, \dots, r_{d-1}$, called the masks, and selecting $r_d$ so that equation (6.1) is satisfied for some operation $\circ$. $r_d$ is called the masked variable [26].

\[
x = r_1 \circ \cdots \circ r_d \tag{6.1}
\]

Whenever an instruction would operate on $x$ it is replaced by operating on each of the shares separately at different points in time. Most commonly the chosen operation is boolean XOR. This is referred to as boolean masking. The alternative, arithmetic masking, often relies on modular addition or modular multiplication. The choice between arithmetic and boolean masking depends on the cipher. Sometimes both may be required. However, switching between boolean and arithmetic masks is non-trivial and expensive [27]. This thesis focuses mainly on boolean masking.


To illustrate the idea, consider the function $f(x)$ that leaks information on $x$ through the power consumption. Split $x$ into two shares, $r_1$ and $r_2$, such that

\[
r_2 = x \oplus r_1
\]

where $r_1$ is a random byte called the mask. The masked calculation is thus $f(r_2) = f(x \oplus r_1)$. If $f$ is a linear function then $f(x \oplus r_1) = f(x) \oplus f(r_1)$. The result is masked but it is straightforward to retrieve the desired output by calculating $f(r_1)$ and XORing it with $f(r_2)$. Since $f(r_1)$ and $f(r_2)$ are evaluated at different points in time, the power consumption of one is not enough to find the variable $x$: even if $x \oplus r_1$ is determined it reveals no information on $x$ unless $r_1$ is also known. Furthermore, since $r_1$ is different every time $f$ is invoked a standard DPA attack is not possible. This also highlights the importance of random numbers. If $r_1$ can be predicted, then intermediate values can be built based on the expected values of $r_1$.
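The linear case can be checked with a small Python sketch. As an assumed example of a linear function, it uses multiplication by two in GF(2^8) with the AES polynomial (often called `xtime`); `masked_eval` is a hypothetical helper that evaluates $f$ on the two shares only.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.

    This map is linear over XOR: xtime(a ^ b) == xtime(a) ^ xtime(b).
    """
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def masked_eval(f, x, r1):
    """Evaluate a linear f on the shares of x instead of on x itself."""
    r2 = x ^ r1           # masked share; in a real device only r1 and r2 exist
    masked_result = f(r2)  # equals f(x) ^ f(r1) when f is linear
    return masked_result ^ f(r1)  # remove the transformed mask
```

For a non-linear function such as the S-box the last step does not recover $f(x)$, which is why SubBytes needs special treatment.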

The first problem encountered when implementing boolean masking in AES is how to deal with the S-box and this issue is discussed extensively in literature on masking. In section 3.3.2 the S-box is described as an affine transformation combined with an inversion in a finite field, making it a non-linear operation. SubBytes must therefore be handled specially. The other problem is to make sure that no intermediate values are unmasked during the algorithm by carefully keeping track of how the masks are transformed. A masking scheme fully describes how to apply masks and how to modify the algorithm structure.

Definition 6.1. A $d$-th order masking scheme splits every intermediate value into $d + 1$ shares.

6.2.1 Masking the S-box

The AES S-box involves computing the multiplicative inverse, which has the consequence that boolean masking will not work since $(x \oplus r)^{-1} \neq x^{-1} \oplus r^{-1}$. The first solution is to use multiplicative masking instead because $(x \times r)^{-1} = x^{-1} \times r^{-1}$ but, as previously mentioned, this forces the implementation to switch between boolean masks and multiplicative masks. A perhaps not so obvious problem with multiplicative masking is that a straightforward implementation cannot mask the value zero. Let $\tilde{x} = x \times r$ where $r$ is a random number. If $x$ is zero, then $\tilde{x}$ is also zero regardless of $r$ and is thus not statistically independent of $x$ [28].

The second solution is based on re-computing the S-box to circumvent the non-linearity issues. The new S-box should accept a masked input and return a masked output. Given two masks $r_1$ and $r_2$ a new S-box $\tilde{S}$ is constructed such that equation (6.2) holds for every byte $x$.

\[
\tilde{S}(x \oplus r_1) = S(x) \oplus r_2 \tag{6.2}
\]

The masks $r_1$ and $r_2$ may be chosen equal to reduce the number of masks, but this may have security implications. The disadvantage of this approach is that a new S-box must be generated for every unique mask passed to SubBytes, which increases random-access memory (RAM) usage significantly.
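A minimal Python sketch of the table re-computation in (6.2) is given below. The helper names are hypothetical, and the plain S-box is generated from its definition (field inversion followed by the affine transformation) by brute force, which is only meant to keep the example self-contained.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def aes_sbox():
    """Generate the AES S-box: inversion in GF(2^8), then the affine map."""
    table = []
    for x in range(256):
        inv = 0  # zero has no inverse and is mapped to zero
        for y in range(1, 256):
            if gf_mul(x, y) == 1:
                inv = y
                break
        s = inv
        for k in range(1, 5):  # XOR of the four left rotations of inv
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        table.append(s ^ 0x63)
    return table


def masked_sbox(sbox, r1, r2):
    """Build a table with masked[x ^ r1] == sbox[x] ^ r2 for every byte x."""
    masked = [0] * 256
    for x in range(256):
        masked[x ^ r1] = sbox[x] ^ r2
    return masked
```

Each masked table occupies 256 bytes, which gives a feeling for the RAM cost when the masks are changed between encryptions.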


The third option provides a way to avoid building a new S-box table for every mask through secure SubBytes calculations. Rather than masking a lookup table, the affine transform and inversion are masked and every S-box substitution is performed on the fly [29].

6.2.2 Masking Scheme

Numerous masking schemes have been proposed but this thesis focuses on a first-order scheme, suitable for software implementations using six initial random masks, presented by Herbst et al. [23]. Since it is a first-order masking scheme every sensitive value is masked by one random byte at the start of encryption. Ideally, every intermediate value would be masked by a unique number but this is expensive. Instead, a smaller number of masks is used. To implement MixColumns efficiently four masks are needed, one for each row in the state. If the same mask is applied to two different rows the masks will cancel each other out and the intermediate value will leak. Denote these masks as $m_0$, $m_1$, $m_2$ and $m_3$. The output state of MixColumns will still be masked such that each row has a unique mask, but the values are different. Since MixColumns is linear the output masks can be calculated beforehand according to (6.3) where $\tilde{m}_i$ is the output mask of row $i$. The S-box is masked by pre-computing a new S-box, $\tilde{S}$, as described in (6.2).

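\[
\begin{bmatrix} \tilde{m}_0 \\ \tilde{m}_1 \\ \tilde{m}_2 \\ \tilde{m}_3 \end{bmatrix}
=
\begin{bmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{bmatrix}
\begin{bmatrix} m_0 \\ m_1 \\ m_2 \\ m_3 \end{bmatrix}
\tag{6.3}
\]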
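Equation (6.3) is simply MixColumns applied to the column of masks. The Python sketch below, with hypothetical function names, computes the output masks and can be used to check that MixColumns on a masked column equals MixColumns on the unmasked column XORed with the output masks.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8) modulo the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def mul3(b):
    """Multiply a byte by 3 in GF(2^8): 3*b = 2*b XOR b."""
    return xtime(b) ^ b


def mix_column(col):
    """Apply the MixColumns matrix of (6.3) to one four-byte column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def output_masks(masks):
    """Pre-compute the masks that the four row masks turn into."""
    return mix_column(masks)
```

Since the masks are fixed for the duration of an encryption, `output_masks` only has to be evaluated once per encryption.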

Mask Propagation

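This section shows how the masks propagate and change throughout encryption. Single ($'$), double ($''$) and triple ($'''$) prime symbols are used to indicate that the state has changed. At the start of the algorithm the state is masked row-wise with the pre-calculated masks $\tilde{m}_0$ to $\tilde{m}_3$. The initial state thus looks like:

\[
\begin{bmatrix}
s_{0,0} \oplus \tilde{m}_0 & s_{0,1} \oplus \tilde{m}_0 & s_{0,2} \oplus \tilde{m}_0 & s_{0,3} \oplus \tilde{m}_0 \\
s_{1,0} \oplus \tilde{m}_1 & s_{1,1} \oplus \tilde{m}_1 & s_{1,2} \oplus \tilde{m}_1 & s_{1,3} \oplus \tilde{m}_1 \\
s_{2,0} \oplus \tilde{m}_2 & s_{2,1} \oplus \tilde{m}_2 & s_{2,2} \oplus \tilde{m}_2 & s_{2,3} \oplus \tilde{m}_2 \\
s_{3,0} \oplus \tilde{m}_3 & s_{3,1} \oplus \tilde{m}_3 & s_{3,2} \oplus \tilde{m}_3 & s_{3,3} \oplus \tilde{m}_3
\end{bmatrix}
\]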

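The round keys must also be masked, either by applying the masks directly to the expanded key or during key expansion. The latter prevents attacks against the AES key schedule and is described in section 6.2.3 [30]. The final round key is treated specially so that the final round outputs an unmasked result. The following matrix shows a round key after masking:

\[
\begin{bmatrix}
r_0 \oplus \tilde{m}_0 \oplus m_s & r_4 \oplus \tilde{m}_0 \oplus m_s & r_8 \oplus \tilde{m}_0 \oplus m_s & r_{12} \oplus \tilde{m}_0 \oplus m_s \\
r_1 \oplus \tilde{m}_1 \oplus m_s & r_5 \oplus \tilde{m}_1 \oplus m_s & r_9 \oplus \tilde{m}_1 \oplus m_s & r_{13} \oplus \tilde{m}_1 \oplus m_s \\
r_2 \oplus \tilde{m}_2 \oplus m_s & r_6 \oplus \tilde{m}_2 \oplus m_s & r_{10} \oplus \tilde{m}_2 \oplus m_s & r_{14} \oplus \tilde{m}_2 \oplus m_s \\
r_3 \oplus \tilde{m}_3 \oplus m_s & r_7 \oplus \tilde{m}_3 \oplus m_s & r_{11} \oplus \tilde{m}_3 \oplus m_s & r_{15} \oplus \tilde{m}_3 \oplus m_s
\end{bmatrix}
\]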


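AddRoundKey has no special effect on the masks and the result will simply be:

\[
\begin{bmatrix}
s'_{0,0} \oplus m_s & s'_{0,1} \oplus m_s & s'_{0,2} \oplus m_s & s'_{0,3} \oplus m_s \\
s'_{1,0} \oplus m_s & s'_{1,1} \oplus m_s & s'_{1,2} \oplus m_s & s'_{1,3} \oplus m_s \\
s'_{2,0} \oplus m_s & s'_{2,1} \oplus m_s & s'_{2,2} \oplus m_s & s'_{2,3} \oplus m_s \\
s'_{3,0} \oplus m_s & s'_{3,1} \oplus m_s & s'_{3,2} \oplus m_s & s'_{3,3} \oplus m_s
\end{bmatrix}
\]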

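Due to the properties of the new S-box the masks of the state matrix will change as shown in the matrix below after SubBytes and ShiftRows.

\[
\begin{bmatrix}
s''_{0,0} \oplus \tilde{m}_s & s''_{0,1} \oplus \tilde{m}_s & s''_{0,2} \oplus \tilde{m}_s & s''_{0,3} \oplus \tilde{m}_s \\
s''_{1,0} \oplus \tilde{m}_s & s''_{1,1} \oplus \tilde{m}_s & s''_{1,2} \oplus \tilde{m}_s & s''_{1,3} \oplus \tilde{m}_s \\
s''_{2,0} \oplus \tilde{m}_s & s''_{2,1} \oplus \tilde{m}_s & s''_{2,2} \oplus \tilde{m}_s & s''_{2,3} \oplus \tilde{m}_s \\
s''_{3,0} \oplus \tilde{m}_s & s''_{3,1} \oplus \tilde{m}_s & s''_{3,2} \oplus \tilde{m}_s & s''_{3,3} \oplus \tilde{m}_s
\end{bmatrix}
\]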

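At this stage MixColumns cannot be immediately applied to the state. An additional step is therefore required where the state is remasked by XORing row $i$ of the state with $\tilde{m}_s \oplus m_i$.

\[
\begin{bmatrix}
s''_{0,0} \oplus m_0 & s''_{0,1} \oplus m_0 & s''_{0,2} \oplus m_0 & s''_{0,3} \oplus m_0 \\
s''_{1,0} \oplus m_1 & s''_{1,1} \oplus m_1 & s''_{1,2} \oplus m_1 & s''_{1,3} \oplus m_1 \\
s''_{2,0} \oplus m_2 & s''_{2,1} \oplus m_2 & s''_{2,2} \oplus m_2 & s''_{2,3} \oplus m_2 \\
s''_{3,0} \oplus m_3 & s''_{3,1} \oplus m_3 & s''_{3,2} \oplus m_3 & s''_{3,3} \oplus m_3
\end{bmatrix}
\]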

Finally, the MixColumns transform changes the masks back to the initial ones:

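\[
\begin{bmatrix}
s'''_{0,0} \oplus \tilde{m}_0 & s'''_{0,1} \oplus \tilde{m}_0 & s'''_{0,2} \oplus \tilde{m}_0 & s'''_{0,3} \oplus \tilde{m}_0 \\
s'''_{1,0} \oplus \tilde{m}_1 & s'''_{1,1} \oplus \tilde{m}_1 & s'''_{1,2} \oplus \tilde{m}_1 & s'''_{1,3} \oplus \tilde{m}_1 \\
s'''_{2,0} \oplus \tilde{m}_2 & s'''_{2,1} \oplus \tilde{m}_2 & s'''_{2,2} \oplus \tilde{m}_2 & s'''_{2,3} \oplus \tilde{m}_2 \\
s'''_{3,0} \oplus \tilde{m}_3 & s'''_{3,1} \oplus \tilde{m}_3 & s'''_{3,2} \oplus \tilde{m}_3 & s'''_{3,3} \oplus \tilde{m}_3
\end{bmatrix}
\]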

At this point the next encryption round is started directly as there is no need for remasking.

6.2.3 Masking the Key Schedule

Masking may be applied to the key schedule to protect it from power analysis attacks. The goal is to make sure that every round key is masked as described in section 6.2.2 except for the final round key, which is masked with $\tilde{m}_s$ only. The final round key is masked in this way so that the output of the last AddRoundKey is the unmasked ciphertext [23].

Initially, the secret key is masked with $m_s \oplus \tilde{m}_j$, where $j$ indicates the $j$th row if the key is viewed as a column-major matrix (in the same way as the state). The challenge lies in modifying the key schedule so that every round key is correctly masked and so that no intermediate value is ever unmasked. Let $w_i$ denote the current word in the expanded key. The key schedule is modified by first removing the masks $\tilde{m}_0$ through $\tilde{m}_3$ from the previous word $w_{i-1}$. The previous word is then XORed with $w_{i-N_k}$ as usual, but in order to get the desired output an extra XOR with $m_s$ is required. For the first four bytes in every iteration $\tilde{m}_s$ is used instead of $m_s$. The final round key is slightly different and every word has to be handled uniquely. The modified key schedule for 128-bit keys is shown in algorithm 3. The transform SubWordMasked uses the masked S-box but is otherwise unchanged from SubWord. Note that the order of operations matters and expressions should be evaluated from left to right, otherwise some values might be unmasked.

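Algorithm 3: Masked key schedule for 128-bit keys.
Input: Secret key $k$ and byte masks $m_s$, $\tilde{m}_s$, $\tilde{m}_0$, $\tilde{m}_1$, $\tilde{m}_2$, $\tilde{m}_3$
Output: Masked expanded key $w$
// First round key
for $i \leftarrow 0$ to $N_k - 1$ do
    $w_i \leftarrow (k_{4i} \oplus m_s \oplus \tilde{m}_0,\ k_{4i+1} \oplus m_s \oplus \tilde{m}_1,\ k_{4i+2} \oplus m_s \oplus \tilde{m}_2,\ k_{4i+3} \oplus m_s \oplus \tilde{m}_3)$
end
// All but the last round key
$i \leftarrow N_k$
while $i < N_b \times N_r$ do
    $t \leftarrow w_{i-1} \oplus (\tilde{m}_0, \tilde{m}_1, \tilde{m}_2, \tilde{m}_3)$
    $t \leftarrow \mathrm{SubWordMasked}(\mathrm{RotWord}(t)) \oplus c_{i/N_k}$
    $w_i \leftarrow t \oplus w_{i-N_k} \oplus (\tilde{m}_s, \tilde{m}_s, \tilde{m}_s, \tilde{m}_s)$
    $i \leftarrow i + 1$
    for $j \leftarrow 1$ to $3$ do
        $t \leftarrow w_{i-1} \oplus (\tilde{m}_0, \tilde{m}_1, \tilde{m}_2, \tilde{m}_3)$
        $w_i \leftarrow t \oplus w_{i-N_k} \oplus (m_s, m_s, m_s, m_s)$
        $i \leftarrow i + 1$
    end
end
// Last round key
$t \leftarrow w_{i-1} \oplus (\tilde{m}_0, \tilde{m}_1, \tilde{m}_2, \tilde{m}_3)$
$t \leftarrow \mathrm{SubWordMasked}(\mathrm{RotWord}(t)) \oplus c_{i/N_k}$
$w_i \leftarrow t \oplus w_{i-N_k}$
$w_{i+1} \leftarrow w_i \oplus w_{i+1-N_k}$
$w_{i+2} \leftarrow w_{i+1} \oplus w_{i+2-N_k} \oplus (\tilde{m}_0, \tilde{m}_1, \tilde{m}_2, \tilde{m}_3)$
$w_{i+3} \leftarrow w_{i+2} \oplus w_{i+3-N_k} \oplus (\tilde{m}_0, \tilde{m}_1, \tilde{m}_2, \tilde{m}_3)$
$w_i \leftarrow w_i \oplus (m_s, m_s, m_s, m_s) \oplus (\tilde{m}_0, \tilde{m}_1, \tilde{m}_2, \tilde{m}_3)$
$w_{i+2} \leftarrow w_{i+2} \oplus (m_s, m_s, m_s, m_s)$
return $w$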
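The masked key schedule can be tried out on a PC with the Python sketch below. It is one reading of algorithm 3 and not the microcontroller code: the names are hypothetical, `ms_out` stands for the S-box output mask, `m_out` for the four pre-calculated row masks, and the S-box is generated by brute force to keep the example self-contained. The sketch assumes three plain words after each SubWord word, as in the standard key schedule for 128-bit keys.

```python
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def aes_sbox():
    """Generate the AES S-box: inversion in GF(2^8), then the affine map."""
    table = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        table.append(s ^ 0x63)
    return table


def xor_words(*words):
    """XOR four-byte words together, strictly from left to right."""
    out = [0, 0, 0, 0]
    for word in words:
        out = [o ^ b for o, b in zip(out, word)]
    return out


def masked_key_schedule(key, ms, ms_out, m_out, sbox):
    """Expand a 16-byte key so that no round key word is ever unmasked."""
    masked = [0] * 256
    for x in range(256):
        masked[x ^ ms] = sbox[x] ^ ms_out  # masked S-box as in (6.2)
    s_in, s_out = [ms] * 4, [ms_out] * 4

    def sub_rot(word, rcon):
        t = [masked[b] for b in word[1:] + word[:1]]  # RotWord, SubWordMasked
        t[0] ^= rcon
        return t

    w = [xor_words(key[4 * i:4 * i + 4], s_in, m_out) for i in range(4)]
    i = 4
    while i < 40:  # all but the last round key
        t = sub_rot(xor_words(w[i - 1], m_out), RCON[i // 4 - 1])
        w.append(xor_words(t, w[i - 4], s_out))
        i += 1
        for _ in range(3):
            t = xor_words(w[i - 1], m_out)
            w.append(xor_words(t, w[i - 4], s_in))
            i += 1
    # Last round key: every word ends up masked with ms_out only.
    t = sub_rot(xor_words(w[39], m_out), RCON[9])
    w.append(xor_words(t, w[36]))
    w.append(xor_words(w[40], w[37]))
    w.append(xor_words(w[41], w[38], m_out))
    w.append(xor_words(w[42], w[39], m_out))
    w[40] = xor_words(w[40], s_in, m_out)
    w[42] = xor_words(w[42], s_in)
    return w
```

Removing the masks from the returned words should give the ordinary expanded key, which makes the sketch easy to compare against a reference implementation.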

6.3 Higher-Order Differential Power Analysis

Higher-order differential power analysis (HODPA) is a collective name for DPA attacks targeting multiple points in time at once. In general, a $d$th order DPA attack breaks a $(d - 1)$th order masking scheme; however, the complexity of the attacks grows exponentially in relation to $d$ [25][31]. The scheme presented in section 6.2.2 is vulnerable to second-order attacks and an example is found in section 6.3.1.

Recall that in an unmasked implementation the targeted intermediate value is manipulated at time $\tau$ and that the hypothetical power consumption matrix models the power consumption at this point in time. In a masked implementation the intermediate value is split into $d$ shares that are manipulated at different times $\tau_1, \dots, \tau_d$. Thus a first-order attack will fail. If there is some way to combine the measured power consumption at these different times into one single value, the problem can be transformed into one solvable by a first-order attack. Hence, two problems arise: how should the power measurements be combined and what intermediate value should be targeted?

Definition 6.2 (Combining function). Let $t$ denote a power trace of a masked encryption and denote the times when the $d$ shares are being manipulated as $\tau_1, \dots, \tau_d$. A combining function $C$ is defined as a function that combines multiple power measurements into one single value such that $\tilde{t} = C(t_{\tau_1}, \dots, t_{\tau_d})$.

The goal of the combining function is to calculate a value that is highly correlated with the hypothetical intermediate values for the correct key, while being weakly correlated if the key guess is wrong [26].

6.3.1 Second-Order Differential Power Analysis Example

The masking scheme presented in section 6.2.2 uses two random masks to construct the new S-box and, consequently, every S-box lookup uses the same masks. Consider the intermediate values $z_0, \dots, z_{15}$ produced by the first SubBytes transform: $z_i = \tilde{S}(p_i \oplus k_i \oplus m_s) = S(p_i \oplus k_i) \oplus \tilde{m}_s$. As usual, it is assumed that the microcontroller leaks the Hamming weight of the intermediate values. Equation (6.4) presents an important relation regarding the Hamming weight of two bits $x$ and $y$ that will prove helpful in deciding the combining function [32].

\[
HW(x \oplus y) = |HW(x) - HW(y)| \tag{6.4}
\]

In general, this does not hold if multiple bits are considered. It does, however, hold for some values and is used as a basis for the attack. Thus, the masks can be eliminated by calculating the absolute difference between the Hamming weight of the two different intermediate values $z_i$ and $z_j$, $i \neq j$.

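\[
\begin{aligned}
&|HW(S(p_i \oplus k_i) \oplus \tilde{m}_s) - HW(S(p_j \oplus k_j) \oplus \tilde{m}_s)| \\
&\quad = HW(S(p_i \oplus k_i) \oplus \tilde{m}_s \oplus S(p_j \oplus k_j) \oplus \tilde{m}_s) \\
&\quad = HW(S(p_i \oplus k_i) \oplus S(p_j \oplus k_j))
\end{aligned}
\tag{6.5}
\]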
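How far relation (6.4) carries over from bits to bytes can be explored with a few lines of Python. The helper names are hypothetical. The relation holds exactly when the set bits of one operand are a subset of the set bits of the other, which is always the case for single bits but only for some pairs of bytes.

```python
def hw(value):
    """Hamming weight: the number of set bits in a non-negative integer."""
    return bin(value).count("1")


def relation_holds(x, y):
    """Check HW(x ^ y) == |HW(x) - HW(y)| for one pair of values."""
    return hw(x ^ y) == abs(hw(x) - hw(y))


def count_byte_pairs_where_relation_holds():
    """Count the byte pairs (x, y) for which (6.4) holds exactly."""
    return sum(relation_holds(x, y) for x in range(256) for y in range(256))
```

For the remaining pairs the absolute difference underestimates $HW(x \oplus y)$, so (6.5) should be read as an approximation that is still good enough to produce a usable correlation.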

Suppose $z_i$ leaks at time $\tau_1$ and $z_j$ leaks at time $\tau_2$ and that this is captured in the power trace $t$. Then $t_{\tau_1} \propto HW(z_i)$ and $t_{\tau_2} \propto HW(z_j)$. From equation (6.5) the following combining function is selected:

\[
C(t_{\tau_1}, t_{\tau_2}) = |t_{\tau_1} - t_{\tau_2}|.
\]


The intermediate value is chosen as $z = S(p_i \oplus k_i) \oplus S(p_j \oplus k_j)$. In an attack scenario the exact times $\tau_1$ and $\tau_2$ are probably not known; if they were, there would be no need to collect more than two samples per power trace. It is, however, significantly easier to determine an interval $I = \{x \mid I_a \le x \le I_b\}$ such that $\tau_1, \tau_2 \in I$. The combining function is applied to every pair of points $(t_k, t_l)$ in $t$ where $k, l \in I$ and $k \neq l$, resulting in a new power trace $\tilde{t}$.

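Example 6.1
Given a power trace with 10 samples $t = [\,t_0 \;\; t_1 \;\; \dots \;\; t_9\,]$, the interval boundaries $I_a = 3$ and $I_b = 6$ and the combining function $C(x, y) = |x - y|$, the combined power trace $\tilde{t}$ is constructed as

\[
\begin{aligned}
\tilde{t} &= \big[\, C(t_3, t_4) \;\; C(t_3, t_5) \;\; C(t_3, t_6) \;\; C(t_4, t_5) \;\; C(t_4, t_6) \;\; C(t_5, t_6) \,\big] \\
&= \big[\, |t_3 - t_4| \;\; |t_3 - t_5| \;\; |t_3 - t_6| \;\; |t_4 - t_5| \;\; |t_4 - t_6| \;\; |t_5 - t_6| \,\big].
\end{aligned}
\]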
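The construction in example 6.1 maps directly onto `itertools.combinations`, which yields the pairs in the same order as above. The function name `combine_trace` is hypothetical and the trace is assumed to be a plain list of samples.

```python
from itertools import combinations


def combine_trace(trace, i_a, i_b):
    """Apply C(x, y) = |x - y| to every pair of samples in [i_a, i_b]."""
    window = trace[i_a:i_b + 1]  # both interval boundaries are included
    return [abs(x - y) for x, y in combinations(window, 2)]
```

For instance, `combine_trace(t, 3, 6)` on a 10-sample trace `t` returns the six values listed in the example.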

The number of points in the combined power trace depends on the interval length $n$. There are $\binom{n}{2}$ ways to select pairs from $I$. The total number of points is therefore

\[
\binom{n}{2} = \frac{n!}{2!\,(n-2)!} = \frac{n(n-1)(n-2)(n-3)\cdots 1}{2!\,(n-2)(n-3)\cdots 1} = \frac{n(n-1)}{2}.
\]

Of interest is that the number of points in the combined power trace grows with the square of the interval length. Limiting the interval length is therefore important. This also illustrates the exponential increase in complexity when attacking higher-order masking schemes. The intermediate value $z = S(p_i \oplus k_i) \oplus S(p_j \oplus k_j)$ includes two sub-keys $k_i$ and $k_j$ and both must be attacked at the same time. This is problematic as the sub-keys effectively constitute one 16-bit sub-key and, consequently, 65536 ($2^{16}$) key guesses are made. Assuming SubBytes operates on the state in consecutive order, $i$ and $j$ should be chosen adjacent to each other, e.g. $i = 0$ and $j = 1$. This helps to keep the interval short if the starting time of SubBytes is known.

Special Case

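It was noted in section 6.2.1 that choosing the same random value for both S-box masks might affect security. In this case it is possible to eliminate the mask using an intermediate value that only depends on one sub-key [13]. If

\[
\tilde{S}(p_i \oplus k_i \oplus m_s) = S(p_i \oplus k_i) \oplus m_s
\]

and

\[
\begin{aligned}
t_{\tau_1} &\propto HW(p_i \oplus k_i \oplus m_s) \\
t_{\tau_2} &\propto HW(S(p_i \oplus k_i) \oplus m_s)
\end{aligned}
\]

then

\[
|t_{\tau_1} - t_{\tau_2}| \propto HW(p_i \oplus k_i \oplus S(p_i \oplus k_i)),
\]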


and the intermediate value is chosen as $z = p_i \oplus k_i \oplus S(p_i \oplus k_i)$. This effectively corresponds to the Hamming distance between the input and output of the S-box in the unmasked case. Combined power traces must still be built but the number of key guesses is significantly reduced compared to the general case where $m_s \neq \tilde{m}_s$. Additionally, the interval can be kept smaller as only one S-box must be captured.

7 Method

The practical work of this thesis was divided into a number of parts. First, the attack environment was set up: software was written to communicate with the targeted microcontroller and the measurement equipment, and AES was implemented in software. Second, a small library was written to automate trace capture and to provide easy-to-use utilities to perform power analysis attacks. Third, SPA and DPA were used to characterise the microcontroller. More specifically, the viability of the Hamming weight power model was confirmed and the SNR was estimated. Fourth, a number of DPA attacks with different power models and statistical methods were carried out and compared. Finally, a template attack was executed against the first key byte of the unprotected AES implementation.

7.1 Environment Setup

The attack environment consists of an oscilloscope with various probes, a microcontroller target and a PC. The components communicate with each other as shown in figure 7.1. In short, the procedure below is followed to capture a single power trace.

1. The PC arms the oscilloscope.

2. Encryption is initiated by sending a hexadecimal ASCII representation of the plaintext to the microcontroller.

3. When encryption starts the microcontroller triggers the oscilloscope.

4. The power trace is read from the oscilloscope.

5. The ciphertext is transmitted to the PC.

Figure 7.1: The attack environment. [Diagram of the PC, the oscilloscope and the target, with the connections between them numbered 1 to 5.]

The cipher key is initialized to zero when the microcontroller resets but can be manually set from the PC.

7.1.1 Target

The target is an 8-bit AVR microcontroller, model ATmega328P, popular for its low power consumption and its low price. It is famous for being the microcontroller found in the Arduino Uno development board. The board chosen for this thesis is the Multi-Target Victim Board, which is designed for use in power analysis and released as part of the ChipWhisperer project [33]. The PCB comes fitted with jumpers to select between high-side and low-side sensing as well as SMA connectors in an amplifier chain. The microcontroller is clocked with an external oscillator at 7.37 MHz. The supply voltage is driven by a lab power supply and set to 3.3 V. To help identify the start of encryption a digital pin on the microcontroller is set to go high after receiving the plaintext, then back to low before sending the ciphertext. This signal acts as the trigger signal for the oscilloscope.

7.1.2 Oscilloscope

The Rigol DS1052E is a digital oscilloscope with a bandwidth of 50 MHz, a resolution of 8 bits and a maximum sample frequency of 1 GHz. As the waveforms are analysed on the PC, the ideal situation would be if the oscilloscope could act as an analogue-to-digital converter and continuously send the sampled values to the PC. This is not possible with the DS1052E as it can only transfer the waveform stored in its internal memory, and as such the maximum power trace length is limited by the oscilloscope's memory. The oscilloscope is therefore set up in single-shot mode to wait for a trigger signal, capture one power trace and then transfer it. Depending on settings the waveform contains either 16384 (16 KiB) or 1048576 (1 MiB) sample points (each sample is 1 byte). Due to the memory depth being more or less constant there is a trade-off between the sample rate and the length in time of the waveform.

7.1.3 Computer

A normal consumer PC provides all the computing power needed to analyse the power traces. All results presented in this thesis are recorded on the same computer. Its specifications are listed in table 7.1.

Table 7.1: Computer specifications.

Component          Name
Processor          Intel Core i5-2400 @ 3.1 GHz
Memory             8 GB
Operating system   Windows 8.1

The attack software is written in the Python programming language and makes extensive use of the NumPy package. NumPy is a library for scientific computing and makes working with multi-dimensional arrays efficient, which is crucial as the matrices involved in DPA quickly become very large. The target and oscilloscope are also controlled through Python scripts.

7.2 AES Implementations

All implementations are compiled with the toolchain included in Atmel Studio 6 and the code is written in the programming language C. Three versions of AES-128 are implemented. The first one is a naive implementation with no countermeasures. The second version extends the first by implementing shuffling in all transformations while the third version implements the first-order masking scheme described in section 6.2.2. To make sure that all versions behave as intended and to detect transmission errors, all plaintexts are also encrypted with a well-tested third-party encryption library and the resulting ciphertexts are compared with those returned by the microcontroller.¹ Decryption is not supported. The performance of the three implementations is compared by reading the width of the trigger pulse from the oscilloscope. The trigger signal is configured to go high when timing starts and low when it is done. The key schedule, encryption and countermeasure setup are each timed.

¹ The Python library is called cryptography and OpenSSL is used as the backend.


7.2.1 Naive

The implementation is naive in the sense that it completely follows the standard without considering any possible side-channel attacks. All transformations are divided into discrete functions without much concern for performance. The S-box is stored in the microcontroller's read-only memory (ROM) as an array with 256 elements. Similarly, the multiplications by two and three required in MixColumns are also stored as arrays. The key schedule is executed as soon as the key is set and the expanded key is stored in the microcontroller's RAM.

7.2.2 Shuffling

Shuffling is implemented with random permutations as described in section 6.1.4. When the microcontroller is reset an array holding the values 0 to 15, corresponding to the state indices, is initialized. This array is called the index array. Whenever a state byte needs to be manipulated it is accessed indirectly through this array. At the start of every encryption the index array is permuted randomly using the Fisher-Yates shuffle, shown in algorithm 4 [34]. AddRoundKey and SubBytes are trivially changed by indirectly accessing the state bytes through the array. Both ShiftRows and MixColumns copy the entire state array to temporary variables. The order of copying is also random and decided by the index array. The shuffled implementation does not change the key schedule and as such it is implemented and executed in the same way as in the naive version. It should also be noted that shuffling is only implemented in the first and the last rounds. This is motivated by the fact that only the first and last rounds are sensitive to the attacks presented in this thesis. It is possible to choose an intermediate value from other rounds but these values will depend on multiple key bytes thanks to the diffusion properties of AES.

Algorithm 4: Fisher-Yates shuffle
Input: List $a$ with $n$ elements such that $a = a_0, \dots, a_{n-1}$
for $i \leftarrow n - 1$ down to $1$ do
    $j \leftarrow$ random integer $\in [0, i]$
    swap $a_j$ with $a_i$
end
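Algorithm 4 translates almost line by line into Python. The sketch below is for experimentation on a PC and is not the C code running on the target; `secrets.randbelow` is used as a stand-in for the microcontroller's random number generator.

```python
import secrets


def fisher_yates_shuffle(a):
    """Shuffle the list `a` in place as in algorithm 4 and return it."""
    for i in range(len(a) - 1, 0, -1):
        j = secrets.randbelow(i + 1)  # uniform integer in [0, i]
        a[i], a[j] = a[j], a[i]
    return a
```

Calling `fisher_yates_shuffle(list(range(16)))` gives an index array of the kind described above, through which the state bytes can then be accessed.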

7.2.3 Masking

Due to the behaviour of the masking scheme presented in section 6.2.2 fairly few modifications are required. Initially, every row in the state is masked with a unique random value and a masked S-box is initialized and stored in RAM. Since the entire key schedule is masked and the masks are regenerated before every encryption, key expansion occurs just before the state is masked. Additionally, the required remasking is inserted after ShiftRows in all but the final round. Two variants are implemented, one where the input and output of the S-box are masked with the same value, i.e. $m_s = \tilde{m}_s$, and one where $m_s \neq \tilde{m}_s$.

7.2.4 Random Number Generation

The prng is seeded once when the key is set. Few sources of entropy are available, so the seed is produced by a 16-bit counter that is started on reset and then read when the microcontroller receives the key. The counter frequency is one eighth of the microcontroller’s clock frequency.

7.3 Simple Power Analysis

No actual spa attack is executed with the intention of breaking the key. Instead, spa is used in preparation for the dpa attacks. The intermediate value targeted in most attacks is the output of SubBytes in the first round. This value is further manipulated in ShiftRows, where it is copied and moved to a different memory location, and in MixColumns, where it is used in calculations. The diffusion properties of aes ensure that the correlation between the state bytes and the intermediate value diminishes quickly. Hence, it is sufficient to capture the power consumption during the first round. spa is employed to identify the “best” oscilloscope settings.

7.4 Device Characterisation

In this section the method used to determine the best measurement setup is described. Furthermore, power traces of encryptions are studied to determine the validity of the Hamming weight model as well as to estimate the system’s snr.

7.4.1 Measurement Configuration

Five different measurement configurations are tested. High- and low-side sensing are paired with either an sma cable or a standard 10x/1x probe that comes with the oscilloscope. Additionally, an H-field probe combined with a 20 dB low-noise amplifier (lna) is tested. Measurements of the H-field are highly dependent on the placement of the probe. The probe can effectively cover about one fourth of the microcontroller but precise positioning is difficult. Two positions are tried, one where the probe is placed on the “upper” half of the microcontroller and one where it is placed on the “lower” half, see figure 7.2. The best configuration is determined by comparing the maximum correlation coefficient achievable with the different configurations, based on dpa attacks using 5000 power traces and the Hamming weight model. This is done under the assumption that the Hamming weight model is correct, even though it has not been confirmed yet.

Figure 7.2: Approximate H-probe positions as seen from above the microcontroller.

7.4.2 Viability of the Hamming Weight Model

The basic assumption relied upon is that the microcontroller leaks the Hamming weight. For this to be true, there should be a noticeable difference in the power consumption when two bytes of different Hamming weights are manipulated. In contrast, there should not be any difference between different values with the same Hamming weight, or at least the difference should be small compared to the difference when the Hamming weights are different. To verify that this is the case the same instruction is run 1000 times for each Hamming weight. The difference is then determined by averaging the power traces corresponding to the same Hamming weight and comparing the averages. Checking whether two different values with the same Hamming weight have the same power consumption is done with a difference-between-means hypothesis test. Two intermediate values are chosen: 116 and 170. Both have a Hamming weight of four. The power consumption of each value is represented as a normally distributed random variable. The null hypothesis is that there is no difference between the power consumption of the two values. Standard two-tailed t-tests are performed with different numbers of samples.
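The test statistic for such a difference-between-means test can be computed as in the sketch below. It assumes the pooled-variance (Student) form of the two-sample t-test, which is consistent with degrees of freedom of the form n1 + n2 − 2 but is not confirmed by the text; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Toy data standing in for the voltage at one sample point for two values.
t, df = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

The null hypothesis is rejected at significance level α when |t| exceeds the two-tailed critical value for the given degrees of freedom.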

Target Instruction

The microcontroller instruction that leaks the intermediate value can be inferred from either the source code or the compiled program, if they are available. Here it is assumed that neither is available to the attacker. Instead of identifying the instruction, spa is used to reveal where the first S-box lookup occurs (i.e. where z0 is calculated). This interval of time is referred to as the target instruction and is additionally used in the snr calculations and in the template building. This also means that controlling the target instruction’s input data is equivalent to choosing the plaintexts so that z0 has the desired properties. In a set of uniformly distributed plaintexts the different Hamming weights are binomially distributed. That is, if k random plaintexts are encrypted then approximately $\frac{k}{256}\binom{8}{h}$ power traces correspond to Hamming weight h (if k is large). Therefore, instead of completely randomizing the plaintexts they are chosen so that there are 1000 power traces corresponding to each Hamming weight. Example 7.1 illustrates how the first 1000 plaintexts are chosen. For most Hamming weights there are multiple choices for p0. In this case the different choices are selected at random until 1000 plaintexts are built. The other plaintext bytes do not affect the templates for HW(z0) and can be set to anything.²

² In a parallel implementation this is not true as multiple zi are manipulated at once. To minimize the effect from other bytes the rest of the plaintext should be random.


Example 7.1
There is only one value of p0 that results in HW(z0) = 0. It can be found by going backwards from the definition of z0.

HW(z0) = 0 ⇔ z0 = 0
           ⇔ S(p0 ⊕ k0) = 0
           ⇔ p0 = S^{-1}(0) ⊕ k0
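The plaintext selection can be sketched as below. The sketch enumerates every p0 for which HW(S(p0 ⊕ k0)) equals the desired Hamming weight and then draws from these candidates at random; the function names and the example key byte are hypothetical, and the S-box is generated from its definition to keep the example self-contained.

```python
import random
from math import comb


def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p


def build_sbox():
    """Generate the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):
            inv = gmul(inv, x)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def hw(x):
    return bin(x).count("1")


def expected_traces(k, h):
    """Expected number of traces with HW(z0) = h among k random plaintexts."""
    return k * comb(8, h) / 256


def candidates_for_hw(h, k0, sbox):
    """All values of p0 such that HW(S(p0 ^ k0)) == h."""
    return [p for p in range(256) if hw(sbox[p ^ k0]) == h]


def build_plaintexts(h, k0, sbox, count=1000, rng=random):
    """16-byte plaintexts whose first byte gives HW(z0) = h; the rest is random."""
    cands = candidates_for_hw(h, k0, sbox)
    return [bytes([rng.choice(cands)] + [rng.randrange(256) for _ in range(15)])
            for _ in range(count)]


SBOX = build_sbox()
K0 = 0x7D                                  # example key byte, known in profiling
only_choice = candidates_for_hw(0, K0, SBOX)   # the single p0 of example 7.1
```

With random plaintexts only about 4 of 1000 traces would have Hamming weight 0 (expected_traces(1000, 0) ≈ 3.9), which is why the plaintexts are chosen rather than drawn uniformly.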

7.4.3 Signal-to-Noise Ratio

The snr is calculated according to section 4.6 based on the following measurements of the target instruction:

• 1000 power traces where the data is constant.

• 1000 power traces where the lsb of the data is zero.

• 1000 power traces where the lsb of the data is one.

• 1000 power traces for each Hamming weight.

Since the exact time of leakage is unknown, the snr is evaluated for every point in the power traces and the largest value is identified.
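A per-point snr estimate can be sketched as follows. The sketch assumes that the snr is the variance of the data-dependent signal divided by the noise variance, that the signal variance is taken over the per-Hamming-weight mean traces weighted by their binomial probabilities, and that the noise variance is taken from the constant-data traces. These choices and all names are assumptions, since section 4.6 is not reproduced here.

```python
from math import comb
from statistics import mean, pvariance


def snr_per_point(traces_by_hw, constant_traces):
    """Estimate the snr at every sample point.

    traces_by_hw[h] holds the traces recorded for Hamming weight h (0..8),
    constant_traces holds traces recorded with constant data.
    """
    weights = [comb(8, h) / 256 for h in range(9)]   # P(HW = h), uniform data
    snr = []
    for t in range(len(constant_traces[0])):
        means = [mean(tr[t] for tr in traces_by_hw[h]) for h in range(9)]
        overall = sum(w * m for w, m in zip(weights, means))
        signal = sum(w * (m - overall) ** 2 for w, m in zip(weights, means))
        noise = pvariance([tr[t] for tr in constant_traces])
        snr.append(signal / noise)
    return snr


# Toy data: point 0 leaks the Hamming weight exactly, point 1 leaks nothing.
toy = {h: [[float(h), 0.0], [float(h), 0.0]] for h in range(9)}
const = [[1.0, 1.0], [-1.0, -1.0]]
snr = snr_per_point(toy, const)
best_point = max(range(len(snr)), key=snr.__getitem__)
```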


Most attacks target the intermediate value zi = S(pi ⊕ ki) for all i ∈ [0, 15]. The correlation coefficients are evaluated for the lsb, Hamming weight and Hamming distance models while the difference of means and distance of means are only evaluated with the lsb model. The Hamming distance model requires two intermediate values: the input and output of the S-box, i.e. the power consumption is modelled as HD(pi ⊕ ki, S(pi ⊕ ki)). The number of power traces required to break the entire key is determined for all attack combinations. The statements made regarding the choice of intermediate value in section 5.3.1 are verified by executing an attack with the input to SubBytes as the intermediate value.
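A correlation-coefficient attack on one key byte with the Hamming weight model can be sketched as below. To stay self-contained the sketch attacks simulated leakage, HW(z0) plus Gaussian noise at a single sample point, instead of measured traces; the noise level, the key byte and all function names are assumptions made for the illustration.

```python
import random
from math import sqrt


def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p


def build_sbox():
    """Generate the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):
            inv = gmul(inv, x)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def hw(x):
    return bin(x).count("1")


def pearson(x, y):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def cpa_byte(plaintext_bytes, leakage, sbox):
    """Return the best key guess and |rho| for all 256 guesses."""
    scores = []
    for guess in range(256):
        model = [hw(sbox[p ^ guess]) for p in plaintext_bytes]
        scores.append(abs(pearson(model, leakage)))
    return max(range(256), key=scores.__getitem__), scores


SBOX = build_sbox()
rng = random.Random(1)
TRUE_KEY = 125                              # example value only
pts = [rng.randrange(256) for _ in range(200)]
leak = [hw(SBOX[p ^ TRUE_KEY]) + rng.gauss(0, 0.5) for p in pts]
best, scores = cpa_byte(pts, leak, SBOX)
```

With measured traces the same computation is repeated for every sample point and the largest absolute coefficient over all points decides the key guess.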

7.5.1 Attack on Shuffling

The shuffled implementation is attacked in the same way as the naive implementation, with the exception that only the best attack method is used.

7.5.2 Second-Order Attack

The actual trace capture for the second-order attack does not differ much from the first-order case; the important thing is that SubBytes is captured. This attack focuses only on the first two sub-keys due to the long execution times of the analysis software. Again, spa proves helpful in finding a minimal interval covering both z0 and z1. The second version of the masked implementation (where the S-box input and output masks differ) is attacked.

7.6 Template Attack

The template attack is somewhat different from the others as the microcontroller must be profiled before the actual attack can take place. As described in section 5.4.2 the “operations” are chosen as the instruction where zi is manipulated for the nine different Hamming weights, i.e. HW(zi). During profiling, it does not matter what the key is as long as it is known. In this case, it is chosen randomly. Furthermore, only z0 is attacked since building templates is somewhat cumbersome. The Hamming weight power traces from section 7.4.3 are reused.
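The two phases can be illustrated with the simplified sketch below. It deliberately deviates from a full template attack: a single point of interest with one pooled noise variance is used instead of multivariate templates, a uniform prior over the key guesses is assumed, and simulated leakage replaces measured traces. All names, the noise level and the key byte are assumptions.

```python
import random
from math import exp


def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p


def build_sbox():
    """Generate the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):
            inv = gmul(inv, x)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def hw(x):
    return bin(x).count("1")


def build_templates(profiling):
    """Profiling: mean per Hamming weight and a pooled noise variance."""
    means = {h: sum(v) / len(v) for h, v in profiling.items()}
    sq = [(s - means[h]) ** 2 for h, v in profiling.items() for s in v]
    return means, sum(sq) / len(sq)


def template_attack(plaintext_bytes, samples, means, var, sbox):
    """Attack: probability of each key guess after all traces are seen."""
    log_like = [0.0] * 256
    for p, s in zip(plaintext_bytes, samples):
        for k in range(256):
            mu = means[hw(sbox[p ^ k])]
            log_like[k] -= (s - mu) ** 2 / (2 * var)   # Gaussian log-likelihood
    top = max(log_like)
    weights = [exp(v - top) for v in log_like]          # avoid underflow
    total = sum(weights)
    return [w / total for w in weights]


SBOX = build_sbox()
rng = random.Random(2015)
SIGMA, KEY = 0.2, 125                                   # illustration values
profiling = {h: [h + rng.gauss(0, SIGMA) for _ in range(200)] for h in range(9)}
means, var = build_templates(profiling)
pts = [rng.randrange(256) for _ in range(15)]
obs = [hw(SBOX[p ^ KEY]) + rng.gauss(0, SIGMA) for p in pts]
probs = template_attack(pts, obs, means, var, SBOX)
```

The list probs corresponds to one column of a plot of probability against number of traces; evaluating it after each additional trace shows how quickly the correct guess separates from the others.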

7.6.1 Points of Interest

The points of interest are chosen close to where the maximum snr is found. An alternative method for selecting points of interest, presented in section 5.4.4, is also tested. In total, eight adjacent points are chosen.

8 Results

In this chapter the results from the attacks and investigations are presented.

8.1 AES Implementations

Table 8.1 lists the performance figures for the different implementations. The column “initialization” refers to the permutation of the index array in the shuffled case and the generation of masks (including the new S-box) in the masked case. The figures for the masked implementation correspond to the version where different masks are used before and after the S-box. In summary, shuffling and masking increase the total execution time (encryption, key schedule and initialization combined) by 64% and 59%, respectively.

Table 8.1: Performance figures for the three different aes implementations.

Implementation   Encryption (ms)   Key schedule (ms)   Initialization (ms)
Naive            1.264             0.352               0
Shuffled         1.728             0.352               0.568
Masked           1.456             0.480               0.640
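The overhead figures can be reproduced from table 8.1 with the short check below. It assumes that the percentages compare the sum of all three columns against the naive implementation, which matches the stated 64% and 59%; the dictionary and function names are hypothetical.

```python
# (encryption, key schedule, initialization) in milliseconds, from table 8.1
timings_ms = {
    "naive": (1.264, 0.352, 0.0),
    "shuffled": (1.728, 0.352, 0.568),
    "masked": (1.456, 0.480, 0.640),
}


def total_ms(name):
    return sum(timings_ms[name])


def overhead_percent(name):
    """Increase in total execution time relative to the naive version."""
    return 100 * (total_ms(name) / total_ms("naive") - 1)


overheads = {name: round(overhead_percent(name)) for name in ("shuffled", "masked")}
```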


A full naive encryption sampled at 500 kHz is shown in figure 8.1. The time axis is adjusted so that zero matches the beginning of the encryption. To fully capture the first round with as high a sampling frequency as possible the oscilloscope is set to sample at 100 MHz with a memory depth of 16 KiB. A power trace therefore covers approximately 0.16 ms. Figure 8.2 shows the first SubBytes transformation captured with the final oscilloscope settings. 16 distinct peaks are visible, each corresponding to one S-box lookup. Note that in figure 8.2 time zero does not indicate the start of encryption.

Figure 8.1: A full (naive) aes-128 encryption. Encryption starts at time zero. (Plot of voltage against time in ms.)

Figure 8.2: First SubBytes transformation. (Plot of voltage against time in µs.)


8.3 Device Characterisation

Device characterisation is performed using the oscilloscope settings determined through spa.

8.3.1 Measurement Configuration

The best measurement configuration is high-side sensing using the standard probe bundled with the oscilloscope. Using the same probe to measure the voltage over ground (low-side sensing) results in power traces with very small fluctuations (compared to the high-side configuration) and the oscilloscope’s resolution is insufficient to adequately capture these. High-side sensing with the sma cable causes similar difficulties, but this time due to limitations in the adjustment of the voltage offset, as the oscilloscope clips the signal. Table 8.2 summarizes the result of executing a dpa attack on the first sub-key with 5000 power traces using correlation coefficients together with the Hamming weight model.

Table 8.2: Attack results for the various measurement configurations.

Configuration                         ρmax
High-side sensing + standard probe    0.91
Low-side sensing + standard probe     0.38
High-side sensing + sma cable         -
Low-side sensing + sma cable          0.21
H-probe at upper half                 0.087
H-probe at lower half                 -

No power traces are recorded for high-side sensing with the sma cable due to oscilloscope limitations. In the case of the H-probe placed at the lower half of the microcontroller the attack fails to break the sub-key. The measurement setup is capable of capturing 3.3 power traces per second, which means that it takes about 25 minutes to capture 5000 power traces. Evaluating the correlation coefficients for all key guesses of one key byte takes about 16.7 s for all 5000 power traces, or 3.3 ms per trace.

8.3.2 Viability of Hamming Weight Model

Figure 8.3 shows the mean of the power consumption during the first S-box lookup for the nine different Hamming weights. Note that the dc offset is removed. A lower voltage value is indicative of a higher power consumption since high-side sensing is used.

Two t-tests are performed, one with 30 samples and the other with 200 samples. The test statistics are 1.71 and 6.98, respectively. An excerpt of a t-distribution table with critical values is presented in table 8.3. With these results the null hypothesis is rejected at the 10 % significance level based on 30 samples, while with 200 power traces it is rejected with a very low error probability (less than 0.1 %).

Figure 8.3: Power consumption during the first S-box lookup for all Hamming weights. The vertical dashed line indicates the point with the highest snr. (Plot of voltage in mV against time in µs, one curve per Hamming weight from HW 0 to HW 8.)

Table 8.3: Critical values of a two-tailed t-test based on significance level α and degrees of freedom df.

df     α = 0.10   α = 0.05   α = 0.01
28     1.70       2.05       2.76
198    1.65       1.97       2.60

8.3.3 Signal-to-Noise Ratio

Estimations of the snr based on measurements of the first S-box lookup are presented in table 8.4. The maximum snr is found at 0.44 µs (the dashed line in figure 8.3). Additionally, the estimated number of power traces required to break the key with correlation coefficient based attacks is calculated using equation (5.9) with a confidence level of 99%, i.e. λα = 2.326.


Table 8.4: Signal-to-noise ratio for the least significant bit and Hamming weight power models.

Power model             SNR      ρmax    Est. number of traces
Least significant bit   0.1350   0.345   87
Hamming weight          34.25    0.986   5
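The figures in table 8.4 can be reproduced with the sketch below. It assumes that ρmax follows from the snr as 1/√(1 + 1/snr) when the model correlation is set to one, and that equation (5.9) is the common rule of thumb n = 3 + 8 (λ / ln((1 + ρ)/(1 − ρ)))²; neither formula is reproduced in this chapter, so both are assumptions, although they agree with the tabulated values.

```python
from math import ceil, log, sqrt


def rho_from_snr(snr, rho_model=1.0):
    """Maximum correlation for a given snr; rho_model = 1 means a perfect model."""
    return rho_model / sqrt(1 + 1 / snr)


def traces_needed(rho, quantile=2.326):
    """Rule-of-thumb number of traces; 2.326 is the 99 % normal quantile."""
    return ceil(3 + 8 * (quantile / log((1 + rho) / (1 - rho))) ** 2)


rho_lsb = rho_from_snr(0.1350)     # about 0.345
rho_hw = rho_from_snr(34.25)       # about 0.986
n_lsb = traces_needed(rho_lsb)     # 87
n_hw = traces_needed(rho_hw)       # 5
```

The same function also gives the estimates quoted for the countermeasures: a correlation of 0.079 corresponds to 1730 traces and 0.62 to 24 traces.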


The numbers of traces required to break the naive implementation with five different attacks are listed in table 8.5. The results are averages, rounded up to the nearest integer, over two different sets of power traces using different keys.

Table 8.5: Number of required traces to break an entire key.

Power model             Statistic                 Number of traces
Hamming weight          Correlation coefficient   24
Hamming distance        Correlation coefficient   Unsuccessful
Least significant bit   Correlation coefficient   247
Least significant bit   Difference of means       968
Least significant bit   Distance of means         247

Figure 8.4 illustrates how the correlation coefficient for the correct key guess changes in relation to the number of power traces in an attack based on the Hamming weight model. In contrast, figure 8.5 shows what happens when the key guess is wrong.

Figure 8.4: Correlation coefficients for the correct sub-key when the Hamming weight model is used. (Three panels, for 10, 50 and 200 traces, each plotting ρ against time in µs.)

The difference between choosing the input and the output of SubBytes as the intermediate value is illustrated in table 8.6. The table lists the five best key guesses and their correlation coefficients based on an attack using the Hamming weight model. The same 5000 power traces are used to calculate the values.

Figure 8.5: Correlation coefficients for the wrong sub-key when the Hamming weight model is used. (Three panels, for 10, 50 and 200 traces, each plotting ρ against time in µs.)

Table 8.6: Top five key guesses in a dpa attack using two different intermediate values based on 5000 power traces. 125 is the correct value.

        z0 = S(p0 ⊕ k0)      z0 = p0 ⊕ k0
Rank    Key     ρ            Key     ρ
1       125     0.91         125     0.70
2       136     0.22         130     0.70
3       96      0.21         128     0.67
4       49      0.21         127     0.67
5       255     0.21         129     0.67

8.4.1 Attack on Shuffling

Figure 8.6 shows the results of a dpa attack using 5000 power traces on the shuffled implementation. The Hamming weight model is used in conjunction with the correlation coefficient. Instead of one distinct peak there are multiple peaks spread out over the entire SubBytes transform. The maximum absolute correlation coefficient from 5000 power traces is 0.079. At a confidence level of 99% the required number of power traces can be estimated at 1730. The actual number of power traces required to break the key, based on one set, is 4998. The theoretical correlation coefficient is one sixteenth of the naive implementation’s ρmax, i.e. approximately 0.057. This is 28 % smaller than the actual value and corresponds to 3326 estimated power traces.

Figure 8.6: Correlation coefficients for the correct sub-key in the shuffled implementation using 5000 power traces. (Plot of ρ against time in µs.)

8.4.2 Second-Order Attack

A first-order attack against the masked implementation based on 5000 power traces fails to recover the key. To reduce attack complexity the second-order attack is executed with only 2000 power traces. The absolute difference function is used to build the combined power traces. The selected interval covers the first two S-box lookups and contains 200 sample points. Thus the combined power traces are 19900 points long, one point for each pair of sample points. The result of the attack is shown in figure 8.7. Note that this figure is zoomed in and that the x-axis does not indicate time due to the combining function. Each sample point depends on measurements from two different points in time. The maximum absolute correlation coefficient is 0.62, corresponding to an estimation of 24 required power traces at a confidence level of 99%. Evaluating all 65536 possible combinations of the two sub-keys takes about five hours on the current setup.
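The preprocessing step can be sketched as follows. Every pair of sample points in the selected interval is combined with the absolute difference, so an interval of n points yields n(n − 1)/2 combined points; the function name is hypothetical.

```python
from itertools import combinations


def combine_abs_diff(trace):
    """Absolute difference of every pair of sample points in one trace."""
    return [abs(a - b) for a, b in combinations(trace, 2)]


combined = combine_abs_diff([0.0] * 200)   # a 200-point interval
```

A common choice of hypothesis for the combined traces, assuming both S-box outputs carry the same mask, is the Hamming weight of S(p0 ⊕ k0) ⊕ S(p1 ⊕ k1), which depends on two sub-keys at once and is consistent with the 65536 combinations evaluated above.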

Figure 8.7: Correlation coefficients for the first two sub-keys in the masked implementation using 2000 power traces. (Plot of ρ against sample index, samples 11000 to 14000, with one curve for the correct key guess and one for a wrong key guess.)

8.5 Template Attack

With the time axis in figure 8.3 as reference, the eight points of interest range from 0.40 µs to 0.48 µs. Figure 8.8 shows the result of calculating the difference between each pair of mean traces as described in section 5.4.4. One of the peaks occurs at the point of maximum snr, but it is not the highest. Figure 8.9 shows how the probability evolves for each key guess in relation to the number of traces. The number of points was selected through trial and error. The evaluation time for all key guesses is 124 ms per trace.
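The alternative selection method can be sketched as below. The sketch assumes that the differences are taken as absolute values before they are summed, since section 5.4.4 is not reproduced here; the function names are hypothetical.

```python
from itertools import combinations


def sum_pairwise_differences(mean_traces):
    """For every sample point, sum |m_a - m_b| over all pairs of mean traces."""
    n_points = len(mean_traces[0])
    return [sum(abs(a[t] - b[t]) for a, b in combinations(mean_traces, 2))
            for t in range(n_points)]


def select_points(scores, count):
    """Indices of the count highest-scoring sample points, in time order."""
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return sorted(ranked[:count])


# Toy mean traces for three classes; only the middle point separates them.
scores = sum_pairwise_differences([[0, 0, 0], [0, 1, 0], [0, 2, 0]])
points = select_points(scores, 1)
```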

Figure 8.8: The sum of pairwise differences between the means of the power traces for each Hamming weight during the first S-box lookup. (Plot of the sum of pairwise differences in V against time in µs.)

Figure 8.9: The probability of each key guess in relation to the number of traces. The blue line indicates the correct key guess. (Plot of probability against number of traces, 1 to 10.)

9 Discussion

In this chapter the results are discussed in relation to the chosen method. The choices made are argued for and against, and suggestions for improvements are presented. Three aspects of the method are further discussed in the last section: the chosen measurement setup, aes modes of operation and the trigger signal.

9.1 Results

Overall, the results from both attacks and countermeasures are in line with what could be expected. The spa trace reveals no key material but is a great tool for identifying patterns; without it the dpa attacks become harder due to longer power traces.

9.1.1 Practical Issues

Some issues with the measurement setup should be brought up. It is difficult to use the exact same oscilloscope settings when capturing multiple sets of traces due to variations in the measured voltage. A voltage scale that gives great results one day may have to be adjusted the next. Additionally, during long runs the signal sometimes drifts and in the worst case it goes outside the oscilloscope’s range. One possible explanation is that the power supply is not as stable as it should be and fails to keep the voltage at 3.3 V. Another likely explanation is that both the oscilloscope and the power supply are temperature dependent and should be powered on long before capturing the power traces.

In general, these issues do not affect the dpa attacks as the absolute value of the voltage is not of interest and any one set of power traces is independent of other sets. They do, however, cause problems in template attacks, where it is important that the measurements used to build templates and the measurements used to attack the microcontroller are on the same scale and have the same offset. Similarly, they may affect the snr results as these depend on the current settings of the oscilloscope. The results presented in this thesis are all acquired using the same voltage scale on the oscilloscope.

9.1.2 Device Characterisation

Using a good measurement configuration is evidently very important with regard to power analysis and the results raise some interesting questions. Why does high-side sensing produce much better results than low-side sensing? This is likely due to differences in the routing of ground wires and supply wires in the pcb, causing the voltage level to fluctuate a lot more when measured at the microcontroller’s Vdd pin. The oscilloscope’s resolution is simply not high enough to detect the differences in power consumption at the ground pin using the standard probe. On the other hand, the sma cable measurements contain huge fluctuations, but as the results show these power traces contain a lot of noise. The H-probe power traces have the worst quality, but this is expected. The probe’s size makes it hard to pinpoint the physical location of the leakage signal and the microcontroller’s package shields large parts of the electromagnetic emanations.

Figure 8.3 shows that the microcontroller leaks the Hamming weight of the data it manipulates. It also indicates that data with large Hamming weights consume more power than data with smaller Hamming weights. An interesting result is that between 0.1 µs and 0.2 µs the mean voltage of data with a Hamming weight of seven is lower than that of data with a Hamming weight of eight. From 0.4 µs and onwards the graph is more in line with what is expected. This is reflected in the snr calculations where the maximum is located at 0.44 µs. The hypothesis test shows that it is possible to detect a difference between two values with the same Hamming weight, but this requires a large number of power traces per value. This, however, illustrates some of the idealizations in the Hamming weight model. It assumes that all bits contribute equally to the power consumption, but in reality this is not the case. Due to crosstalk between wires and parasitic capacitances in the microcontroller, charging two different wires consumes different amounts of power. With the knowledge that different values can be separated from each other it might be tempting to use the intermediate value itself as the power model, e.g. zi = 67 would have a hypothetical power consumption of 67. This will not work particularly well as this model assumes, for example, that the values 127 and 128 consume similar amounts of power. In fact, 128, with a Hamming weight of one, consumes significantly less power than 127, which has a Hamming weight of seven. Theoretically, it might be possible to improve the power model of the microcontroller by determining how much each bit contributes to the power consumption and weighting the bits accordingly, but this requires profiling, at which point opting for templates might be a better choice.


9.1.3 Attacks

Among the standard dpa attacks the correlation coefficient attack in conjunction with the Hamming weight model is the best choice. Given that it uses the most detailed power model this is not surprising. A more interesting result is that the distance of means method requires the same number of power traces as the correlation coefficient attack given the same power model, indicating that the correlation coefficient and the distance of means measure similar properties. That the Hamming distance model fails makes sense if the implementation of the microcontroller is considered. Recall that for the Hamming weight model to work the bus should be pre-charged to either all ones or all zeros. If this is the case then the Hamming distance between the input and output of SubBytes will not correlate with the power consumption. A better use of the Hamming distance model would have been to determine which value the bus pre-charges to, but this is not necessary in this case. This result is a good indication of what happens when a sound power model is applied blindly without regard to the device’s implementation.

The potency of the template attack is illustrated in figure 8.9, where the probability for the correct sub-key quickly approaches one while the others approach zero after only six power traces. As long as templates of the same quality are constructed for the other sub-keys this is the attack that requires the fewest traces. The hardest part of template construction is finding a small set of good points of interest; with too many points the attack will fail due to numerical issues. The execution time of the attack per trace is a lot longer than for a dpa attack, but this is of less concern due to the small number of traces required. While the profiling is cumbersome it is not that much of a hindrance given unlimited access to a dummy device. That being said, as long as the number of encryptions using the same key is not severely limited a regular correlation coefficient attack is both faster and easier.

Table 8.6 lists the best key guesses from two dpa attacks using different intermediate values. These results highlight the importance of choosing a good intermediate value. Not only is the maximum correlation lower when targeting the input to SubBytes, but the correlation coefficients of the next best key guesses are much closer to that of the best key guess. In fact, the key guesses 125 and 130 both produce the same absolute correlation. Consider the values’ binary representations: 125 = 01111101₂ and 130 = 10000010₂. Evidently, 130 is the bitwise complement of 125. Since xor is a linear operation any linear relationship between the operands is preserved. Every key guess with a linear relation to the correct key will yield a hypothetical power consumption vector correlated with the actual power consumption. This also means that there will be at least two key guesses (the correct key and its complement) with exactly the same absolute correlation coefficient no matter how many power traces are captured! In contrast, the S-box is specifically designed to be non-linear and will destroy any correlation. Ironically, the strength of the S-box actually helps in differential power analysis.
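The relation between a key guess and its complement can be checked directly. For every plaintext byte p, HW(p ⊕ k̄) = 8 − HW(p ⊕ k), so the two hypothetical power consumption vectors are perfectly anti-correlated and give the same absolute correlation with any set of measurements. The sketch below verifies this for the values 125 and 130; the helper names are hypothetical.

```python
from math import sqrt


def hw(x):
    return bin(x).count("1")


def pearson(x, y):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


k, k_complement = 125, 130
model_k = [hw(p ^ k) for p in range(256)]             # input to SubBytes
model_c = [hw(p ^ k_complement) for p in range(256)]
rho = pearson(model_k, model_c)                        # perfectly anti-correlated
```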


9.1.4 Countermeasures

The countermeasures shown in this thesis are somewhat expensive to implement with respect to the extra security provided, especially in the case of shuffling, where the adversary does not even have to be aware that it is implemented and can simply record more power traces. There are two reasons why shuffling is slow. First, permuting the index array requires generating fifteen random numbers within a non-constant range, which requires the use of the modulo operation. This implies integer division, which is slow on the avr microcontroller. Second, indirect indexing is slow as the index has to be looked up before the state byte can be read from memory. First-order masking is also broken with relatively little extra effort. Identifying SubBytes is straightforward and from there finding the first two S-box lookups is trivial. However, when the input and output masks of SubBytes are different the attack takes considerably longer to execute. The time goes from a few seconds per sub-key in the naive case to about five hours per two sub-keys in the masked case. Additionally, the attacker must be aware that masking is employed. One thing of note is that the maximum correlation coefficient is significantly reduced by shuffling and somewhat reduced by masking. A combination of shuffling and first-order masking would further increase the number of required power traces but would otherwise not provide any additional security, since the second-order attack would still be executed in the same way.

9.1.5 Number of Traces

The estimated maximum correlation coefficient ρmax based on the snr is 0.986, which is somewhat higher than the actual ρmax at 0.91. This is best explained by the choice of setting ρH,PS to 1, which is too optimistic from an attacker’s point of view. However, this does not impact the estimated number of required traces by that much (from five to seven in the Hamming weight case). Using an lsb power model the estimated number of traces is 87, which can be compared to the 247 traces required. All estimations consistently provide numbers smaller than the actual required numbers, so the assertion that this is a lower bound holds true.

When determining ρmax for the shuffled implementation 5000 power traces are captured but 4998 are required to break the entire key, which means that ρmax is probably somewhat higher. The correlation coefficient is higher than expected, which might indicate flaws in the random number generation. Another factor is that the calculation of ρmax is based on the first intermediate value z0, while the actual required number of traces is determined by breaking the entire key. However, this should not matter that much since the snr should be equal for all 16 S-box lookups due to the microcontroller executing the exact same instructions.

9.2 Method

The measurement configuration can be considered fairly realistic assuming the attacker has access to the device. Should this not be the case a completely different problem arises: how can the user be fooled into using the device in a way that lets the attacker record the power consumption? In the case of credit cards a strong adversary might be able to tamper with the terminal, e.g. adding current sensing circuits to an atm. A bigger concern is that measuring the electromagnetic radiation worked, and with better equipment the distance from the microcontroller may be increased.

The second environmental decision lies in the encryption behaviour. The microcontroller accepts one block of plaintext data, encrypts it and responds with the ciphertext. For plaintexts of arbitrary size this is not particularly realistic and typically some mode of operation would be implemented on top of the aes primitive. It is intuitive that ecb would not provide any extra security as it is equivalent to the current setup, but other modes with initialization vectors are probably tougher to attack. However, many applications do not need modes of operation since they only encrypt one block of data. An example could be a challenge-response protocol. The challenge could be a random integer n encrypted with the secret key. To prove that the target knows the secret key it decrypts the challenge and responds with the encryption of n + 1. Power analysis could be executed either during decryption or encryption, assuming that either the challenge or the response can be acquired. n does not have to be known.

The artificial trigger signal is a somewhat unrealistic choice, with the result that all power traces are automatically aligned. In reality, the power traces would be slightly offset from each other, which affects the snr negatively. Other ways of determining when encryption starts are monitoring the serial communication or the power consumption. Considering figure 8.1, the difference in power consumption between idle mode and encryption mode is evident.

The references have been chosen with care and many of the authors have published a lot of articles on the topic. The book Power Analysis Attacks: Revealing the Secrets of Smart Cards by Mangard et al. in turn references many of the chosen articles, which should further establish their reliability. In general, articles published in renowned journals were preferred.

9.3 Power Analysis in a Broader Context

Cryptography is a field of study that is highly intertwined with ethics and it raises many interesting questions. Are secure communications and data storage human rights or should they be reserved solely for military and business use? The ethics of attacking and attempting to find weaknesses in existing solutions is also interesting. For instance, how should vulnerabilities be disclosed? Should the discovery be made public as soon as possible to let people know or should it be kept secret until the security hole has been plugged? Another interesting and today highly relevant issue is the one regarding the insertion of backdoors into cryptographic systems so that, for example, law enforcement can bypass the encryption. A few examples as to why this is a bad idea include:

• Trust: Can we really trust that everyone who knows about the backdoor will keep it secret and won’t use it for personal gain?

• Exploitable: Weakening encryption simplifies the problem for attackers. They could get lucky or perhaps they are clever enough to detect the backdoor.

While interesting and important, further discussion of these topics is out of the scope of this thesis. Power analysis has been publicly known for years and, additionally, no existing cryptographic system is attacked but rather a custom implementation designed to approximate a real system.

10 Conclusions

In this thesis the potency and practicality of power analysis against software implementations of AES on 8-bit microcontrollers have been investigated. Below, the questions in the problem formulation presented in chapter 1 are answered.

1. What is power analysis and how can it be used to retrieve the AES encryption key from an 8-bit software implementation?

(a) What makes AES sensitive to power analysis?

• Theoretically, any algorithm implemented in CMOS technology is sensitive to power analysis given equipment capable of detecting the leakage and enough measurements. This is due to the fact that the power consumption of CMOS devices is data dependent.

• AES is particularly sensitive mainly because of the initial key whitening and because all state bytes are treated independently in SubBytes, which enables an attacker to focus on one key byte at a time rather than the entire key as in a brute force attack.

• An 8-bit microcontroller can only manipulate one byte at a time, which means that the power consumption depends solely on the targeted intermediate value and electrical noise is the only noise source. In parallel implementations where multiple bytes are evaluated simultaneously the noise is considerably higher.

(b) What different methods of power analysis exist and how can they be compared?

• Power analysis attacks range from simple attacks based on visual inspection of power traces to advanced profiling attacks. Simple power analysis can aid an attacker in identifying the cryptographic algorithm and where the sensitive leakage occurs. Differential power analysis attacks the secret key using simple power models based on generic assumptions about the target’s power consumption. Template attacks carefully profile the microcontroller to match the captured power traces with the most likely key.

• Numerous figures of merit can be used to compare different attacks, for example computational complexity, required knowledge of the target system and preparation time. This thesis uses the number of traces required to break the entire encryption key together with the preparation time as the main indicators. The best attack against the AES implementation in this thesis is differential power analysis using the Hamming weight model. Template attacks are a close second, requiring fewer power traces but considerably more time in preparation. In reality, the choice of attack would depend entirely on the attacker’s situation.

2. How can power analysis of AES be prevented?

(a) Are there ways to make power analysis harder by modifying the software?

• There are two forms of countermeasures presented in this thesis: hiding and masking. Hiding is based on making it harder to determine when the leakage happens, forcing an attacker to study a wide range of power consumption measurements with lower SNR as the result. Conversely, masking does not try to remove the leakage but rather makes it uncorrelated to the sensitive intermediate values. Both countermeasures rely heavily on good random number generation. In this thesis hiding is implemented through shuffling the state bytes of AES and a first-order masking scheme is presented.

• First-order masking effectively prevents first-order differential power analysis while shuffling increases the required number of power traces by a large amount. Higher-order masking is required to sufficiently protect against second-order power analysis; however, this has not been tested in this thesis. The complexity of higher-order attacks increases exponentially with the order of the attack.

(b) What is the performance cost of these countermeasures?

• Shuffling and masking are both expensive and reduce algorithm performance. Furthermore, the random number requirements add complexity in the form of entropy extraction and pseudo-random number generation.

• The AES implementation is written in C and there is room for optimizing both the original algorithm and the countermeasures by writing some parts in assembly language and improving register usage. This has, however, been left largely unexplored.
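The answers to questions 1(b) and 2(a) can be illustrated with a small simulation. The sketch below is not the attack code used in this thesis. It assumes that the device leaks the Hamming weight of the SubBytes output plus Gaussian noise, and it models first-order masking simply as XORing a fresh random byte onto that output. The key byte, trace count, noise level and seed are arbitrary choices. The S-box is computed from its definition in FIPS 197 to keep the example self-contained.

```python
import random


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    """Build the AES S-box: field inverse followed by the affine map."""
    table = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
                inv = gf_mul(inv, x)
        s = inv
        for i in range(1, 5):  # XOR with the four left rotations of inv
            s ^= ((inv << i) | (inv >> (8 - i))) & 0xFF
        table.append(s ^ 0x63)
    return table


SBOX = build_sbox()
HW = [bin(v).count("1") for v in range(256)]


def pearson(x, y):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(key_byte, n, rng, masked=False, sigma=1.0):
    """Simulate one leakage sample per encryption (assumed leakage model)."""
    plaintexts = [rng.randrange(256) for _ in range(n)]
    leakage = []
    for p in plaintexts:
        v = SBOX[p ^ key_byte]
        if masked:
            v ^= rng.randrange(256)  # fresh random mask for every encryption
        leakage.append(HW[v] + rng.gauss(0.0, sigma))
    return plaintexts, leakage


def cpa(plaintexts, leakage):
    """Correlate the leakage with the Hamming weight model for every key guess."""
    scores = [abs(pearson([HW[SBOX[p ^ k]] for p in plaintexts], leakage))
              for k in range(256)]
    return max(range(256), key=scores.__getitem__), scores


rng = random.Random(1)
true_key = 0x2B
guess_plain, scores_plain = cpa(*simulate(true_key, 500, rng))
guess_masked, scores_masked = cpa(*simulate(true_key, 500, rng, masked=True))
```

In the unprotected case the correct key byte stands out clearly, while in the masked case the correlation for the correct key byte stays at the level of the wrong guesses, so a first-order attack has nothing to distinguish it by. Only 256 guesses per key byte are needed, which is the divide-and-conquer property noted in answer 1(a).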

In conclusion, power analysis is powerful due to a disconnect between the cryptographic algorithm (AES) and its physical implementation. AES seems to have been conceived with little concern for side-channel analysis and the responsibility of protecting it is deferred to the implementers. This has given rise to a number of ad hoc countermeasures. Ideally, the block cipher should be constructed with protection against side-channel analysis in mind.

10.1 Further Research

The field of side-channel analysis is wide and there are a number of ways to improve upon the attacks and countermeasures presented in this thesis. A list of topics to research further is presented below.

• Trace alignment and clock syncing: This thesis deals with aligned power traces and studying how much worse the situation becomes when that is not the case is important. Template attacks in particular are sensitive to this as the points of interest must be the same in the templates and the power traces. A related improvement to the attack environment would be to add clock syncing; by syncing the sample clock with the microcontroller’s clock the quality of the power traces can be increased.

• Mode of Operation: Using a mode of operation with an initialization vector does not prevent power analysis (the leakage is still there) but it may complicate matters. For instance, how is the attack procedure in a DPA attack modified so that intermediate values are dependent on the same key byte over multiple block encryptions? Another interesting problem is how to detect which mode of operation the target is employing. Can this be detected using SPA?

• Combined and higher-order countermeasures: Combining the masking scheme and shuffling implementation presented here would not increase security in the sense that the (second-order) attack changes, but it would increase the number of power traces required. Additionally, higher-order masking schemes are of interest but the performance hit may be too large to be of any practical use.

• Attacking parallel implementations: Theoretically, attacking microcontrollers with wider words, e.g. 32 and 64 bits, is not any different from the 8-bit case. However, the noise is significantly higher and these processors generally run at higher frequencies. It would be interesting to evaluate the limits of the presented attack setup.
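The effect of wider words on the noise can be estimated with a short simulation. This is a sketch under assumptions that are not taken from the measurements in this thesis: the device is assumed to leak the Hamming weight of the whole word without electrical noise, and the bytes that share the word with the targeted byte are assumed to be independent and uniformly random. Under these assumptions the correlation between the targeted byte and the leakage is expected to be about the square root of one over the number of bytes in the word.

```python
import random


def hamming_weight(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def pearson(x, y):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def target_correlation(word_bytes, n, rng):
    """Correlate the Hamming weight of one byte with that of the whole word."""
    target_hw, word_hw = [], []
    for _ in range(n):
        data = [rng.randrange(256) for _ in range(word_bytes)]
        target_hw.append(hamming_weight(data[0]))      # the attacked byte
        word_hw.append(sum(hamming_weight(b) for b in data))  # assumed leakage
    return pearson(target_hw, word_hw)


rng = random.Random(2015)
rho_8bit = target_correlation(1, 5000, rng)    # one byte per word
rho_32bit = target_correlation(4, 5000, rng)   # four bytes per word
```

Since the number of traces needed grows roughly with the inverse square of the correlation, halving the correlation in the 32-bit case corresponds to about four times as many traces before electrical noise is even considered.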

Appendix

A Mathematical Prerequisites

This chapter provides an overview of basic statistics and presents some formulas and calculations that did not fit within the main thesis.

A.1 Statistics

It is assumed that the reader is familiar with basic statistics as most formulas are simply presented without much elaboration. Let X denote a random variable. A random variable is a function that maps values in a sample space Ω to real values, i.e. $X : \Omega \to \mathbb{R}$. The expected value of X, also referred to as the mean of X, is written as

\[ \mu_X = E(X). \]

The variance of a random variable is a measurement of how widely spread the outcomes of X are and is defined as

\[ \sigma_X^2 = V(X) = E\left[(X - \mu_X)^2\right]. \]

σ is known as the standard deviation and is equal to the square root of the variance. A related metric used to determine linear relationships between two random variables X and Y is the covariance, defined in equation (A.1).

\[ \mathrm{Cov}(X,Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E(XY) - \mu_X \mu_Y \tag{A.1} \]

As a special case, the covariance of a random variable with itself is equal to its variance: $\mathrm{Cov}(X,X) = \sigma_X^2$. A positive covariance indicates that the two variables tend to change together in the same direction, while a negative covariance implies that they change in different directions. The larger the absolute value of the covariance, the stronger the linear relationship between X and Y. Alternatively, the Pearson correlation coefficient, presented in equation (A.2), can be used to quantify correlation.

\[ \rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\sigma_X^2 \sigma_Y^2}} \tag{A.2} \]

The correlation coefficient is dimensionless and limited to values in the range [−1, 1]. If the covariance is zero, the correlation coefficient is zero and the random variables X and Y are said to be uncorrelated. Independent random variables have zero covariance and are therefore uncorrelated. However, the opposite is not necessarily true as correlation is only a measurement of linear dependencies.

A.1.1 Parameter Estimation

In general it is not possible to determine parameters such as mean and variance exactly and they have to be estimated. Equations (A.3) through (A.6) show how the sample mean, variance, covariance and correlation coefficient are calculated, respectively, given samples from X and Y such that $x = [x_1\ x_2\ \ldots\ x_n]$ and $y = [y_1\ y_2\ \ldots\ y_n]$. It is common to use different letters to clearly differentiate them as estimations.

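\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{A.3} \]

\[ s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{A.4} \]

\[ c = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \tag{A.5} \]

\[ r = \frac{c}{\sqrt{s_x^2 s_y^2}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{A.6} \]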
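Equations (A.3) through (A.6) translate directly into code. The following is a minimal sketch using only the Python standard library; the function names and the example data are chosen for illustration and are not part of the thesis software.

```python
def sample_mean(x):
    """Equation (A.3): arithmetic mean of the samples."""
    return sum(x) / len(x)


def sample_variance(x):
    """Equation (A.4): variance estimate with the n - 1 denominator."""
    m = sample_mean(x)
    return sum((xi - m) ** 2 for xi in x) / (len(x) - 1)


def sample_covariance(x, y):
    """Equation (A.5): covariance estimate of two equally long samples."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    mx, my = sample_mean(x), sample_mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def sample_correlation(x, y):
    """Equation (A.6): covariance normalised by both standard deviations."""
    return sample_covariance(x, y) / (sample_variance(x) * sample_variance(y)) ** 0.5


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # y = 2x, a perfectly linear relationship
r = sample_correlation(x, y)
```

Because y is an exact linear function of x in this example, the correlation coefficient is 1 up to rounding, even though the covariance itself depends on the scale of the data.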

A.1.2 Differentiating Two Distributions

A common question in statistics is whether two distributions differ in any significant way. A follow-up question is: how many samples are needed to, with some certainty, claim that the two distributions differ? In the context of power analysis, the last question is of considerable interest. Assume two independent normal distributions $X \sim \mathcal{N}(\mu_X, \sigma_X)$ and $Y \sim \mathcal{N}(\mu_Y, \sigma_Y)$. The probability that a random sample from X is larger than one from Y is

\[ \Pr(X > Y) = \Pr(X - Y > 0) = 1 - \Pr(X - Y < 0). \]

Let $Z = X - Y$. Independence gives $\mu_Z = \mu_X - \mu_Y$ and $\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$. The probability can now be calculated as in equation (A.7) where Φ is the cumulative distribution function of the standard normal distribution.

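\[
\begin{aligned}
\Pr(X > Y) &= 1 - \Pr(Z < 0) = 1 - \Pr\left(\frac{Z - \mu_Z}{\sigma_Z} < \frac{-\mu_Z}{\sigma_Z}\right) = 1 - \Phi\left(\frac{-\mu_Z}{\sigma_Z}\right) = \Phi\left(\frac{\mu_Z}{\sigma_Z}\right) \\
&= \Phi\left(\frac{\mu_X - \mu_Y}{\sqrt{\sigma_X^2 + \sigma_Y^2}}\right)
\end{aligned}
\tag{A.7}
\]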

To answer the question of how many samples n are required to say with a confidence of 1 − α that X and Y are different, equation (A.7) can be rewritten as equation (A.8). $\lambda_\alpha$ is the value that satisfies $\Pr(N < \lambda_\alpha) = 1 - \alpha$ where $N \sim \mathcal{N}(0,1)$.

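\[ \Phi\left(\frac{\mu_X - \mu_Y}{\sqrt{\sigma_X^2 + \sigma_Y^2}}\right) = 1 - \alpha \iff \frac{\mu_X - \mu_Y}{\sqrt{\sigma_X^2 + \sigma_Y^2}} = \lambda_\alpha \tag{A.8} \]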

In general, for sampling distributions the variances depend on the number of samples n, which makes it possible to solve equation (A.8) for n given a confidence level 1 − α.

Hypothesis Test

Consider the two independent sampling distributions X and Y with an equal number of observations $x = [x_1\ x_2\ \ldots\ x_n]$ and $y = [y_1\ y_2\ \ldots\ y_n]$. To determine whether there is any difference between the means of X and Y the two-tailed t-test is used. Assume that X and Y have equal variances, estimated as $s_X^2$ and $s_Y^2$, and with sample means $\bar{x}$ and $\bar{y}$. The null hypothesis is that $\mu_X = \mu_Y$ and the alternative hypothesis is that $\mu_X \neq \mu_Y$. The test statistic $t^*$ is presented in (A.9).

\[ t^* = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{n}}} \tag{A.9} \]

$t^*$ is an observation from a t-distribution with 2n − 2 degrees of freedom, denoted as df. The null hypothesis is rejected if $|t^*|$ is greater than some critical value $t_{\alpha/2}(df)$ where α is the significance level, i.e. the probability that the null hypothesis is rejected even if it is true. $t_{\alpha/2}(df)$ is generally acquired from a t-distribution table.
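The test statistic in (A.9) can be computed as in the sketch below, which uses the sample mean and the sample variance (with the n − 1 denominator) from the Python standard library. The function names and the two example groups are assumptions made for illustration. The critical value is not computed here since the standard library has no t-distribution; it would be looked up in a table as described above.

```python
from statistics import mean, variance


def t_statistic(x, y):
    """Equation (A.9) for two groups with an equal number of observations."""
    if len(x) != len(y):
        raise ValueError("both groups must contain the same number of observations")
    n = len(x)
    # statistics.variance is the sample variance with the n - 1 denominator.
    return (mean(x) - mean(y)) / (variance(x) / n + variance(y) / n) ** 0.5


def degrees_of_freedom(n):
    """Degrees of freedom for two groups of n observations each."""
    return 2 * n - 2


group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_star = t_statistic(group_a, group_b)
df = degrees_of_freedom(len(group_a))
```

In a power analysis setting the two groups could, for example, be the power samples at one point in time for two sets of traces that are sorted by a hypothetical intermediate value.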

A.1.3 Fisher z-transformation

Sometimes it is of interest to perform hypothesis tests on the correlation coefficient. By mapping r to a random variable $Z \sim \mathcal{N}(\mu_Z, \sigma_Z)$ the sampling distribution of the correlation coefficient can be studied. This transformation is defined in equation (A.10).

\[ Z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right). \tag{A.10} \]

The mean and variance of Z are presented in equations (A.11) and (A.12), respectively, where n is the number of samples.

\[ \mu_Z = \frac{1}{2} \ln\left(\frac{1 + \rho}{1 - \rho}\right) \tag{A.11} \]

\[ \sigma_Z^2 = \frac{1}{n - 3} \tag{A.12} \]
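Equations (A.8), (A.11) and (A.12) can be combined to estimate how many samples are needed to tell a correlation ρ apart from zero. The sketch below assumes that the transformed correlation of the correct key has mean $\mu_Z$ as in (A.11), that the wrong keys have a correlation of zero, and that both have the variance in (A.12). Inserting this into (A.8) gives $\mu_Z / \sqrt{2/(n-3)} = \lambda_\alpha$, i.e. $n = 3 + 8\lambda_\alpha^2 / \ln^2\left(\frac{1+\rho}{1-\rho}\right)$. The function names and the default value of α are chosen for illustration only.

```python
from math import ceil, log
from statistics import NormalDist


def fisher_z(r):
    """Equation (A.10): map a correlation coefficient to the z domain."""
    return 0.5 * log((1 + r) / (1 - r))


def samples_needed(rho, alpha=1e-4):
    """Samples needed to separate a correlation rho from zero.

    Assumes zero correlation for the competing hypothesis and a confidence
    of 1 - alpha, following (A.8), (A.11) and (A.12).
    """
    if not 0 < abs(rho) < 1:
        raise ValueError("rho must be non-zero and strictly between -1 and 1")
    lam = NormalDist().inv_cdf(1 - alpha)   # Phi(lam) = 1 - alpha
    return ceil(3 + 2 * (lam / fisher_z(rho)) ** 2)


n_strong = samples_needed(0.5)
n_weak = samples_needed(0.1)
```

The estimate grows quickly as the correlation shrinks, which is consistent with the observation that countermeasures lowering the correlation increase the number of power traces an attacker has to collect.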

A.1.4 Multivariate Normal Distribution

The normal distribution can be generalized to higher dimensions. This is called a multivariate normal distribution. Let X be an N-dimensional random vector, i.e. every element is a random variable so that $X = [X_1\ X_2\ \ldots\ X_N]$. This can be written as

\[ X \sim \mathcal{N}(\mu, \Sigma) \]

where µ is the mean vector and Σ is the covariance matrix. The probability density function of X is presented in (A.13).

\[ f_X(x) = \frac{1}{\sqrt{(2\pi)^N |\Sigma|}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right) \tag{A.13} \]

The mean vector is defined as $\mu = [\mu_{X_1}\ \mu_{X_2}\ \ldots\ \mu_{X_N}]$ and the covariance matrix is defined such that $\Sigma_{i,j} = \mathrm{Cov}(X_i, X_j)$. The covariance matrix contains N × N elements and is symmetric about the diagonal. The diagonal contains the variances of the random variables $X_i$.
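The density in (A.13) is what a template attack evaluates when it matches a measured trace against its templates. The following is a minimal sketch for the two-dimensional case, written without external libraries so that the matrix inverse and determinant are explicit. The two templates, their names and their parameters are invented for illustration and are not taken from the measurements in this thesis.

```python
from math import exp, pi, sqrt


def mvn_pdf_2d(x, mu, cov):
    """Equation (A.13) for N = 2 with an explicit 2x2 inverse and determinant."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^-1 (x - mu)
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return exp(-0.5 * q) / sqrt((2 * pi) ** 2 * det)


# Hypothetical templates: (mean vector, covariance matrix) at two points of interest.
TEMPLATES = {
    "HW=3": ([3.0, 1.0], [[1.0, 0.2], [0.2, 1.0]]),
    "HW=5": ([5.0, 2.0], [[1.0, 0.2], [0.2, 1.0]]),
}


def classify(x):
    """Return the name of the template with the highest density at x."""
    return max(TEMPLATES, key=lambda name: mvn_pdf_2d(x, *TEMPLATES[name]))


density_at_origin = mvn_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
label = classify([4.8, 2.1])
```

With more points of interest the densities become very small, so practical implementations usually compare logarithms of the density instead of the density itself.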

Bibliography

[1] Wade Trappe and Lawrence C. Washington. Introduction to Cryptography with Coding Theory. Pearson, 2nd edition, 2006. Cited on pages 6, 7, 8, 9, and 14.

[2] Simon Singh. The Code Book: the Science of Secrecy from Ancient Egypt to Quantum Cryptography. Anchor, 2011. Cited on page 7.

[3] Hans Delfs and Helmut Knebl. Introduction to Cryptography: Principles and Applications. Springer Berlin Heidelberg, 2007. Cited on pages 9, 10, and 14.

[4] Auguste Kerckhoffs. La cryptographie militaire. Journal des Sciences Militaires, 1883. Cited on page 10.

[5] Paul Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS and other systems. In Neal Koblitz, editor, Advances in Cryptology - CRYPTO ’96, volume 1109 of Lecture Notes in Computer Science, pages 104–113. Springer, 1996. Cited on page 10.

[6] Paul Kocher, Joshua Jaffe, and Benjamin Jun. Differential power analysis. In Michael Wiener, editor, Advances in Cryptology - CRYPTO ’99, volume 1666 of Lecture Notes in Computer Science, pages 388–397. Springer Berlin Heidelberg, 1999. Cited on pages 10, 31, and 36.

[7] Daniel Genkin, Adi Shamir, and Eran Tromer. RSA key extraction via low-bandwidth acoustic cryptanalysis. Cryptology ePrint Archive, Report 2013/857, 2013. http://eprint.iacr.org/. Cited on page 10.

[8] Hagai Bar-El, Hamid Choukri, David Naccache, Michael Tunstall, and Claire Whelan. The sorcerer’s apprentice guide to fault attacks. Proceedings of the IEEE, 94(2):370–382, February 2006. Cited on page 11.

[9] National Institute of Standards and Technology. FIPS 197, Advanced Encryption Standard, November 2001. Cited on page 15.

[10] Jan M. Rabaey, Anantha Chandrakasan, and Borivoje Nikolic. Digital Integrated Circuits. Pearson, 2nd edition, 2003. Cited on page 21.


[11] David K. Cheng. Field and Wave Electromagnetics. Addison-Wesley, 2nd edition, 1989. Cited on page 25.

[12] Eric Peeters, François-Xavier Standaert, and Jean-Jacques Quisquater. Power and electromagnetic analysis: Improved model, consequences and comparisons. Integration, the VLSI Journal, 40(1):52–60, January 2007. Cited on page 25.

[13] Stefan Mangard, Elisabeth Oswald, and Thomas Popp. Power Analysis Attacks: Revealing the Secrets of Smart Cards. Springer Science, 2007. Cited on pages 25, 37, 45, 46, 53, and 77.

[14] Thomas S. Messerges, Ezzat A. Dabbish, and Robert H. Sloan. Power analysis attacks of modular exponentiation in smartcards. In Çetin K. Koç and Christof Paar, editors, Cryptographic Hardware and Embedded Systems - CHES ’99, volume 1717 of Lecture Notes in Computer Science, pages 144–157. Springer Berlin Heidelberg, 1999. Cited on page 32.

[15] Thomas S. Messerges, Ezzat A. Dabbish, and Robert H. Sloan. Examining smart-card security under the threat of power analysis attacks. IEEE Transactions on Computers, 51(5):541–552, May 2002. Cited on page 33.

[16] Eric Brier, Christophe Clavier, and Francis Olivier. Correlation power analysis with a leakage model. In Marc Joye and Jean-Jacques Quisquater, editors, Cryptographic Hardware and Embedded Systems - CHES 2004, volume 3156 of Lecture Notes in Computer Science, pages 16–29. Springer Berlin Heidelberg, 2004. Cited on page 37.

[17] Stefan Mangard. Hardware countermeasures against DPA - a statistical analysis of their effectiveness. In Tatsuaki Okamoto, editor, Topics in Cryptology - CT-RSA 2004, volume 2964 of Lecture Notes in Computer Science, pages 222–235. Springer Berlin Heidelberg, 2004. Cited on pages 38 and 39.

[18] Colin O’Flynn and Zhizhang Chen. Side channel analysis of an AES-256 bootloader. Cryptology ePrint Archive, Report 2014/899, 2014. http://eprint.iacr.org/. Cited on page 40.

[19] Suresh Chari, Josyula R. Rao, and Pankaj Rohatgi. Template attacks. In Burton S. Kaliski, Çetin K. Koç, and Christof Paar, editors, Cryptographic Hardware and Embedded Systems - CHES 2002, volume 2523 of Lecture Notes in Computer Science, pages 13–28. Springer Berlin Heidelberg, 2003. Cited on pages 40, 41, and 43.

[20] Elisabeth Oswald and Stefan Mangard. Template attacks on masking - resistance is futile. In Masayuki Abe, editor, Topics in Cryptology - CT-RSA 2007, volume 4377 of Lecture Notes in Computer Science, pages 243–256. Springer Berlin Heidelberg, 2007. Cited on pages 41, 42, and 43.

[21] Christian Rechberger and Elisabeth Oswald. Practical template attacks. In Chae Hoon Lim and Moti Yung, editors, Information Security Applications, volume 3325 of Lecture Notes in Computer Science, pages 440–456. Springer Berlin Heidelberg, 2005. Cited on page 43.

[22] Adi Shamir. Protecting smart cards from passive power analysis with detached power supplies. In Çetin K. Koç and Christof Paar, editors, Cryptographic Hardware and Embedded Systems - CHES 2000, volume 1965 of Lecture Notes in Computer Science, pages 71–77. Springer Berlin Heidelberg, 2000. Cited on page 46.

[23] Christoph Herbst, Elisabeth Oswald, and Stefan Mangard. An AES implementation resistant to power analysis attacks. In Jianying Zhou, Moti Yung, and Feng Bao, editors, Applied Cryptography and Network Security, volume 3989 of Lecture Notes in Computer Science, pages 239–252. Springer Berlin Heidelberg, 2006. Cited on pages 46, 49, and 50.

[24] Nicolas Veyrat-Charvillon, Marcel Medwed, Stéphanie Kerckhof, and François-Xavier Standaert. Shuffling against side-channel attacks: A comprehensive study with cautionary note. In Xiaoyun Wang and Kazue Sako, editors, Advances in Cryptology - ASIACRYPT 2012, volume 7658 of Lecture Notes in Computer Science, pages 740–757. Springer Berlin Heidelberg, 2012. Cited on pages 31 and 46.

[25] Matthieu Rivain, Emmanuel Prouff, and Julien Doget. Higher-order masking and shuffling for software implementations of block ciphers. In Christophe Clavier and Kris Gaj, editors, Cryptographic Hardware and Embedded Systems - CHES 2009, volume 5747 of Lecture Notes in Computer Science, pages 171–188. Springer Berlin Heidelberg, 2009. Cited on pages 47 and 51.

[26] Emmanuel Prouff, Matthieu Rivain, and Régis Bévan. Statistical analysis of second order differential power analysis. IEEE Transactions on Computers, 58(6):799–811, June 2009. Cited on pages 47 and 52.

[27] Jean-Sébastien Coron and Louis Goubin. On boolean and arithmetic masking against differential power analysis. In Çetin K. Koç and Christof Paar, editors, Cryptographic Hardware and Embedded Systems - CHES 2000, volume 1965 of Lecture Notes in Computer Science, pages 231–237. Springer Berlin Heidelberg, 2000. Cited on page 47.

[28] Jovan D. Golić and Christophe Tymen. Multiplicative masking and power analysis of AES. In Burton S. Kaliski, Çetin K. Koç, and Christof Paar, editors, Cryptographic Hardware and Embedded Systems - CHES 2002, volume 2523 of Lecture Notes in Computer Science, pages 192–212. Springer Berlin Heidelberg, 2003. Cited on page 48.

[29] Elisabeth Oswald and Kai Schramm. An efficient masking scheme for AES software implementations. In Joo-Seok Song, Taekyoung Kwon, and Moti Yung, editors, Information Security Applications, volume 3786 of Lecture Notes in Computer Science, pages 292–305. Springer Berlin Heidelberg, 2006. Cited on page 49.

[30] Stefan Mangard. A simple power-analysis (SPA) attack on implementations of the AES key expansion. In Pil Joong Lee and Chae Hoon Lim, editors, Information Security and Cryptology - ICISC 2002, volume 2587 of Lecture Notes in Computer Science, pages 343–358. Springer Berlin Heidelberg, 2003. Cited on page 49.

[31] Suresh Chari, Charanjit S. Jutla, Josyula R. Rao, and Pankaj Rohatgi. Towards sound approaches to counteract power-analysis attacks. In Michael Wiener, editor, Advances in Cryptology - CRYPTO ’99, volume 1666 of Lecture Notes in Computer Science, pages 398–412. Springer Berlin Heidelberg, 1999. Cited on page 51.

[32] Elisabeth Oswald, Stefan Mangard, Christoph Herbst, and Stefan Tillich. Practical second-order DPA attacks for masked smart card implementations of block ciphers. In David Pointcheval, editor, Topics in Cryptology - CT-RSA 2006, volume 3860 of Lecture Notes in Computer Science, pages 192–207. Springer Berlin Heidelberg, 2006. Cited on page 52.

[33] Colin O’Flynn and Zhizhang Chen. ChipWhisperer: An open-source platform for hardware embedded security research. In Emmanuel Prouff, editor, Constructive Side-Channel Analysis and Secure Design, volume 8622 of Lecture Notes in Computer Science, pages 243–260. Springer International Publishing, 2014. Cited on page 56.

[34] Donald Knuth. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Addison-Wesley, 3rd edition, 1998. Cited on page 58.


Copyright

The publishers will keep this document online on the Internet (or its possible replacement) for a period of 25 years from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for his/her own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/

© Mattias Fransson
