Algorithm and Hardware Design of Encryption Scheme for H.264/AVC FAN, Yibo Graduate School of Information, Production and Systems Waseda University February 2009




Abstract

H.264, also known as MPEG-4 Part 10 or AVC (Advanced Video Coding), is the latest international video coding standard, published in 2003. Few encryption schemes have so far been proposed for the H.264/AVC standard; most existing schemes were designed for earlier video coding standards such as MPEG-1, MPEG-2/H.262, MPEG-4 and H.263.

This dissertation presents a new video encryption scheme for H.264/AVC together with the hardware design of its encryption module. It makes three contributions: 1) the proposed video encryption scheme provides higher security at lower computational cost than existing selective encryption schemes; 2) the proposed scalable hardware architecture for the encryption module can be configured for a wide range of video systems; 3) the five proposed countermeasure methods against Differential Power Analysis (DPA) can be built into the encryption module to resist DPA attacks.

This dissertation consists of seven chapters which are as follows:

Chapter 1 [Introduction] introduces the basic concepts of video coding systems, video encryption methods and cryptographic algorithms, including the H.264 video coding standard, selective video encryption and the AES algorithm.

Chapter 2 [Selective Video Encryption Schemes] describes recently proposed video encryption schemes and the encryption algorithms they use. The basic idea of selective encryption is to encrypt only part of the video data and leave the rest unencrypted. This lowers security but saves a great deal of computation. A brief survey shows the differences between these schemes, and three main problems are discussed: security, computational cost and feasibility.

Chapter 3 [Unequal Secure Encryption Scheme for H.264/AVC] describes the proposed Unequal Secure Encryption (USE) scheme for H.264/AVC. The purpose of the scheme is to reduce computational cost while keeping a high security level. The main idea is to encrypt the important data partition with a highly secure algorithm and the unimportant data partition with a less secure algorithm. All data are encrypted, which improves security. The new ideas in the proposed scheme are as follows:

1) Data classification methods. Three data classification methods are proposed: Data Partitioning, FMO and Parameter Extraction. Each method is proposed for different coding profiles of H.264/AVC.

2) Multiple security levels. Four security levels are defined to trade off security against computational cost. At security level 0 the computational cost is only 18% of full encryption, and at level 3 it is 50% of full encryption. Compared with other selective encryption schemes, our scheme achieves much lower computational cost while still encrypting 100% of the video data.

3) Hybrid encryption module. This module includes two encryption functions: AES encryption for the important data partition and FLEX encryption for the unimportant data partition. The proposed FLEX algorithm achieves 5 times the throughput of AES and can reuse the AES hardware.
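The dispatch idea behind the USE scheme can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the two ciphers below are toy stand-ins (a SHA-256 counter-mode keystream in place of AES, a repeating-key XOR in place of FLEX), and the names `use_encrypt`, `strong_keystream` and the `(is_important, payload)` partition format are hypothetical. The sketch only shows that every partition is encrypted, with the heavier algorithm reserved for the important data.

```python
import hashlib
from itertools import cycle


def strong_keystream(key: bytes, nonce: int, length: int) -> bytes:
    """Toy stand-in for the high-security cipher (NOT AES): SHA-256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = key + nonce.to_bytes(4, "big") + counter.to_bytes(8, "big")
        out.extend(hashlib.sha256(block_input).digest())
        counter += 1
    return bytes(out[:length])


def xor_bytes(data: bytes, stream) -> bytes:
    """XOR data with a keystream (any iterable of byte values)."""
    return bytes(d ^ s for d, s in zip(data, stream))


def use_encrypt(partitions, strong_key: bytes, light_key: bytes):
    """Encrypt every partition; 'partitions' is a list of (is_important, payload)."""
    result = []
    for index, (is_important, payload) in enumerate(partitions):
        if is_important:
            stream = strong_keystream(strong_key, index, len(payload))
        else:
            # Toy stand-in for the lightweight cipher (NOT FLEX): repeating-key XOR.
            stream = cycle(light_key)
        result.append((is_important, xor_bytes(payload, stream)))
    return result


# Both stand-ins are XOR-based, so decryption is the same operation.
use_decrypt = use_encrypt

partitions = [(True, b"important partition"), (False, b"unimportant partition" * 4)]
encrypted = use_encrypt(partitions, b"strong-key", b"light-key")
assert use_decrypt(encrypted, b"strong-key", b"light-key") == partitions
```

In a real system the split between important and unimportant data would come from one of the three classification methods listed above, and the saving would depend on how much of the stream falls into each class.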

Chapter 4 [Hardware Design of AES] presents the proposed scalable hardware architecture for the AES algorithm. Because performance requirements vary widely across video applications, the scalability of the hardware design is very important. Parallel data paths and configurable hardware modules are used to achieve high scalability. The experimental results show that the lowest-cost AES implementation, which uses 1 S-Box and 1 MixColumn, reaches a throughput of 75 Mbps, while the highest-cost implementation, with 20 S-Boxes and 4 MixColumns, reaches 2.4 Gbps. Our design offers a new approach to scalable hardware design of AES. Unlike other AES architectures, which are not scalable, it can be used to design AES cores for a wide range of performance specifications, which makes it well suited to video encryption systems.

Chapter 5 [DPA Attack on AES] introduces side-channel attack methods, in particular the Differential Power Analysis (DPA) attack. The DPA attack, proposed by Paul Kocher in 1998, can recover the secret key of a cryptographic device by collecting measurements of its power consumption, and it poses a serious threat to the security of such devices. The detailed attack procedure on AES and some recently proposed countermeasure methods are also discussed in this chapter.
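To make the principle of a DPA attack concrete, the following Python sketch runs a classic single-bit, difference-of-means attack against simulated leakage. Everything about the setup is an assumption made for illustration: the traces are synthetic (one sample per encryption, equal to the Hamming weight of a first-round S-Box output plus Gaussian noise), whereas the dissertation works with measured traces, a Hamming Distance model and the last AES round. The function names are hypothetical.

```python
import random


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox() -> list:
    """Generate the AES S-Box: GF(2^8) inverse followed by the affine transform."""
    sbox = []
    for value in range(256):
        inv = 0
        if value:
            inv = 1
            for _ in range(254):  # value^254 is the multiplicative inverse
                inv = gf_mul(inv, value)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def simulate_traces(key_byte: int, count: int, noise_sigma: float, rng: random.Random):
    """Assumed leakage model: Hamming weight of SBOX[p ^ k] plus Gaussian noise."""
    traces = []
    for _ in range(count):
        plaintext = rng.randrange(256)
        leak = bin(SBOX[plaintext ^ key_byte]).count("1") + rng.gauss(0.0, noise_sigma)
        traces.append((plaintext, leak))
    return traces


def dpa_recover_key(traces):
    """Split traces by a predicted S-Box output bit and compare the group means."""
    best_guess, best_peak = 0, -1.0
    for guess in range(256):
        sums, counts = [0.0, 0.0], [0, 0]
        for plaintext, leak in traces:
            bit = SBOX[plaintext ^ guess] & 1  # selection function: output LSB
            sums[bit] += leak
            counts[bit] += 1
        if 0 in counts:
            continue
        peak = abs(sums[1] / counts[1] - sums[0] / counts[0])
        if peak > best_peak:
            best_guess, best_peak = guess, peak
    return best_guess, best_peak


rng = random.Random(2009)
secret_key_byte = 0x3C
traces = simulate_traces(secret_key_byte, 3000, 1.0, rng)
guess, peak = dpa_recover_key(traces)
print(f"recovered key byte: {guess:#04x}, difference of means: {peak:.2f}")
```

For the correct key guess the predicted bit really is part of the leaked value, so the two groups differ in mean; for wrong guesses the partition is close to random and the difference stays small. Hiding and masking countermeasures aim to remove exactly this dependence.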

Chapter 6 [AES Design with DPA Countermeasure] presents our proposed AES designs with DPA countermeasures. A hybrid countermeasure solution is proposed that combines five methods: Independent ARK, Data Sliding, Subbyte Hiding, Simplified S-Box Masking and Registers Masking. The theoretical analysis shows that our solution increases the complexity of a DPA attack to 212N times. Because the methods are combined, even if one or two of them are broken, the remaining methods can still prevent a successful attack. Few papers address the hardware design of DPA countermeasure methods; this dissertation proposes a detailed hardware implementation of them. Moreover, an ultra-low-cost AES with the five proposed countermeasure methods is designed for real-time video encryption. A test chip containing four AES cores is implemented in the VDEC project (ROHM 0.18 um, chip size 2.5 mm × 2.5 mm):

1) AES0: pure AES design without any countermeasure method. It achieves the lowest hardware cost (4678 gates) with a throughput of 51 Mbps at a clock frequency of 80 MHz.

2) AES1.0: AES design with Independent ARK and Data Sliding. The hardware cost is 5500 gates, the clock frequency is 125 MHz and the throughput is 75 Mbps.

3) AES1.1: AES design with Subbyte Hiding. The hardware cost is 6244 gates; the clock frequency and throughput are the same as for AES1.0.

4) AES1.2: AES design with Simplified S-Box Masking and Registers Masking. The hardware cost is 6834 gates, the clock frequency is reduced to 75 MHz and the throughput to 45 Mbps.

To evaluate the effectiveness of the proposed countermeasure methods, a DPA attack system based on the SASEBO board is designed. The DPA attack experiments show that the AES designs with our proposed countermeasure methods (AES1.0, AES1.1 and AES1.2) successfully resist the DPA attack.
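The figures quoted for the four test-chip cores can be cross-checked with a little arithmetic. The Python sketch below assumes the usual relation for an iterative, non-pipelined block cipher core, throughput = block size × clock frequency / cycles per block, which the abstract does not state explicitly, and derives the implied cycle count and the gate overhead relative to AES0. Clock, throughput and gate values are taken from the text above; the function names are hypothetical.

```python
BLOCK_BITS = 128  # AES block size in bits

# Figures as quoted in the abstract for the four cores on the test chip.
cores = {
    "AES0": {"gates": 4678, "clock_mhz": 80, "throughput_mbps": 51},
    "AES1.0": {"gates": 5500, "clock_mhz": 125, "throughput_mbps": 75},
    "AES1.1": {"gates": 6244, "clock_mhz": 125, "throughput_mbps": 75},
    "AES1.2": {"gates": 6834, "clock_mhz": 75, "throughput_mbps": 45},
}


def implied_cycles_per_block(clock_mhz: float, throughput_mbps: float) -> float:
    """Cycles per 128-bit block, assuming throughput = 128 * f_clk / cycles."""
    return BLOCK_BITS * clock_mhz / throughput_mbps


def gate_overhead_percent(gates: int, baseline_gates: int) -> float:
    """Extra gates relative to the unprotected baseline, in percent."""
    return 100.0 * (gates - baseline_gates) / baseline_gates


baseline = cores["AES0"]["gates"]
for name, spec in cores.items():
    cycles = implied_cycles_per_block(spec["clock_mhz"], spec["throughput_mbps"])
    overhead = gate_overhead_percent(spec["gates"], baseline)
    print(f"{name}: about {cycles:.0f} cycles per block, {overhead:.1f}% more gates than AES0")
```

Under that assumption the three protected cores all work out to roughly 213 cycles per block and AES0 to roughly 201, and the countermeasures add roughly 18%, 33% and 46% to the gate count. Since the quoted throughputs are rounded, these values are only approximate.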

Chapter 7 [Conclusion] concludes the contributions of this dissertation.

Keywords

Video Encryption, H.264/AVC, Unequal Secure Encryption, AES, Side-channel Attack,

Differential Power Analysis, Low cost, Scalable Architecture, VLSI, Sasebo Board


Acknowledgements

First of all, I would like to thank Professor Satoshi Goto for his guidance, instruction and support during my research. He advised me to set a research goal and to achieve it step by step. What I learned from him will be the most valuable asset in my life. I also thank Professor Takeshi Ikenaga for his continuous support, instruction and insightful comments on my work. He gave me a great deal of valuable and helpful advice on detailed technical problems. I also express my appreciation to Professor Yoshimura for his continuous support, encouragement and insightful comments throughout my research work.

I also thank Dr. Tsunoo (NEC Central Research Lab) for advising me on cryptography research. His great knowledge of cryptography helped me find the right research directions and showed me how to continue my work. Thanks to Mr. Kimura (Y.D.K. Corp.), Mr. Nozawa (Y.D.K. Corp.) and Mr. Syouji (Y.D.K. Corp.) for helping me use the SASEBO board.

I also thank Mr. Jidong Wang and Mr. Guoyu Qian for working with me on video encryption and side-channel attacks. Thanks to the graduates of Goto Lab: Dr. Yang Song, Dr. Lingfeng Li, Dr. Shen Li and Dr. Jing Wang. Discussions with you gave me great inspiration in my research work. Thanks to all the students of Goto Lab, who made my life joyful. Thanks also go to all of my friends; I appreciate every moment with you.

Finally, I would like to thank my family for their unconditional support and love.

Contents

Abstract
Acknowledgements
List of Tables
List of Figures
List of Notations
1 Introduction
    1.1 Video Compression
    1.2 Video Encryption
    1.3 Cryptography
    1.4 Our Contributions and Dissertation Organization
2 Selective Video Encryption Schemes
    2.1 Visual Data Formats
        2.1.1 Video Sequence
        2.1.2 Coded video stream format
    2.2 Conventional video encryption methods
        2.2.1 Cryptography based video encryption
        2.2.2 Permutation based video encryption
    2.3 A Survey of selective video encryption schemes
    2.4 Problems of current video encryption scheme
    2.5 Conclusion
3 Unequal Secure Encryption (USE) Scheme for H.264
    3.1 Introduction of H.264
    3.2 USE Scheme for H.264/AVC
        3.2.1 Data Partition Methods
        3.2.2 Data Partition Methods
        3.2.3 Security levels
        3.2.4 Encryption Methods
    3.3 Comparison
    3.4 Conclusion
4 Hardware Design of AES
    4.1 Introduction of AES Algorithm
    4.2 Existing low-cost implementations of AES
    4.3 Proposed Scalable Hardware Architecture for AES
        4.3.1 Top Level Architecture
        4.3.2 Two typical subclass architectures
        4.3.3 Sub-Modules' Design
    4.4 Performance Analysis
        4.4.1 Scalability
        4.4.2 Dataflows
        4.4.3 Hardware Implementation
    4.5 Conclusion
5 DPA Attack on AES
    5.1 Introduction of Differential Power Analysis attack
        5.1.1 Power Consumption of CMOS Circuit
        5.1.2 Power Model
        5.1.3 Hypothetical Power Consumption based on HD model: Case study
        5.1.4 Differential Power Analysis Attacks
    5.2 DPA attack on AES
        5.2.1 DPA attack on AES: An Example
        5.2.2 DPA attack on AES: A successful attack and a failed attack
    5.3 Conventional Countermeasure Methods
    5.4 Conclusion
6 AES Design with DPA Countermeasure
    6.1 Proposed DPA Countermeasure methods for AES
        6.1.1 Register Masking
        6.1.2 S-Box Masking
        6.1.3 Subbytes Hiding
        6.1.4 Independent ARK and Data Sliding
        6.1.5 Time Complexity Analysis
    6.2 Ultra Low-cost Design of AES with DPA Countermeasure
        6.2.1 Specification
        6.2.2 Hardware Architecture
        6.2.3 Data Flow
        6.2.4 Implementation
    6.3 DPA Attack Evaluation Environment
        6.3.1 DPA attack platform
        6.3.2 Sasebo Board
        6.3.3 Test Flow
    6.4 Experiment Results of DPA Attack
    6.5 Chip Design
    6.6 Conclusion
7 Conclusion
Reference
Publications
    International Journal
    International Conference (with review)
    Domestic Conference (with review)
    Domestic Conference (without review)


List of Tables

Table 2.1 A survey of selective video encryption schemes
Table 3.1 Security levels in the USE scheme
Table 3.2 Video data partition size
Table 3.3 Video data partition for different security levels
Table 3.4 Comparison with other video encryption schemes
Table 4.1 Hardware cost of 32-bit AES @ 131 MHz, 0.11 um, [48]
Table 4.2 Hardware cost of 8-bit AES @ 100 kHz, 0.35 um, [50]
Table 4.3 Bit width of operations in AES algorithm
Table 4.4 Comparison of two architectures
Table 4.5 Possible implementations of AES based on scalable architecture
Table 4.6 Hardware cost of lowest cost AES @ 123 MHz, 0.18 um
Table 4.7 Hardware cost of highest performance AES @ 416 MHz, 0.18 um
Table 4.8 Scalability of hardware implementations
Table 4.9 Comparison with others' architecture
Table 5.1 Power consumption of four transitions in a circuit
Table 6.1 Summary of different countermeasure methods
Table 6.2 Comparison of time complexity for each countermeasure method
Table 6.3 Max bit-rate and resolution of selected H.264 levels
Table 6.4 AES0 @ 80 MHz, TSMC 0.18 um
Table 6.5 AES1.1 @ 125 MHz, TSMC 0.18 um
Table 6.6 AES1.0 @ 125 MHz, TSMC 0.18 um
Table 6.7 AES1.2 @ 75 MHz, TSMC 0.18 um
Table 6.8 VDEC Test Chip


List of Figures

Figure 1.1 Video encoder/decoder system
Figure 1.2 Video encoder
Figure 1.3 Video decoder
Figure 1.4 Secure Video System
Figure 2.1 Video Sequence: I, P, B Frames and I, P, B MBs
Figure 2.2 Coded Video Stream Format
Figure 2.3 Samples of Selective Encryption
Figure 3.1 H.264 Baseline, Main and Extended Profiles
Figure 3.2 H.264/AVC data format
Figure 3.3 Unequal Secure Encryption Scheme
Figure 3.4 Data Partition in H.264/AVC Extended Profile
Figure 3.5 Data Partition by FMO
Figure 3.6 Data Partition by Parameters Extraction
Figure 3.7 FLEX encryption algorithm
Figure 3.8 Leak position in the even and odd rounds
Figure 3.9 XOR Method
Figure 3.10 Comparison of security and computational complexity
Figure 4.1 Dataflow. (a) Encryption. (b) Decryption
Figure 4.2 Transformations in AES algorithm
Figure 4.3 32-bit architecture for AES
Figure 4.4 8-bit architecture for AES
Figure 4.5 Scalable Hardware Architecture for AES
Figure 4.6 Shared S-Box Architecture
Figure 4.7 Unified S-Box Architecture
Figure 4.8 S-Box structure
Figure 4.9 MixColumns structure
Figure 4.10 Dataflows for scalable architecture
Figure 4.11 Comparison with others' AES design
Figure 5.1 CMOS Inverter
Figure 5.2 Power consumption of a circuit: Case I
Figure 5.3 Power consumption of a circuit: Case II
Figure 5.4 Last round of AES module
Figure 5.5 2-D views of successful DPA attack
Figure 5.6 3-D views of successful DPA attack
Figure 5.7 2-D views of failed DPA attack
Figure 5.8 3-D views of failed DPA attack
Figure 5.9 Time dimension hiding
Figure 5.10 Amplitude dimension hiding
Figure 5.11 AES after masking
Figure 5.12 S-Box after masking
Figure 6.1 The i-th round of AES without and with masking countermeasures
Figure 6.2 Proposed Registers Masking
Figure 6.3 Proposed S-Box Masking
Figure 6.4 A power trace of AES
Figure 6.5 Subbytes without and with hiding
Figure 6.6 Hardware design of Subbytes hiding
Figure 6.7 Integrated Subbytes and AddRoundKey
Figure 6.8 Separated Subbyte and AddRoundKey
Figure 6.9 Feedback structure and Data Sliding Structure
Figure 6.10 Ultra low-cost AES with DPA countermeasure
Figure 6.11 Data flow for ultra low-cost AES
Figure 6.12 DPA Attack Evaluation System (Photo)
Figure 6.13 DPA Attack Evaluation System (Architecture)
Figure 6.14 Sasebo Board
Figure 6.15 DPA attack test flow
Figure 6.16 Power trace from oscilloscope
Figure 6.17 2-D view of DPA attack on Pure AES
Figure 6.18 3-D view of DPA attack on Pure AES
Figure 6.19 2-D view of DPA attack on AES with Subbytes hiding
Figure 6.20 3-D view of DPA attack on AES with Subbytes hiding
Figure 6.21 2-D view of DPA attack on AES with masking
Figure 6.22 3-D view of DPA attack on AES with masking
Figure 6.23 2-D view of DPA attack on AES with Independent ARK and Data Sliding
Figure 6.24 3-D view of DPA attack on AES with Independent ARK and Data Sliding
Figure 6.25 Test Chip Architecture
Figure 6.26 Chip design of AES


List of Notations

VOD       Video on Demand
AVC       Advanced Video Coding
MPEG      Moving Picture Experts Group
VCEG      Video Coding Experts Group
MB        Macro Block
I-MB      Intra-coded Macro Block
P-MB      Inter-coded Macro Block
B-MB      Bi-directional coded Macro Block
MV        Motion Vector
MVD       Motion Vector Difference
DCT       Discrete Cosine Transform
Q         Quantization
MC        Motion Compensation
FMO       Flexible Macroblock Ordering
VLC       Variable Length Coding
DES       Data Encryption Standard
AES       Advanced Encryption Standard
USE       Unequal Secure Encryption
FLEX      Fast Leakage EXtraction
DPA       Differential Power Analysis
CPA       Correlation coefficient Power Analysis
GF        Galois Field
I.ARK     Independent AddRoundKey
D.S.      Data Sliding
bps       bits per second
P         Power consumption
λ         Coefficient for power modeling
HD        Hamming Distance
HW        Hamming Weight
n         Noise
R/REG     Registers
C/Comb    Combinational logic
Inv       Inverse Operation
T / t     Time
K / k     Key
X / Y     Random number
~o(DPA)   Time complexity of DPA attack on pure AES
f()       Function
AES0      Pure AES
AES1.0    AES + I.ARK & D.S.
AES1.1    AES + Hiding
AES1.2    AES + Masking
SASEBO    Side-channel Attack Standard Evaluation Board


1 Introduction

Multimedia is a hot topic in this IT era, especially in telecommunications and on the Internet. Ten years ago, people used text-based tools such as ICQ to communicate with each other over the Internet, and published information online only as text or pictures. Now things have changed. We talk to others face to face over the Internet using a monitor and a web camera. We use Skype to make Internet calls for free. We share our videos on YouTube and our personal photos on Picasa, watch TV on PPStream and enjoy music on Kugou. All of these applications can be used on the Internet for free, and we can even use them from a mobile phone, anywhere.

The virtual world becomes more and more attractive because we can sense it, and video plays the most important role. Popular video applications include VOD (Video on Demand), which is used to watch movies over the Internet; Pay-TV, which is widely used in television set-top boxes; and video conferencing. However, video data is very large, which makes its transmission and storage a problem. To reduce the data size, many video compression methods have been proposed, such as MPEG-4 and the latest H.264/AVC. On the other hand, to protect the sensitive information in video, many video encryption schemes have been proposed, and cryptosystems are widely used in video applications.

1.1 Video Compression

Video makes multimedia applications more attractive. However, an uncompressed video sequence requires a very high bit rate (approximately 2 Gbps for HDTV 1080p). Compression is therefore necessary for practical storage and transmission of digital video.
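As a rough check of the figure quoted above, the raw bit rate is width × height × bits per pixel × frame rate. The sketch below assumes 1920×1080 frames at 60 frames per second with 16 bits per pixel (8-bit 4:2:2 sampling). These parameters are illustrative assumptions, since the text gives only the rounded figure, and the function name is hypothetical.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, fps):
    """Bit rate of uncompressed video in bits per second."""
    return width * height * bits_per_pixel * fps

# Assumed parameters (not from the text): 1080p, 8-bit 4:2:2, 60 fps.
rate = raw_bitrate_bps(1920, 1080, 16, 60)
print(f"{rate / 1e9:.2f} Gbps")
```

Other sampling formats or frame rates change the result proportionally, so the 2 Gbps value should be read as an order-of-magnitude figure.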

Video compression research has a long history. Over the past decades, many international standards have been developed, namely the ISO/IEC MPEG-x series [1][2][3][4][5] and the ITU-T H.26x series [6][7]. The Moving Picture Experts Group (MPEG) is a working group of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). MPEG’s remit is to develop standards for compression, processing and representation of moving pictures and audio. It has been responsible for a series of important standards, such as MPEG-1 [1] (compression of video and audio for CD playback) and MPEG-2 [2] (storage and broadcasting of television quality video and audio). MPEG-4 [3] (coding of audio-visual objects) is the latest standard that deals specifically with audio-visual coding. MPEG-7 [4] and MPEG-21 [5] are concerned with multimedia content representation and a generic multimedia framework respectively. MPEG is best known for its contribution to audio and video compression. In particular, MPEG-2 is widely used in digital TV broadcasting and DVD video, and MPEG Layer 3 audio coding has become very popular for music storage and sharing.

The Video Coding Experts Group (VCEG) is a working group of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T). VCEG has developed a series of standards for video communication over telecommunication networks and computer networks, such as H.261 [6] (the first widely used standard for video conferencing) and the subsequent H.263 [7] and its later versions. Since 2001, VCEG and MPEG have cooperated through a new organization, the Joint Video Team (JVT). The JVT consists of members of MPEG and VCEG, and its main purpose is to develop a new video coding standard, entitled ‘Advanced Video Coding’ (AVC) [8], which is also known as MPEG-4 Part 10 or H.264.

Figure 1.1 Video encoder/decoder system (blocks: video encoder, transmit channel, video decoder, display)

A basic video encoder/decoder system is shown in Figure 1.1. The camera captures video sequences and transfers the uncompressed video data to the video encoder. The video encoder compresses the video data according to a specific coding standard and then sends the coded video stream to the transmit channel. On the receiver side, the video decoder receives the coded video stream and decodes it using the same coding standard, and the display then plays the video sequence.

A conventional video encoder architecture is shown in Figure 1.2. It consists of three main functional units: a temporal model, a spatial model and an entropy encoder.

The temporal model attempts to reduce temporal redundancy by exploiting the similarities between neighbouring video frames or neighbouring macro blocks (MBs) within one frame. In this figure, there are two predictors: an inter predictor for inter-frame prediction and an intra predictor for intra-frame prediction. The output of the temporal model includes residual data and a set of model parameters, such as motion vectors, block types and prediction types.

The spatial model makes use of similarities between neighbouring samples in the residual frame to reduce spatial redundancy. In MPEG-4 Visual and H.264, this is achieved by a DCT transform and quantization. The input of the spatial model is the residual data produced by the temporal model, and its output is a set of quantized transform coefficients.
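The data flow from the temporal model to the spatial model can be illustrated with a toy one-dimensional sketch: subtract a prediction to get the residual, transform the residual, and quantize the coefficients. This is only an illustration under simplifying assumptions. Real codecs work on two-dimensional blocks, H.264 uses an integer approximation of the DCT, and the sample values, the step size and all function names here are hypothetical.

```python
import math

def residual(current, prediction):
    """Temporal model output: sample-wise difference from the prediction."""
    return [c - p for c, p in zip(current, prediction)]

def dct_1d(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def quantize(coeffs, qstep):
    """Uniform quantization: divide by the step size and round."""
    return [int(round(c / qstep)) for c in coeffs]

current = [52, 55, 61, 66, 70, 61, 64, 73]     # hypothetical samples
prediction = [50, 53, 59, 64, 68, 59, 62, 71]  # hypothetical prediction
levels = quantize(dct_1d(residual(current, prediction)), qstep=2)
print(levels)
```

Because the prediction here differs from the current samples by a constant, the residual energy is concentrated in the first (DC) coefficient, which is the kind of compaction the entropy encoder then exploits.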

Figure 1.2 Video encoder.

Figure 1.3 Video decoder.

The entropy encoder compresses the parameters produced by the temporal model and the transform coefficients produced by the spatial model. The final encoded video stream consists of header information, parameter information, coded motion vectors and coded residual data.

A conventional video decoder architecture is shown in Figure 1.3. The decoder architecture is much simpler than that of the encoder. The entropy decoder extracts the header, parameters and transform coefficients from the coded video stream. The prediction parameters are used to reconstruct the prediction data in the motion compensation (MC) module. By combining this prediction with the residual data obtained from inverse quantization and inverse DCT, the video sequence can be reconstructed.

1.2 Video Encryption

With the growth of multimedia applications, huge amounts of digital visual data are stored on different media and exchanged over various kinds of networks. As a consequence, techniques are required to provide security functions such as privacy, integrity and authentication. ‘Video security’ is aimed at these emerging technologies and applications.

The conventional technologies for protecting video content can be classified into three categories: 1) Encryption technology, which provides end-to-end security when video is distributed over the Internet or another public communication channel. 2) Watermarking technology, which provides copyright protection, ownership tracing and authentication. 3) Access control technology, which prevents unauthorized access. In this dissertation, we focus on video data encryption technology.

Figure 1.4 Secure Video System.

Several dedicated international meetings have emerged as forums to present and discuss recent developments in this field, such as the “ACM multimedia security workshop” and “Communications and multimedia security”, along with special issues of journals such as the IEEE Transactions and EURASIP journals. However, a common video encryption standard does not exist. A conventional video encryption system is shown in Figure 1.4. Two secure modules (encryption and decryption) are added.

Several review papers have been published on video encryption, such as Liu and Eskicioglu’s work in [9], Qiao and Nahrstedt’s overview in [10], and Furht, Socek and Eskicioglu’s survey in [11]. From these review papers and other literature, we found that most of the proposed video encryption schemes are designed for previous video coding standards such as MPEG-1, MPEG-2/H.262, MPEG-4 and H.263, and that most of them use selective encryption. Selective encryption encrypts only a portion of the video bit-stream. In contrast, full encryption encrypts the whole video bit-stream using a specific encryption algorithm.

Full video encryption has two different approaches: (a) Video scrambling, which permutes the video in the time domain or the frequency domain; however, it cannot provide substantial security. (b) Encryption of the entire video data using a standard cryptographic algorithm; this is often referred to as the “naive approach”, and its computational cost is very high.

Selective video encryption can be further classified into three types: temporal domain schemes, spatial domain schemes and entropy coding schemes. Temporal domain schemes select temporal model parameters such as motion vectors, DCT coefficients, I-blocks and I-frames. Most selective encryption methods are based on the temporal domain [14-32]. Spatial domain schemes make use of spatial model parameters in the video data; the scheme in [23] uses the quadtree structure of motion vectors and the quadtree structure of residual errors for video encryption. Entropy coding schemes use a special entropy codec to perform encryption; the schemes in [33-35] use multiple Huffman tables and multiple state indices in the entropy encoder.

A detailed introduction to and discussion of selective video encryption schemes is provided in Chapter 2.

1.3 Cryptography

Before the modern era, cryptography was concerned solely with message

confidentiality. In recent decades, the field has expanded beyond confidentiality concerns

to include techniques for message integrity checking, sender/receiver identity

authentication, digital signatures, interactive proofs, and secure computation, amongst

others. Modern cryptography can be divided into two types: 1) Symmetric key

cryptography and 2) Asymmetric key cryptography.

Symmetric Key Cryptography

The common property of symmetric key cryptography is that a shared secret key is used between the communicating parties. This key serves as both the encryption key and the decryption key. Most modern cryptosystems use symmetric key cryptography. Well-known and widely used symmetric cryptosystems include DES [36] (Data Encryption Standard), IDEA [40], Triple-DES and AES [37] (Advanced Encryption Standard). Typical key sizes are 56 bits (DES), 128 bits (IDEA, AES), and 192 and 256 bits (AES).

Asymmetric Key Cryptography

Asymmetric key cryptography is also called public key cryptography. There are two different keys in this system: a public key, which is publicly known, and a secret key, which is kept secret by its owner. The system is called ‘asymmetric’ because different keys are used for encryption and decryption. If data is encrypted with a public key, it can only be decrypted with the corresponding secret key, and vice versa. Asymmetric key cryptography moved classical cryptosystems into the ‘modern’ era. Two well-known asymmetric cryptosystems are RSA [38] (named after Ron Rivest, Adi Shamir and Leonard Adleman at MIT) and ECC [39] (Elliptic Curve Cryptography). RSA is the most widely used asymmetric cryptosystem, with key lengths of 1024 bits, 2048 bits or longer. ECC is a newer asymmetric cryptosystem with smaller key sizes.

Both symmetric key and asymmetric key cryptography have their advantages and disadvantages. Compared with asymmetric ciphers, symmetric ciphers require a much smaller key size for the same level of security, their computations are much faster, and their memory requirements are smaller. However, since all parties share the same key, they must keep the key absolutely secret, which becomes riskier as the number of involved parties increases. In a conventional cryptosystem, asymmetric ciphers are used for key exchange, authentication, digital signatures and integrity checks, and symmetric ciphers are used for data encryption. For video encryption, symmetric ciphers are widely used. Meanwhile, some scrambling methods with low security are also widely used in selective video encryption.

On the other hand, side-channel attacks have recently attracted much attention. A side-channel attack uses side-channel information from a crypto-device, such as power consumption or execution time, to recover the secret key in the device. In particular, differential power analysis (DPA) has been used successfully to break symmetric ciphers such as DES and asymmetric ciphers such as RSA. It poses a serious threat to the security of current cryptosystems.

1.4 Our Contributions and Dissertation Organization

In this dissertation, a new video encryption scheme for H.264/AVC and the hardware design of the encryption module are proposed.

An Unequal Secure Encryption (USE) scheme is proposed for the H.264/AVC video coding standard. The USE scheme has three major targets: security, feasibility and low computational cost. In the USE scheme, we encrypt the entire video data using standard cryptography to make the scheme highly secure. We perform all encryption operations after entropy coding, so that the video coding system and the encryption system are separated. In this way, the USE scheme is feasible in any kind of video security application. The remaining problem is computational cost. As the computational cost of the “naive approach” is huge, some optimization is needed to reduce it. We use two methods: (1) Data classification. We classify the video data into two partitions, an important data partition and an unimportant data partition. Many new features in H.264/AVC make this procedure easy to implement. Normally, the important data partition is smaller than the unimportant one. (2) Unequal secure encryption. We use AES to encrypt the important data partition and our proposed FLEX to encrypt the unimportant data partition. FLEX is a cipher based on AES, and its computational cost is only 20% of that of AES. In this way, the scheme remains highly secure with low computational cost.
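The cost saving of the USE scheme can be estimated with a simple model. If a fraction p of the data belongs to the important partition (encrypted with AES at relative cost 1) and the rest is encrypted with FLEX at 20% of the AES cost, the total cost relative to the naive approach is p + 0.2(1 - p). The sketch below assumes that cost scales linearly with data size; the values of p and the function name are hypothetical, chosen only for illustration.

```python
FLEX_COST_RATIO = 0.2  # FLEX costs about 20% of AES (figure from the text)

def use_relative_cost(important_fraction, flex_ratio=FLEX_COST_RATIO):
    """Cost of USE relative to encrypting everything with AES (= 1.0)."""
    if not 0.0 <= important_fraction <= 1.0:
        raise ValueError("important_fraction must be in [0, 1]")
    return important_fraction + (1.0 - important_fraction) * flex_ratio

for p in (0.1, 0.2, 0.3):  # assumed partition sizes, for illustration only
    print(f"important = {p:.0%}: cost = {use_relative_cost(p):.2f} x AES")
```

Under this model the cost approaches 20% of the naive approach as the important partition shrinks, and equals the naive cost when all data is classified as important.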

A scalable hardware architecture for AES is proposed for low-cost AES design. This architecture defines a scalable framework for designing a specific AES module with a chosen throughput and hardware cost. It is especially useful for low-cost, low-power design of AES modules. There are two important features in this architecture: 1) Scalable S-Box and scalable MixColumns design. The architecture supports different numbers of S-Box and MixColumns units. As the number of S-Box and MixColumns units increases, both the throughput and the hardware cost increase. The architecture supports 1-20 S-Boxes and 1-4 MixColumns units, which means that the designer can implement an AES design according to various performance requirements. 2) Parallel data path design. The proposed architecture has three main data paths: the S-Box data path, the MixColumns data path and the AddRoundKey data path, each with a different bit width. The advantages are high scalability and a shortened critical path.

DPA countermeasure methods for AES design are proposed to counter DPA attacks. Five countermeasure methods are proposed in this dissertation: 1) Independent ARK separates the AddRoundKey operation from the other operations in the hardware data path. 2) Data Sliding scrambles the register data. 3) SubBytes Hiding randomizes the SubBytes operation in the time domain. 4) Simplified S-Box masking introduces randomization into the S-Box power consumption. 5) Register masking introduces randomization into the register power consumption. The five methods can be used together or independently. Theoretical analysis and experimental results show that the proposed methods greatly increase security and effectively counter DPA attacks.

An ultra low-cost implementation of AES with DPA countermeasures is proposed for video encryption. It is based on the proposed scalable architecture and combines all of the proposed countermeasure methods. Only one S-Box is used in this implementation, and the hardware cost is extremely low, about 7k gates in the TSMC 0.18 µm standard cell library. The throughput is about 75 Mbps, which can be used for real-time video encryption. Five countermeasure methods are used, and they are evaluated by our DPA attack system. A test chip that includes four AES implementations is designed.

The rest of this dissertation is organized as follows: a survey of selective video encryption schemes is given in Chapter 2. The proposed Unequal Secure Encryption (USE) scheme is presented in Chapter 3. The scalable hardware architecture for AES is presented in Chapter 4. The DPA attack and conventional countermeasure methods are introduced in Chapter 5. The proposed DPA countermeasure methods and the ultra low-cost implementation of AES with DPA countermeasures are presented in Chapter 6. Finally, Chapter 7 concludes the dissertation.

- 12 -

Selective Video Encryption Schemes

2 Selective Video Encryption Schemes

To encrypt video data in real time, selective video encryption has been proposed to reduce computational cost. The basic idea of selective encryption is to encrypt only a part of the compressed video data, which reduces the computational cost. The selected part is the data considered important. There is no clear definition of the importance of video data. Normally, importance is measured by difficulty: the more difficult it is to reconstruct the video when a piece of data is lost, the more important that data is considered. For example, header information and parameters are much more important than VLC data in a video stream. Over the past years, a number of selective video encryption schemes have been proposed for different video coding standards. An introduction to and discussion of these schemes is presented in this chapter.

2.1 Visual Data Formats

2.1.1 Video Sequence

A video sequence is usually organized as a series of rectangular arrays called frames, as shown in Figure 2.1. Figure 2.1 A) shows the frame structure of a video sequence. The video data is organized frame by frame, and every frame contains a picture. The frames are played along the time axis to form a video. There are several types of frame: I-frame, P-frame and B-frame, as shown in B). I-frame is short for intra frame. An I-frame is independent of other frames and is encoded by intra-frame prediction. P-frame is short for predicted frame, which means that it is predicted from previous frames. B-frame is short for bi-directional predicted frame; both backward and forward frames are used for prediction. A B-frame has two reference frames, while a P-frame has only one. To reconstruct a picture, an I-frame can be reconstructed by itself, whereas B- and P-frames need to be combined with their reference frames.

Figure 2.1 C) shows the MB structure in a frame. The MB size can vary, depending on the MB size definition in the specific video coding standard. Normally, there are two kinds of MB: I-MB, and P- or B-MB. As with I-, P- and B-frames, an I-MB is an intra MB, which is predicted from the surrounding MBs in the same frame, while a P- or B-MB is an inter MB, which is predicted from MBs in other frames.

Figure 2.1 Video Sequence: I, P, B Frames and I, P, B MBs.

Figure 2.2 Coded Video Stream Format.

2.1.2 Coded video stream format

A coded video sequence consists of one or more video packets. A video packet is analogous to a slice or a frame in MPEG-1, MPEG-2 or H.264, and consists of a resynchronization marker, a header field and a series of coded macro blocks.

As shown in Figure 2.2, the coded video stream is formatted as a layered structure. The sequence layer defines the properties of the whole video sequence in the sequence header, which is followed by a string of frames. The frame layer includes the frame header, which indicates frame properties such as the frame type and whether the frame is used for prediction, followed by the MBs in the frame. The MB layer consists of MB parameters such as MB type, MB partitions and MB prediction methods. A P- or B-MB includes motion vectors and VLC data (residual data coded by a variable length code); an I-MB does not include motion vectors.
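The layered structure described above can be modelled with a few nested record types. This is only a sketch of the layering, not the bit-level syntax of any standard; all class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MacroBlock:
    mb_type: str                      # "I", "P" or "B"
    vlc_data: bytes                   # residual data coded by VLC
    # Motion vectors are present only for P- and B-MBs.
    motion_vectors: List[Tuple[int, int]] = field(default_factory=list)

@dataclass
class Frame:
    frame_type: str                   # "I", "P" or "B"
    used_for_prediction: bool
    macroblocks: List[MacroBlock] = field(default_factory=list)

@dataclass
class Sequence:
    header: dict                      # global sequence properties
    frames: List[Frame] = field(default_factory=list)

i_mb = MacroBlock(mb_type="I", vlc_data=b"\x12\x34")
p_mb = MacroBlock(mb_type="P", vlc_data=b"\x56", motion_vectors=[(3, -1)])
seq = Sequence(header={"width": 1920, "height": 1080},
               frames=[Frame("I", True, [i_mb]), Frame("P", False, [p_mb])])
print(len(seq.frames), seq.frames[1].macroblocks[0].motion_vectors)
```

A selective encryption scheme can then be thought of as choosing which of these fields to encrypt, for example headers, motion vectors or the VLC data of I-MBs.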

2.2 Conventional video encryption methods

2.2.1 Cryptography based video encryption

Cryptography-based video encryption means that the selected video contents are encrypted by a cryptosystem such as DES or AES. The security depends heavily on the cryptosystem; in other words, it is backed by an approved and certified algorithm.

Header encryption

Since the header information in a video stream contains many of the parameters needed to reconstruct the video data, most video encryption schemes make use of headers. There are several headers at different layers of the video bit stream: the sequence header contains global parameters for the whole video sequence; the slice header defines constraints only for the current slice; the MB header is the lowest-level header, which consists of the MB type, MB partitions and so on.

I-frame, I-MB encryption

I-frames and I-MBs are very important for reconstructing a picture, because they can be reconstructed without any other information. I-frames are widely used to synchronize video data or to recover broken pictures in a video stream. I-MBs are also very important for MB reconstruction and are then used for prediction. Since P- and B-frames are reconstructed from predictions obtained from I-frames, the main assumption is that if the I-frames are encrypted, the P- and B-frames are expected to be well protected.

Motion Vectors encryption

Motion vectors are used in B- and P-frames to indicate the prediction position of each MB. In the decoder, the MC (motion compensation) block uses the motion vectors to reconstruct the prediction of an MB. If a motion vector is lost, the decoder cannot generate the right prediction, and the final picture cannot be reconstructed correctly. Motion vectors comprise about 10% to 20% of the entire video data; therefore, many high-security encryption schemes tend to encrypt them.

VLC encryption

VLC data occupies most of the video data, about 60% to 80% of the whole. Most of the proposed video encryption schemes leave it unencrypted to save computational cost. However, this may pose a serious security problem in the future. VLC data can also be classified into two types: I-MB VLC and B, P-MB VLC. Some high-security encryption schemes encrypt the I-MB VLC and leave the B, P-MB VLC unencrypted, because I-MB VLC occupies only about 20% of the total VLC data.
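Combining the two percentages above gives the share of the whole stream that I-MB VLC encryption has to process. The short calculation below uses the figures quoted in the text (VLC is 60% to 80% of the stream, I-MB VLC is about 20% of the VLC data); the function name is hypothetical.

```python
def i_mb_vlc_share(vlc_share, i_mb_fraction=0.20):
    """Fraction of the whole stream occupied by I-MB VLC data."""
    return vlc_share * i_mb_fraction

for vlc_share in (0.60, 0.80):
    print(f"VLC = {vlc_share:.0%} of stream -> "
          f"I-MB VLC = {i_mb_vlc_share(vlc_share):.0%} of stream")
```

With these figures, encrypting only the I-MB VLC data touches roughly 12% to 16% of the stream, which is why such schemes accept it as a compromise.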

2.2.2 Permutation based video encryption

Permutation-based video encryption means that the selected video contents are encrypted by permutation, scrambling, shuffling or other simple methods. These methods target video encryption with low computational cost and low security. Their security is very weak and is not approved.

Macroblock Permutation

As discussed in Section 2.1.1, each frame consists of the same number of macroblocks, and each macroblock contains a piece of the picture. Macroblock permutation exchanges the order of the macroblocks within a frame. This encryption variant is annoying to an attacker but not secure, because the originally neighboring macroblocks can be recovered from the correlation of their border pixels. The risk increases when more frames are permuted using the same order.
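Macroblock permutation can be sketched as a keyed shuffle and its inverse. The example below is a minimal illustration, assuming the macroblocks of a frame are available as a list; it derives the permutation from Python's `random.Random`, which is not cryptographically secure and stands in for whatever keyed generator a real scheme would use. All function names are hypothetical.

```python
import random

def keyed_permutation(n, key):
    """Return a permutation of range(n) derived from an integer key."""
    order = list(range(n))
    random.Random(key).shuffle(order)  # stand-in generator, NOT secure
    return order

def permute_macroblocks(mbs, key):
    """Scramble: output position dst receives input macroblock order[dst]."""
    order = keyed_permutation(len(mbs), key)
    return [mbs[i] for i in order]

def unpermute_macroblocks(scrambled, key):
    """Invert the scramble by regenerating the same permutation."""
    order = keyed_permutation(len(scrambled), key)
    restored = [None] * len(scrambled)
    for dst, src in enumerate(order):
        restored[src] = scrambled[dst]
    return restored

frame = [f"MB{i}" for i in range(8)]  # placeholder macroblocks
scrambled = permute_macroblocks(frame, key=1234)
print(scrambled)
print(unpermute_macroblocks(scrambled, key=1234) == frame)
```

Note that the macroblock contents are untouched, which is exactly why border-pixel correlation can be used to undo the permutation without the key.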

Motion Vectors Permutation

Each predicted P- or B-MB has a corresponding motion vector. It is possible to permute the motion vectors assigned to different macroblocks. However, this only affects P- and B-frames, and in many cases the motion vectors within a frame share the same overall direction, so the distortion of the encrypted video is very slight.

DCT Coefficient Permutation

Similar to macroblock permutation, DCT coefficient permutation exchanges the DCT coefficients within a macroblock. There are two kinds of DCT permutation: DC coefficient permutation and AC coefficient permutation. Since there are far fewer DC coefficients than AC coefficients, most proposed schemes use DC coefficient permutation. However, like macroblock permutation, this method is not secure: it makes reconstruction difficult but not impossible.

Sign-bit Masking

Many values in a coded video stream carry a sign bit. For example, a motion vector has a sign bit to indicate its direction, and a DCT coefficient also has a sign bit. Sign-bit masking flips the sign bit, changing a value from positive to negative or from negative to positive.
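Sign-bit masking can be sketched as flipping the sign of each value whenever the corresponding keystream bit is 1. The example below is a minimal illustration on a list of signed integers; the keystream comes from Python's `random.Random`, which is not cryptographically secure and is used here only as a stand-in. The function name and the sample values are hypothetical.

```python
import random

def mask_signs(values, key):
    """Flip the sign of each value where the keystream bit is 1.

    Applying the function twice with the same key restores the input,
    because the same keystream bits are regenerated.
    """
    rng = random.Random(key)  # stand-in keystream, NOT secure
    out = []
    for v in values:
        flip = rng.getrandbits(1)
        out.append(-v if flip else v)
    return out

coeffs = [12, -3, 0, 7, -1, 5, -8, 2]  # hypothetical coefficients
masked = mask_signs(coeffs, key=42)
print(masked)
print(mask_signs(masked, key=42) == coeffs)
```

Only the signs change, so the magnitudes of all values remain visible in the masked stream.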

In practice, most of the proposed video encryption schemes adopt more than one encryption method to ensure security. Many schemes also define several security levels to balance security against computational cost.

2.3 A Survey of selective video encryption schemes

Many selective video encryption schemes have been proposed. Liu and Eskicioglu in [9] and Furht, Socek and Eskicioglu in [11] have presented a comprehensive classification that includes most of the published selective video encryption algorithms. An updated version of this classification is shown in Table 2.1 [65]. According to their work, these encryption schemes can be classified into three types: frequency domain schemes, spatial domain schemes and entropy coding schemes. Comprehensive surveys of video encryption techniques are given in [9-13].

Table 2.1 A survey of selective video encryption schemes (adapted from [9][11])

Domain | Proposal | Encryption Algorithm | Encrypted Content
Frequency Domain | Meyer & Gadegast, 1995 [14] | DES, RSA | Headers, parts of I-blocks, all I-blocks, I-frames of the MPEG stream
Frequency Domain | Spanos & Maples, 1995 [15][16] | DES | I-frames, sequence headers and ISO end code of the MPEG stream
Frequency Domain | Tang, 1996 [17] | Permutation, DES | DCT coefficients
Frequency Domain | Qiao & Nahrstedt, 1997 [18] | xor, permutation, IDEA | Every other bit of the MPEG bit stream
Frequency Domain | Shi & Bhargava, 1998 [19] | xor | Sign bit of DCT coefficients
Frequency Domain | Shi, Wang & Bhargava, 1999 [20] | IDEA | Sign bit of motion vectors
Frequency Domain | Alattar, Al-Regib & Al-Semari, 1999 [21] | DES | Every nth I-macroblock, headers of all the predicted macroblocks, header of every nth predicted macroblock
Frequency Domain | Shin, Sim & Rhee, 1999 [22] | Permutation, RC4 | Sign bits of DC coefficients of I pictures. Random permutation of the DCT coefficients
Frequency Domain | Cheng & Li, 2000 [23] | No algorithm is specified | Pixel and set related significance information in the two highest pyramid levels of SPIHT in the residual error
Frequency Domain | Tosun & Feng, 2000 [24] | VEA | Lower layer of DC, AC. Divide the coefficients into 3 partitions, and encrypt the 2 lower layers by VEA
Frequency Domain | Wen, Severa, Zeng, Luttrell & Jin, 2002 [25] | DES, AES | The information-carrying fields, either fixed length code (FLC) codewords, or variable length code (VLC) codewords
Frequency Domain | Zeng & Lei, 2002 [26] | Permutation, xor | Selective bit scrambling, block shuffling, block rotation of the transform coefficients (wavelet and JPEG) and JPEG motion vectors
Frequency Domain | Wu & Mao, 2002 [27] | Any modern cipher, random shuffling on bit-planes in MPEG-4 FGS | Bitstream after entropy coding, quantized values before run length coding (RLC) or RLC symbols, intra bit-plane shuffling in MPEG-4 FGS
Frequency Domain | Choon, Samsudin & Budiarto, 2004 [28] | Diffusion and confusion | Permutation between MBs and XOR template in MBs
Frequency Domain | Liu, Li & Dong, 2004 [29] | Permutation | Permutation of MBs, DC, AC
Frequency Domain | Liu, Ikenaga, Baba & Goto, 2004, 2006 [30][31] | DCEA, Event shuffle | DCT coefficients
Frequency Domain | Wang, Fan, Ikenaga & Goto, 2007 [32] | Permutation, xor | H.264 encryption technique. Sign bits of motion vectors, intra mode, trailing one of VLC code
Spatial Domain | Cheng & Li, 2000 [23] | No algorithm is specified | Quadtree structure of motion vectors and quadtree structure of residual errors
Entropy Codec | Wu & Kuo, 2000; Wu & Kuo, 2001 [33][34] | Multiple Huffman tables, multiple state indices in the QM coder | Encryption of data by multiple Huffman coding tables and multiple state indices in the QM coder
Entropy Codec | Cheong, Hung, Tung, Ke & Chen, 2005 [35] | MHT (Multiple Huffman tables) rotation, xor | DCT coefficients, motion vectors, multiple Huffman coding tables

Some important selective video encryption schemes include:

SECMPEG [14]

SECMPEG, also called Secure MPEG, was proposed by Meyer and Gadegast in 1995. It was designed for the MPEG-1 video standard. It defines four levels of security:

1) Encrypt the header data from the sequence layer down to the slice layer.

2) Encrypt the same data as in level 1 and the low frequency DCT coefficients of all blocks in I-frames.

3) Encrypt all I-blocks (including I-frames and the I-blocks in B- and P-frames).

4) Encrypt the entire video.

The authors chose the DES symmetric cryptosystem for encryption, which was the natural choice since DES was the official symmetric algorithm at that time. AES can also be used in SECMPEG for higher security.

Aegis [15][16]

Aegis was proposed by Maples and Spanos in 1995. It was initially designed for the MPEG-1 and MPEG-2 video standards. Aegis encrypts the following parts of a video stream: the video sequence header, the I-frames and the ISO end code. It leaves B- and P-frames unencrypted. The encryption engine is also DES. Aegis is very similar to SECMPEG level 3.

VEA [18]

VEA stands for Video Encryption Algorithm, which was developed by Qiao and Nahrstedt in 1997. It was also designed for the MPEG video standard. This algorithm is a kind of whole video encryption, which makes it significantly different from other selective video encryption schemes. The algorithm consists of the following four steps:

1) Let the 2n-byte sequence, denoted by a1, a2, …, a2n, represent the video frames.

2) Separate the sequence into two lists: the odd list (a1, a3, a5, …, a2n-1) and the even list (a2, a4, …, a2n).

3) XOR the two lists into one list: (b1, b2, …, bn) = (a1, a3, …, a2n-1) xor (a2, a4, …, a2n).

4) Apply the chosen symmetric cryptosystem E with the secret key to encrypt either the odd list or the even list. The cipher text sequence is {Ekey(a1, a3, a5, …, a2n-1), b1, b2, …, bn} or {Ekey(a2, a4, …, a2n), b1, b2, …, bn}.
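The four steps above can be sketched in a few lines. The example below encrypts the list a2, a4, …, a2n and appends the XOR list b1, …, bn, which is one of the two variants in step 4. The internal cipher E is replaced by a toy XOR keystream built from SHA-256, purely as a stand-in so that the sketch is self-contained; it is not the cipher used by the authors and should not be treated as secure. All function names are hypothetical.

```python
import hashlib

def toy_cipher(data, key):
    """Stand-in for the cipher E: XOR with a SHA-256 counter keystream.

    XOR with the same keystream is its own inverse, so this function
    both encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def vea_encrypt(data, key):
    if len(data) % 2:
        raise ValueError("VEA needs an even number of bytes (2n)")
    first = data[0::2]    # a1, a3, ..., a2n-1
    second = data[1::2]   # a2, a4, ..., a2n
    b = bytes(x ^ y for x, y in zip(first, second))  # step 3
    return toy_cipher(second, key) + b               # step 4

def vea_decrypt(cipher, key):
    n = len(cipher) // 2
    second = toy_cipher(cipher[:n], key)             # recover a2, a4, ...
    first = bytes(x ^ y for x, y in zip(cipher[n:], second))
    out = bytearray(2 * n)
    out[0::2] = first
    out[1::2] = second
    return bytes(out)

data = bytes(range(16))
cipher = vea_encrypt(data, b"secret key")
print(vea_decrypt(cipher, b"secret key") == data)
```

Only half of the bytes pass through the cipher E; the other half is protected by the XOR with the first half, which is the source of the scheme's cost saving over full encryption.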

RVEA [19][20]

RVEA was proposed by Shi, Wang and Bhargava in 1999. In fact, they proposed four different video encryption schemes: Algorithm 1, Algorithm 2 (VEA), Algorithm 3 (MVEA) and Algorithm 4 (RVEA). RVEA is significantly more secure than the previous three algorithms. RVEA extracts the sign bits of DCT coefficients and motion vectors from the MPEG video sequence and encrypts them using a conventional cryptosystem such as DES or AES. The encrypted bits are then restored to their original positions.

Alattar [21]

In 1999, Alattar, Al-Regib and Al-Semari presented a video encryption scheme based on the DES cryptosystem. They defined four security methods:

1) Method 0: Encrypt all macroblocks from I-frames and the headers of all predicted macroblocks.

2) Method 1: Encrypt all data associated with every nth I-macroblock.

3) Method 2: Encrypt the same data as in method 1 and all header data of predicted macroblocks.

4) Method 3: Encrypt the same data as in method 1 and every nth predicted macroblock.

2.4 Problems of current video encryption scheme

There are three main problems in current selective encryption schemes.

A. Security Problem

A lot of cryptanalysis work has been done on the proposed video encryption schemes [10, 41-45]. According to this research, the security of schemes that do not use standard cryptographic algorithms is very low. For example, permutation is shown to be highly risky in [10, 42-44]. Even schemes that use standard cryptographic algorithms such as DES or AES have many security problems. The corresponding cryptanalysis can be found in [10, 41, 45].

Another crucial problem of selective encryption is that information cannot be totally concealed after encryption. Some objects in the video sequence can still be recognized from the unencrypted part of the video stream. As shown in Figure 2.3 [63] [64], the original video is encrypted by I-Frame/I-MB encryption, DCT coefficient encryption and Motion Vector encryption respectively. After encryption, the video quality is greatly reduced. However, the contents of the picture can still be recognized. For some security-sensitive systems, this poses a great risk.

B. Computational Cost Problem

Some methods can provide substantial security, but at the price of higher computational overhead and data overhead. For example, the VEA scheme [18] is “very close to the security of encryption scheme E that is internally used” [11]. However, it needs to encrypt half of the video data using the internal encryption scheme E and to transfer a large number of additional keys to the receiver. The computational cost of selective video encryption schemes is analyzed in more detail in the next chapter.


Figure 2.3 Samples of Selective Encryption.

C. Feasibility Problem

Feasibility is another problem in many schemes. Many existing schemes are so-called “integrated video compression and encryption systems”, meaning that the video encryption module must be integrated into the video compression system. For example, permutation of AC and DC coefficients must be done before entropy coding. The encryption therefore interrupts the video compression procedure, and a standard decoder cannot decode the encrypted video data. The decoder corresponding to such a secure encoder must be an “integrated video decompression and decryption decoder”. This makes such schemes very hard to deploy widely in commercial applications.


2.5 Conclusion

In this chapter, I introduced the fundamentals of selective video encryption, surveyed current video encryption schemes and discussed their problems. From the security point of view, the best way to protect video data is full encryption, which encrypts the entire video data with a standard cryptosystem. However, its expensive computational overhead makes it inefficient or impossible in many applications. Selective encryption therefore encrypts only a part of the video data in order to reduce the computational cost while keeping the security level high. However, many proposed schemes achieve only moderate to low security, and only a few of the proposed methods promise substantial security. A highly secure video encryption scheme with low computational cost is necessary for future high definition video coded by H.264/AVC.


3 Unequal Secure Encryption (USE) Scheme for H.264

As discussed in Chapter 2, current video encryption schemes cannot balance security and computational cost. Selective encryption schemes are weak in security, and full encryption requires too much computation. The original idea behind our USE scheme is: if video data can be selected, why not also select cryptosystems for the selected video data? In this way, all of the video data can be encrypted, and a light cryptosystem with low computational cost can be chosen for the unimportant data set to reduce the total computational cost. As a result, both the security problem and the computational cost problem can be solved.

This chapter introduces our proposed USE video encryption scheme. In the last part, it is compared with other schemes.

3.1 Introduction of H.264

H.264/AVC is the newest international video coding standard. It has been approved by ITU-T as Recommendation H.264 and by ISO/IEC as International Standard 14496-10 (MPEG-4 Part 10) Advanced Video Coding (AVC).

H.264/AVC uses many new techniques, including new coding techniques, a new data structure, and new video storage and broadcast techniques. As the USE scheme is applied after video coding, the details of H.264/AVC coding, storage and transmission techniques need little consideration. The H.264/AVC video data structure has more impact on the USE scheme: data classification requires a careful study of this data structure.

[Figure: coding tools covered by the Baseline, Main and Extended profiles (I slices, P slices, CAVLC, FMO, slice groups and ASO, redundant slices, B slices, weighted prediction, interlace, CABAC, SP and SI slices, data partitioning).]

Figure 3.1 H.264 Baseline, Main and Extended Profiles.

Figure 3.2 H.264/AVC data format.


In H.264/AVC, profiles and levels specify conformance points. A profile defines a set of coding tools or algorithms that can be used in generating a conforming bitstream, whereas a level places constraints on certain parameters of the bitstream. The first version of H.264/AVC defines a set of three profiles, as shown in Figure 3.1 [46][47].

The Baseline profile supports all features in H.264/AVC except two feature sets: 1) B slices, weighted prediction, CABAC, field coding, and picture or macroblock adaptive switching between frame and field coding; 2) SP/SI slices and data partitioning. The first set of additional features is supported by the Main profile. However, the Main profile does not support the FMO, ASO and redundant picture features which are supported by the Baseline profile. The Extended profile supports all features of the Baseline profile and both additional feature sets, except for CABAC.

Some new features which can be used in the USE scheme are listed below:

Coded Data Format: H.264/AVC makes a distinction between a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The output of the encoding process is VCL data, which are mapped to NAL units prior to transmission or storage. A coded video sequence is represented by a sequence of NAL units. The data format of NAL is shown in Figure 3.2. One NAL unit contains one or more slices, and each slice contains an integral number of macroblocks (MBs). Each MB contains a series of header elements and coded residual data.

Parameter sets: H.264 introduces the concept of parameter sets, which provide robust and efficient conveyance of header information. Parameter sets include key information such as the sequence header and picture header. In H.264/AVC, this key information is separated so that it can be handled in a more flexible and specialized manner. This new feature is fully used in our USE scheme.


Flexible macroblock ordering (FMO): FMO is a new technique introduced by H.264/AVC which can partition the picture into regions called slice groups. FMO can be used to enhance robustness to data losses in transmission. The USE scheme uses FMO in two ways for video encryption.

Data partitioning: As some coded information is more important than other information for representing the video content, H.264/AVC allows the syntax of each slice to be separated into three partitions. The USE scheme makes use of this data partitioning.

3.2 USE Scheme for H.264/AVC

The purpose of the Unequal Secure Encryption scheme is to provide substantial security with low computational cost for video encryption. As discussed in Chapter 2, many existing video encryption schemes target low computational cost while ignoring security problems, and many are so-called “integrated video compression and encryption systems”, which are hard to use in a real video security system. Some proposed schemes achieve a high security level, but at a high computational cost.

The USE scheme is a full encryption scheme which encrypts the entire video data using selective encryption methods. Its target application is H.264/AVC based video security systems. In particular, the USE scheme can be used very efficiently for high definition video encryption, because its computational cost is much lower than that of other schemes while its security remains very high. The USE scheme is also described in [61] and [62].


Figure 3.3 Unequal Secure Encryption Scheme.

3.2.1 Data Partition Methods

The USE scheme is shown in Figure 3.3. It includes two major steps. The first step is video data classification. The purpose of classification is to divide the video data into two partitions: an important video data partition and an unimportant video data partition. Importance is evaluated by how difficult it is to reconstruct a picture when the data is lost. If the data in the important partition is lost, the video content cannot be reconstructed at all, whereas if the data in the unimportant partition is lost, the video content can still be reconstructed with reduced quality. Therefore, the important video data needs to be protected more securely than the unimportant data. As shown in the figure, after data classification the H.264/AVC video data is divided into DPA (Data Partition A, important) and DPB (Data Partition B, unimportant).


The second step in the USE scheme is unequal secure encryption. Unlike existing selective encryption schemes, the USE scheme encrypts the entire video data, and different cryptosystems are selected to encrypt different parts of the video data. As discussed in Chapter 2, from the viewpoint of cryptanalysis, the best way to maintain security is to encrypt the entire video data and to use standard cryptographic algorithms rather than methods whose security cannot be proven. As shown in Figure 3.3, two ciphers are used in the USE scheme: DPA is encrypted by cipher A, and DPB is encrypted by cipher B. Different algorithms have different security levels and computational costs. In the USE scheme, we use AES as cipher A and FLEX as cipher B. FLEX is our proposed algorithm, which is based on AES; hardware implementations of AES can also support FLEX, and FLEX is faster than AES. Besides AES and FLEX, other cryptographic algorithms can also be used in the USE scheme.

The computational cost of the USE scheme depends on the data classification and the cryptographic algorithms. Once the algorithms have been decided, data classification plays the more important role. There are three data classification methods in the USE scheme. As the USE scheme is designed for H.264/AVC, two of these classification methods use the new features of H.264/AVC.

3.2.2 Data Partition Methods

The purpose of data classification is to partition the video data based on importance. There is no standard definition of importance for video data. Normally, the difficulty of reconstructing the picture after data loss is used to evaluate the importance of data. In H.264, loss of header data (including parameter sets and MVD) makes the picture most difficult to reconstruct. Loss of VLC data (including Intra and Inter residual data) causes video quality reduction. Intra data is independent between frames while Inter data depends on neighboring frames, so reconstruction after Intra loss is much more difficult than after Inter loss.

There are three data classification methods in the USE scheme. All of them are performed after video encoding, so the video coding scheme and the video encryption scheme are completely separated in our USE scheme.

Figure 3.4 Data Partition in H.264/AVC Extended Profile.

Data Partitioning (Extended Profile)

This is a new feature in the H.264/AVC Extended Profile, which performs data partitioning automatically. As shown in Figure 3.4, the coded data that makes up a slice is placed in three separate Data Partitions (A, B and C). Partition A contains the slice header and the header data of MBs. Partition B contains the residual data of intra coded MBs, and Partition C contains the residual data of inter coded MBs. Obviously, the information in Partition A is more important than that in B and C. Normally, intra data (Partition B) is considered more important than inter data (Partition C).
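As a rough illustration of how data-partitioned slices could be routed to the two ciphers, the sketch below reads the one-byte NAL unit header and maps the NAL unit type to AES or FLEX. The nal_unit_type values (2, 3 and 4 for partitions A, B and C; 7 and 8 for the sequence and picture parameter sets) follow the H.264/AVC specification. The function names, the restriction to Levels 1 and 2, and the handling of other NAL unit types are assumptions made for this example, not the implementation used in this work.

```python
# Hypothetical sketch: route H.264 NAL units to cipher A (AES) or cipher B (FLEX).
# NAL unit type numbers follow the H.264/AVC specification; the function names
# and the level handling are illustrative assumptions.

PARTITION_A, PARTITION_B, PARTITION_C = 2, 3, 4  # data-partitioned slice NAL types
SPS, PPS = 7, 8                                  # sequence / picture parameter sets


def nal_unit_type(nal_header_byte: int) -> int:
    """Return nal_unit_type, the low 5 bits of the first byte of a NAL unit."""
    return nal_header_byte & 0x1F


def select_cipher(nal_header_byte: int, level: int) -> str:
    """Pick 'AES' or 'FLEX' for one NAL unit at security Level 1 or 2."""
    if level not in (1, 2):
        raise ValueError("this sketch only covers the levels that allow Data Partitioning")
    t = nal_unit_type(nal_header_byte)
    if t in (SPS, PPS, PARTITION_A):
        return "AES"                            # headers and MVD: important at both levels
    if t == PARTITION_B:
        return "AES" if level == 2 else "FLEX"  # intra residual data
    if t == PARTITION_C:
        return "FLEX"                           # inter residual data
    return "AES"                                # assumption: treat anything else as important


print(select_cipher(0x67, 1))  # 0x67 carries nal_unit_type 7 (SPS)
print(select_cipher(0x63, 1))  # 0x63 carries nal_unit_type 3 (partition B)
print(select_cipher(0x63, 2))
```

Because the partition is identified from the NAL header alone, this kind of classification needs no parsing of the slice payload, which is the practical advantage of the Data Partitioning method over a parser-based approach.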


FMO (Baseline Profile, Extended Profile)

FMO is a new feature in H.264/AVC. It can partition the picture into regions called slice groups. In the H.264/AVC standard, FMO consists of seven different partition types, which make it easy to partition pictures. The USE scheme uses two partition modes (shown in Figure 3.5). The first partition mode is Region Based FMO. In this mode, the picture is partitioned into two slice groups: secret regions and normal regions. The shape of the secret regions can be decided by pre-processing tools such as object recognition and extraction. This mode supports extraction of regions of interest of any shape in the picture, so object based encryption can be realized. The second partition mode is Mode Based FMO. In this mode, the picture is partitioned into two slice groups: Intra MBs and Inter MBs. As Intra MBs are more important than Inter MBs for reconstructing the picture, the Intra MBs should be encrypted with a highly secure encryption algorithm.

[Figure: a. Region based FMO (Slice Group 0: Secret Regions, Slice Group 1: Normal Regions); b. Mode based FMO (Slice Group 0: Intra MBs, Slice Group 1: Inter MBs).]

Figure 3.5 Data Partition by FMO.


Figure 3.6 Data Partition by Parameters Extraction.

Parameters Extraction (All Profiles)

Since the Data Partitioning method and the FMO method are limited to certain profiles, a common method which can be used in any profile is needed. The Parameters Extraction method, shown in Figure 3.6, is such a method. Its effect is similar to that of the Data Partitioning method. The difference is that Data Partitioning is done automatically by the codec, whereas this method needs a parser to perform the data classification.


3.2.3 Security levels

According to the book “Image and Video Encryption” by Andreas Uhl [71] and other video encryption survey papers [9-13], the security strength of video encryption depends on how much of the important data is encrypted. The importance of video data is defined by how difficult it is to reconstruct the video when the data is lost.

Similar to other video encryption schemes, the USE scheme defines four security levels (Level 0 to Level 3) and an extra Level x, as shown in Table 3.1. The definitions are listed below:

Table 3.1 Security levels in the USE scheme.

| Security Level | Algorithm | Video content | Data Classification Methods |
|---|---|---|---|
| Level 0 | AES | Headers | Parameters Extraction |
| | FLEX | Inter, Intra, MVD | |
| Level 1 | AES | Headers, MVD | Data Partitioning, Parameters Extraction |
| | FLEX | Inter, Intra | |
| Level 2 | AES | Headers, MVD, Intra | Data Partitioning, Parameters Extraction, FMO |
| | FLEX | Inter | |
| Level 3 | AES | All | - |
| Level x | AES | Secret Region | FMO |
| | FLEX | Normal Region | |


Level 0: Headers are encrypted by AES, and the remaining data are encrypted by FLEX. Level 0 has the lowest computational cost. The Parameters Extraction method can be used in this level.

Level 1: Headers and MVDs (in H.264/AVC, motion vectors are coded as motion vector differences, MVD) are encrypted by AES, and the remaining data are encrypted by FLEX. The Data Partitioning method and the Parameters Extraction method can be used in this level.

Level 2: Headers, MVD and Intra MBs are encrypted by AES, and Inter MBs are encrypted by FLEX. All three data classification methods can be used in this level.

Level 3: The entire video is encrypted by AES. Level 3 has the highest computational cost and security.

Level x: This is an extra security level for the USE scheme. Only the FMO method can be used in this level. It can be used in object-based encryption applications.
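The level definitions above can be summarised as a small lookup table. The following is a minimal sketch; the category labels ("header", "mvd", "intra", "inter") and the function name are chosen for this example only, and Level x is left out because it is defined by picture regions rather than by data type.

```python
# Sketch of Levels 0-3 as a lookup table: which data categories go to AES.
# Category labels and names are illustrative, not taken from an implementation.

ALL_CATEGORIES = {"header", "mvd", "intra", "inter"}

AES_CATEGORIES = {
    0: {"header"},
    1: {"header", "mvd"},
    2: {"header", "mvd", "intra"},
    3: set(ALL_CATEGORIES),  # Level 3: everything is encrypted by AES
}


def cipher_for(category: str, level: int) -> str:
    """Return the cipher used for one data category at a given security level."""
    if category not in ALL_CATEGORIES:
        raise ValueError(f"unknown data category: {category}")
    if level not in AES_CATEGORIES:
        raise ValueError(f"unknown security level: {level}")
    return "AES" if category in AES_CATEGORIES[level] else "FLEX"


for level in range(4):
    print(level, {c: cipher_for(c, level) for c in sorted(ALL_CATEGORIES)})
```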

3.2.4 Encryption Methods

A. AES Algorithm

Advanced Encryption Standard (AES), also known as Rijndael, is a block cipher adopted as an encryption standard by the U.S. government in 2001. AES is the most popular algorithm used in symmetric key cryptography. AES has a fixed block size of 128 bits and a key size of 128, 192 or 256 bits. AES operates on a 4×4 array of bytes termed the State. For encryption, it applies a round function 10, 12 or 14 times, depending on the key length. AES is introduced in more detail in the next chapter.


Figure 3.7 FLEX encryption algorithm.

Figure 3.8 Leak position in the even and odd rounds.


B. FLEX Algorithm

FLEX (which stands for Fast Leak EXtraction) is a cipher algorithm based on the round transformation of AES. FLEX provides the same key agility and short message block performance as AES while handling longer messages faster than AES. In addition, it has the same hardware and software flexibility as AES, and hardware implementations of FLEX can share resources with AES implementations. The FLEX algorithm is shown in Figure 3.7.

First, the given IV is encrypted by one AES invocation: $S = \mathrm{AES}_{Key}(IV)$. The 128-bit result $S$, together with the encryption key, constitutes the 256-bit secret state of the stream cipher. Second, the result $S$ is used as new input data to AES: $S' = \mathrm{AES}_{Key}(S)$. The cipher stream is generated as this process continues. The output of FLEX is not $S$ or $S'$; it comes from the internal states of AES. As shown in Figure 3.8, a 4×4 array of bytes constitutes the internal state of AES. In every round function of AES, a part of the AES state is output. In the FLEX algorithm, $b_{0,0}, b_{0,2}, b_{1,1}, b_{1,3}, b_{2,0}, b_{2,2}, b_{3,1}, b_{3,3}$ are output in odd rounds, and $b_{0,1}, b_{0,3}, b_{1,1}, b_{1,3}, b_{2,1}, b_{2,3}, b_{3,1}, b_{3,3}$ are output in even rounds. In total, 80 state bytes (640 bits) are output in every 10-round AES encryption, compared with the 128 bits of a normal AES output block, so FLEX is 5 times faster than AES.
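The factor of 5 can be checked with a short calculation. The sketch below is not a FLEX implementation: it only extracts the leaked bytes from a given 4×4 state and counts the output per AES encryption. The leak positions are copied from the text as printed (the even-round list shares four positions with the odd-round list, which is left as-is here), and 10 rounds, i.e. a 128-bit key, is assumed.

```python
# Leak positions as (row, column), copied from the text as printed.
ODD_ROUND_LEAK = [(0, 0), (0, 2), (1, 1), (1, 3), (2, 0), (2, 2), (3, 1), (3, 3)]
EVEN_ROUND_LEAK = [(0, 1), (0, 3), (1, 1), (1, 3), (2, 1), (2, 3), (3, 1), (3, 3)]

ROUNDS = 10          # assumption: AES with a 128-bit key
AES_BLOCK_BITS = 128


def leak(state, round_no):
    """Return the bytes leaked from a 4x4 state (list of rows) in the given round."""
    positions = ODD_ROUND_LEAK if round_no % 2 == 1 else EVEN_ROUND_LEAK
    return bytes(state[r][c] for r, c in positions)


def leaked_bits_per_encryption():
    """Count the key-stream bits produced by one 10-round AES encryption."""
    total_bytes = 0
    for round_no in range(1, ROUNDS + 1):
        positions = ODD_ROUND_LEAK if round_no % 2 == 1 else EVEN_ROUND_LEAK
        total_bytes += len(positions)
    return 8 * total_bytes


# Dummy state whose byte at (r, c) is 4*r + c, just to show which bytes are picked.
demo_state = [[4 * r + c for c in range(4)] for r in range(4)]
print(list(leak(demo_state, 1)))
print(leaked_bits_per_encryption(), leaked_bits_per_encryption() / AES_BLOCK_BITS)
```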

C. XOR Method

To further reduce the computational cost, we use the XOR method, which cuts the computational cost by 50%. This method is shown in Figure 3.9. It has three steps:

Step 1: Divide the total plaintext into two partitions A and B of the same size.

Step 2: Encrypt partition A, and XOR partition A with partition B bit by bit.

Step 3: The two results, partitions C and D, are the ciphertext.


Figure 3.9 XOR Method

With the XOR method, only half of the video data needs to be encrypted, which achieves a low computational cost. The security of the total plaintext is equal to that of partition A.
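The three steps can be sketched as follows. Several points are assumptions made for this example: the plaintext is split into its first and second half, C is taken to be the encrypted partition A, D is taken to be A XOR B, and `toy_cipher` is only a placeholder keystream built from SHA-256 so that the example is self-contained. In the USE scheme the cipher would be AES or FLEX, and the placeholder must not be used as a real cipher.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder for cipher A/B: XOR with a SHA-256 counter keystream (NOT AES)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))


def xor_method_encrypt(key: bytes, plaintext: bytes):
    """Step 1: split into A and B. Step 2: C = E(A), D = A xor B. Step 3: output C, D."""
    if len(plaintext) % 2:
        raise ValueError("this sketch assumes an even-length plaintext")
    half = len(plaintext) // 2
    a, b = plaintext[:half], plaintext[half:]
    c = toy_cipher(key, a)   # only half of the data goes through the cipher
    d = xor_bytes(a, b)      # the other half costs one XOR per bit
    return c, d


def xor_method_decrypt(key: bytes, c: bytes, d: bytes) -> bytes:
    a = toy_cipher(key, c)   # the placeholder keystream cipher is its own inverse
    b = xor_bytes(a, d)
    return a + b


message = bytes(range(32))
c, d = xor_method_encrypt(b"demo key", message)
print(xor_method_decrypt(b"demo key", c, d) == message)
```

The receiver must decrypt C first, since B can only be recovered from D once A is known.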

3.3 Comparison

In order to compare the computational cost and the encrypted data percentage of our proposed USE scheme with other schemes, the percentage of each data set in the coded video bitstream must first be calculated.

Table 3.2 shows the experimental results for several H.264/AVC QCIF sequences. It lists the header information size, MVD size, Intra MB residue size and Inter MB residue size for 10 QCIF test sequences. Every test sequence begins with an I frame, followed by P or B frames, and contains 100 frames in total. Over these 10 sequences, the average share of the data size is about 20% for headers, about 20% for MVD, about 15% for Intra residue, and about 45% for Inter residue.

Table 3.3 shows the ratios of each data partition for the different video sequences under different security levels. In Level 0, about 20% of the video data is encrypted by AES and 80% by FLEX. In Level 1, the percentages are 40% and 60%, and in Level 2 they are 55% and 45%. In Level 3, 100% of the data is encrypted by AES. Level x uses the FMO data partition method, whose ratio depends on the user's constraints, so Level x is not included in this experiment.

Table 3.4 compares the computational cost and encrypted data percentage of our USE scheme with other proposals. The comparison is based on the experimental results listed in Table 3.2, using the average percentages over the 10 sequences. The computational cost is measured as n@AES, where full encryption by AES is 100%@AES. For example, the computational cost of SECMPEG level 1 is 20%@AES, which means that it is 20% of the cost of full encryption. The encrypted data percentage reflects the security strength of each video encryption scheme. As all of the schemes use AES to encrypt the selected important data, the security can be evaluated by the amount of encrypted data.

Table 3.4 shows that our scheme achieves both high security and low computational cost compared to other work. For example, the computational cost of Level 0 in our USE scheme is only about 18% of naive encryption, while the encrypted data percentage is 100%.
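The cost figures quoted for the USE scheme appear to follow from three numbers given in this chapter: the AES/FLEX split of each level, the 5 times speed advantage of FLEX, and the 50% saving of the XOR method. The formula below is a reconstruction from those numbers rather than one stated explicitly in the text, and the function name is hypothetical.

```python
FLEX_RELATIVE_COST = 1 / 5  # FLEX is stated to be 5 times faster than AES
XOR_METHOD_FACTOR = 0.5     # the XOR method halves the amount of data that is encrypted


def use_cost(aes_fraction: float) -> float:
    """Cost of a USE level relative to full AES encryption (1.0 = 100%@AES)."""
    flex_fraction = 1.0 - aes_fraction
    return XOR_METHOD_FACTOR * (aes_fraction + flex_fraction * FLEX_RELATIVE_COST)


# Average AES shares per level, taken from the percentages quoted in the text.
AES_SHARE = {0: 0.20, 1: 0.40, 2: 0.55, 3: 1.00}

for level, share in AES_SHARE.items():
    print(f"Level {level}: {round(100 * use_cost(share))}% @ AES")
```

Under these assumptions the sketch reproduces the 18%, 26%, 32% and 50% entries listed for Levels 0 to 3 in Table 3.4.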

Figure 3.10 compares the security and computational complexity of our proposed USE scheme with other schemes. The computational complexity is expressed as a percentage of full encryption by the AES algorithm. The security is also evaluated relative to AES full encryption, taking the security of the FLEX algorithm to be 1/5 of that of AES. The figure shows that our proposed USE scheme is much more secure than the other schemes, while its computational complexity is very low.


3.4 Conclusion

In this chapter, an unequal secure encryption scheme for H.264/AVC is proposed. In order to maintain high security, our scheme takes a full encryption approach, encrypting the whole video data by selective encryption methods. The scheme mainly includes two parts: data classification and unequal secure encryption.

(1) Data classification: There are three classification methods in the USE scheme: data partitioning for the Extended profile, FMO for the Baseline and Extended profiles, and parameters extraction for all profiles. After data classification, the entire video data is divided into two partitions: the important data partition and the unimportant data partition.

(2) Unequal secure encryption: This method can also be called the selective cryptosystem method. Different cryptosystems have different security levels and different computational costs. In the USE scheme, we choose AES to encrypt the important data partition and propose a light encryption algorithm called FLEX to encrypt the unimportant data partition. FLEX is 5 times faster than AES. However, its security is lower than that of AES because it leaks many internal states during encryption.

The experimental results and comparison show that our scheme achieves both high security and low computational cost. For Level 0 of the USE scheme, the computational cost is only 18% of full encryption, and for the highest security level, Level 3, the computational cost is only 50% of full encryption. The scheme is very suitable for high security and high definition video encryption systems.


Table 3.2 Video data partition size (QCIF@100 Frames, I Frame followed by P or B Frames).

| Video Sequence | Header, incl. MVD (bits) | Header/Total (%) | MVD (bits) | MVD/Total (%) | Intra MBs Residue VLC (bits) | Intra VLC/Total (%) | Inter MBs Residue VLC (bits) | Inter VLC/Total (%) | Total size of compressed H.264 file (bits) |
|---|---|---|---|---|---|---|---|---|---|
| Canoa | 676577 | 26.04% | 300816 | 11.58% | 769777 | 29.62% | 1152357 | 44.34% | 2608088 |
| CarPhone | 314675 | 51.84% | 150868 | 24.85% | 55551 | 9.15% | 236802 | 39.01% | 616672 |
| Claire | 95326 | 57.69% | 38300 | 23.18% | 10801 | 6.54% | 59111 | 35.77% | 175640 |
| Container | 96239 | 46.49% | 32468 | 15.68% | 23877 | 11.53% | 86899 | 41.98% | 217832 |
| Football | 825441 | 30.14% | 390128 | 14.25% | 866291 | 31.64% | 1046531 | 38.22% | 2747592 |
| Foreman | 375985 | 55.99% | 195606 | 29.13% | 43971 | 6.55% | 251588 | 37.46% | 680648 |
| Grandma | 99382 | 52.85% | 39218 | 20.86% | 17903 | 9.52% | 70763 | 37.63% | 198600 |
| Mobile | 454322 | 36.29% | 207090 | 16.54% | 54242 | 4.33% | 743504 | 59.38% | 1261768 |
| News | 183186 | 41.21% | 86012 | 19.35% | 55332 | 12.45% | 206017 | 46.34% | 454736 |
| Table | 312751 | 39.18% | 165196 | 21.03% | 78360 | 9.98% | 394422 | 50.21% | 795512 |


Table 3.3 Video data partition for different security levels.

| Video Sequence | Level 0 AES | Level 0 FLEX | Level 1 AES | Level 1 FLEX | Level 2 AES | Level 2 FLEX |
|---|---|---|---|---|---|---|
| Canoa | 14.41% | 85.59% | 26.04% | 73.96% | 55.66% | 44.34% |
| CarPhone | 26.56% | 73.44% | 51.84% | 48.16% | 60.99% | 39.01% |
| Claire | 32.47% | 67.53% | 57.69% | 42.31% | 64.23% | 35.77% |
| Container | 29.28% | 70.72% | 46.49% | 53.51% | 58.02% | 41.98% |
| Football | 15.84% | 84.16% | 30.14% | 69.86% | 61.78% | 38.22% |
| Foreman | 26.50% | 73.50% | 55.99% | 44.01% | 62.54% | 37.46% |
| Grandma | 30.29% | 69.71% | 52.85% | 47.15% | 62.37% | 37.63% |
| Mobile | 19.59% | 80.41% | 36.29% | 63.71% | 40.62% | 59.38% |
| News | 21.37% | 78.63% | 41.21% | 58.79% | 53.66% | 46.34% |
| Table | 18.55% | 81.45% | 39.18% | 60.82% | 49.16% | 50.84% |


Table 3.4 Comparison with other video encryption schemes.

| Encryption Scheme | Content to be encrypted | Computational cost (@ AES) | Encrypted Data |
|---|---|---|---|
| SECMPEG [14], Level 1 | Header | 20% @ AES | 20% |
| SECMPEG [14], Level 3 | Header and Intra | 35% @ AES | 35% |
| SECMPEG [14], Level 4 | All | 100% @ AES | 100% |
| Aegis [15][16] | Header, I frame | 35% @ AES | 35% |
| VEA [18] | All | 50% @ AES | 100% |
| RVEA [19][20] | Sign bit of DCT and motion vectors | 10% @ AES | 10% |
| Alattar [21], Method 0 | Header, Intra and MVD | 55% @ AES | 55% |
| Alattar [21], Method 1 | Every nth I MB | (1/n*15)% @ AES | (1/n*15)% |
| Alattar [21], Method 2 | + Header | (1/n*15 + 40)% @ AES | (1/n*15 + 40)% |
| Alattar [21], Method 3 | + nth Header | (1/n*15 + 1/n*40)% @ AES | (1/n*15 + 1/n*40)% |
| Ours, Level 0 | All | 18% @ AES | 100% |
| Ours, Level 1 | All | 26% @ AES | 100% |
| Ours, Level 2 | All | 32% @ AES | 100% |
| Ours, Level 3 | All | 50% @ AES | 100% |

[Figure: vertical axis: Security (% of full encryption by AES, 1 FLEX = 1/5 AES).]

Figure 3.10 Comparison of security and computational complexity.


4 Hardware Design of AES

4.1 Introduction of AES Algorithm

AES [37], also known as Rijndael, is the most popular algorithm used in symmetric key cryptography. AES operates on a 4×4 array of bytes termed the State. For encryption, it applies a round function 10, 12 or 14 times, depending on the key length. The encryption and decryption flows of the AES algorithm are shown in Figure 4.1 (a) and (b). Four transformations, SubBytes, ShiftRows, MixColumns and AddRoundKey, are performed in the encryption process, and the four corresponding inverse transformations are performed in the decryption process. A separate KeyExpansion unit generates the keys for each round of the AES algorithm. In order to reduce the hardware cost, we propose a hybrid dataflow for both encryption and decryption, which is shown in Figure 4.1 (c). All function modules in this dataflow support both forward and inverse operation, and the KeyExpansion module also supports generating the forward and inverse key sequences. Compared to a solution that uses two AES cores (one for encryption and another for decryption), the hybrid solution saves about 40% of the hardware cost (for single modules, (Inv)SubBytes saves 50%, (Inv)MixColumns saves 30%, and (Inv)ShiftRows saves 20% of the hardware cost).

Figure 4.2 shows the operations in the AES algorithm. They are briefly introduced below:

1) SubBytes: The SubBytes operation is a non-linear byte substitution that operates on each byte of the State using a substitution table.

2) ShiftRows: In the ShiftRows operation, the bytes in the last three rows of the State are cyclically shifted over different numbers of bytes.

3) MixColumns: A mixing operation which operates on the columns of the State using a linear transformation.

4) AddRoundKey: A Round Key is added to the State by a simple bitwise XOR operation.
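Two of the transformations listed above are small enough to show in full. The sketch below gives a software model of MixColumns on one 32-bit column and of AddRoundKey, following the standard AES definition (multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1). It is meant only to make the operations concrete and says nothing about the hardware structure used in this work; the function names are chosen for this example.

```python
def xtime(a: int) -> int:
    """Multiply a byte by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a


def mix_column(col):
    """Apply MixColumns to one column [a0, a1, a2, a3] of the State."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,   # 2*a0 + 3*a1 + a2 + a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,   # a0 + 2*a1 + 3*a2 + a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),   # a0 + a1 + 2*a2 + 3*a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),   # 3*a0 + a1 + a2 + 2*a3
    ]


def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """XOR the State with the round key, byte by byte."""
    return bytes(s ^ k for s, k in zip(state, round_key))


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Since AddRoundKey is a plain XOR, applying it twice with the same round key restores the State, which is why the same module serves both encryption and decryption.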


Figure 4.1 Dataflow. (a) Encryption. (b) Decryption.

(c) Proposed hybrid dataflow for encryption & decryption.

Figure 4.2 Transformations in AES algorithm.


4.2 Existing low-cost implementations of AES

Many hardware implementations of the AES algorithm have already been proposed. They can be classified into two types: high-speed designs and low-cost designs. Most of the existing designs are high-speed designs. However, with the increasing demand for personal security and the growing use of portable commercial electronic devices, low-power and low-cost design has become very important.

The existing low-cost implementations of AES can be classified into two types: 32-bit designs, which use a 32-bit wide data path, and 8-bit designs, which use an 8-bit wide data path. These two kinds of designs are briefly introduced below.

32-bit implementation of AES Algorithm [48]

The 32-bit implementation of the AES algorithm, proposed by Satoh in [48], is shown in Figure 4.3. This architecture uses a 32-bit wide data path. Its key function modules include:

S-Box module (32-bit): Every S-Box supports one 8-bit SubBytes operation, so the 32-bit SubBytes operation is realized by using 4 S-Boxes in the data path. These S-Box modules are used for both the AES round function and the key expansion operation.

MixColumns module (32-bit): The MixColumns module supports both the MixColumns and the Inverse MixColumns operation by reusing part of the hardware.

In this architecture, the S-Box module, MixColumns module and AddRoundKey module (XOR module) are serially connected, so it can also be called a serial data path design. The hardware cost of this implementation is shown in Table 4.1. The ASIC implementation runs at 131 MHz with a throughput of 311 Mbps.

[Figure: block diagram with 32-bit Data and Key inputs, four S-Box blocks, an M block, an XOR (+) block and a Schedule block, connected by 32-bit buses.]

Figure 4.3 32-bit architecture for AES.

Table 4.1 Hardware cost of 32-bit AES @ 131 MHz, 0.11 um [48].

| Components | Gates | % |
|---|---|---|
| Data Register | 864 | 16.01% |
| ShiftRows | 160 | 2.96% |
| S-Boxes | 1,176 | 21.79% |
| MixColumns | 350 | 6.48% |
| AddRoundKey | 56 | 1.04% |
| Key Expander | 1,896 | 35.12% |
| Others | 699 | 12.95% |
| Total | 5,398 | 100% |


8-bit implementation of AES Algorithm [50]

The 8-bit implementation of the AES algorithm, proposed by Feldhofer in [50], is shown in Figure 4.4. This architecture uses an 8-bit wide data path and is designed for use in RFID tags. Its key function modules are listed below:

S-Box module (8-bit): There is only one S-Box in this architecture. It is used for both the round function and the key expansion.

1/4 MixColumns module (8-bit): Since the MixColumns operation is a 32-bit operation, the authors proposed a new method to implement it on an 8-bit data path. Additional registers and clock cycles are needed.

In this architecture, the S-Box module, MixColumns module and AddRoundKey module are arranged in parallel, so it is also referred to as a parallel data path design. The hardware cost of this implementation is shown in Table 4.2. The ASIC implementation of this design works at about 100 kHz with a throughput of about 12.6 kbps, which is enough for RFID communication.

The 32-bit and 8-bit implementations of AES achieve very low hardware cost and are suitable for low-cost, low-throughput systems. However, their architectures are dedicated to specific applications and cannot be scaled to higher performance. For video encryption, the performance requirement varies greatly with video resolution and frame rate. For example, for oneseg [51] (mobile TV used in Japan), the video data throughput is 160 kbps, whereas for HDTV DVD it can reach 50 Mbps. A scalable architecture which can provide different performance levels is therefore the best choice for AES hardware design.
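The reported clock frequencies and throughputs can be related through the number of clock cycles spent on each 128-bit block, since throughput = 128 × clock frequency / cycles per block. The sketch below derives the implied cycle counts from the figures quoted above and compares each design's reported throughput with the two video bit rates mentioned in the text. The cycle counts are derived values, not numbers stated in this chapter, and the function names are hypothetical.

```python
BLOCK_BITS = 128  # AES block size in bits


def cycles_per_block(clock_hz: float, throughput_bps: float) -> float:
    """Clock cycles per 128-bit block implied by a clock rate and a throughput."""
    return BLOCK_BITS * clock_hz / throughput_bps


def meets(throughput_bps: float, required_bps: float) -> bool:
    """True if a design's reported throughput covers a required video bit rate."""
    return throughput_bps >= required_bps


# (clock frequency in Hz, throughput in bit/s) as quoted in the text
designs = {"32-bit [48]": (131e6, 311e6), "8-bit [50]": (100e3, 12.6e3)}
# video bit rates quoted in the text
targets = {"oneseg": 160e3, "HDTV DVD": 50e6}

for name, (clock, rate) in designs.items():
    cycles = cycles_per_block(clock, rate)
    coverage = {t: meets(rate, need) for t, need in targets.items()}
    print(f"{name}: about {cycles:.0f} cycles per block, {coverage}")
```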

[Figure: block diagram with an S-Box block, a controller and ¼ MixColumns blocks.]

Figure 4.4 8-bit architecture for AES.

Table 4.2 Hardware cost of 8-bit AES @ 100 kHz, 0.35 um [50].

| Components | Gates | % |
|---|---|---|
| S-Boxes | 395 | 11.0% |
| MixColumns | 252 | 7.0% |
| AddRoundKey | 90 | 2.5% |
| Key Expander | 161 | 4.5% |
| RAM | 2,337 | 65.0% |
| Controller | 360 | 10.0% |
| Total | 3,595 | 100% |


4.3 Proposed Scalable Hardware Architecture for AES

A scalable architecture is very important for IP design. A common architecture from which different implementations can be derived, each with the performance and hardware cost that a specific requirement calls for, provides great flexibility and reliability for IP design and for system integration. In multimedia systems, different applications require different specifications. For example, mobile TV usually uses CIF size video with a bit rate of less than 1 Mbps. In contrast, the bit rate for HDTV broadcasting is more than 20 Mbps, and for future super HDTV the bit rate will increase to hundreds of Mbps. A scalable architecture which can cover this wide range of specifications is urgently needed for AES IP design.

4.3.1 Top Level Architecture

The top level of the proposed scalable hardware architecture for AES is shown in Figure 4.5. The architecture includes the following blocks:

Data Registers

The data registers comprise 16 bytes of registers, matching the 128-bit block length of AES plaintext. Each sub-block (gray) represents one byte of data, which is termed a State. Before encryption, the data registers load the plaintext from external memory; after encryption, the data registers output the ciphertext.

Key Expander Module

The key expander module generates the key for each round of AES. It includes two parts: key registers and a key scheduler. The key registers can be 16, 24 or 32 bytes of registers for 128-bit, 192-bit and 256-bit key lengths. Currently, a 128-bit key is sufficient for high-security applications, and most cryptosystems use a 128-bit key. The key scheduler is an XOR gate array with a control unit that generates the round keys for encryption.


Figure 4.5 Scalable Hardware Architecture for AES


ShiftRows Module

The ShiftRows module is very easy to implement. It is a wire-mapping box that connects each input port to the appropriate output port. Both the input and the output of the ShiftRows module are 128 bits wide.

MixColumns Array

The MixColumns array is a set of MixColumns modules. Since MixColumns is a 32-bit operation and the data block is 128 bits wide, one to four modules can be used in an implementation. The detailed structure of the MixColumns module is discussed in Section 4.3.3.

S-Box Array

The S-Box array is a set of S-Box modules. The S-Box performs the SubBytes operation of the AES algorithm, which is the main computation in AES, so it is very important for hardware implementation. The number of S-Boxes used in a design greatly affects both performance and hardware cost. The S-Box structure is discussed in detail in Section 4.3.3.

The scalability of this design is achieved by the following new ideas:

Independent Data Path for Each Operation

There are three main data paths in our design: the MixColumns data path, the SubBytes data path and the AddRoundKey data path. The advantages of this design include:

1) Scalability

As shown in Table 4.3, the operations in AES have different bit widths: SubBytes is an 8-bit operation, MixColumns is a 32-bit operation, and AddRoundKey and ShiftRows are 128-bit operations.

A main problem of previous architectures with respect to scalable design is that they integrate different operations in one data path. As a result, all of the operations in the same data path must use the same bit width. In other words, the numbers of processing elements for the different operations are tightly coupled.

In our proposed design, however, each operation is separated into its own data path. This makes it easy to increase or decrease the parallelism of each single operation. For example, in the proposed architecture the number of MixColumns modules and the number of S-Box modules can be configured freely, without regard to the other operations.

Table 4.3 Bit width of operations in AES algorithm.

Operations                   Bit width for operations
SubBytes                     8-bit
ShiftRows                    128-bit
MixColumns                   32-bit
AddRoundKey                  128-bit
SubBytes for KeyExpansion    8-bit

2) Power & Performance Improvement

Since the operations are distributed over several parallel data paths, the critical path of the hardware implementation becomes much shorter than in a serial architecture, so the maximum working frequency can be raised. For low-power design, the shorter critical path allows a lower supply voltage and a higher threshold voltage to be used to reduce power consumption.

Scalable S-Box Array, MixColumns Array

As shown in Figure 4.5, the S-Box array and the MixColumns array are scalable modules that support different numbers of processing elements. Each S-Box performs the SubBytes operation on 8-bit inputs and outputs. The data registers are 128 bits wide, so SubBytes must be performed 16 times in each round of the AES algorithm. For the key registers, SubBytes is needed four times in each round. In total, one round of the AES algorithm executes 20 SubBytes operations. In our architecture, the S-Box array supports 1-20 S-Boxes, which covers a wide range of performance and hardware cost requirements.

MixColumns is a 32-bit operation, and four MixColumns operations are executed in each round of the AES algorithm. Correspondingly, 1-4 MixColumns modules can be used in this architecture.

The advantage of the scalable S-Box array and MixColumns array is that performance and hardware cost can be balanced simply by adjusting the number of processing elements. These two scalable modules greatly improve the scalability of the AES hardware.

4.3.2 Two typical subclass architectures

Based on the proposed scalable architecture, there are two typical subclass

architectures: Shared S-Box Architecture and Unified S-Box Architecture. Table 4.4

compares these two architectures.


[Figure 4.6 block diagram; recoverable block labels: five S-Box blocks and three blocks labelled M.]

Figure 4.6 Shared S-Box Architecture.

Shared S-Box Architecture

As shown in Figure 4.6, the Shared S-Box Architecture uses a single S-Box array for both the data registers and the key registers. The advantage of this architecture is that all of the S-Boxes can be used for both data SubBytes and key SubBytes. However, these two SubBytes operations cannot be executed at the same time, so an extra clock cycle is needed for the SubBytes operation. Thus, this architecture is suitable for low hardware cost implementations.


Figure 4.7 Unified S-Box Architecture

Unified S-Box Architecture

The Unified S-Box Architecture separates the S-Boxes into two parts: a data S-Box array and a key S-Box array, as shown in Figure 4.7. Since key SubBytes executes four times in each round of AES, up to four S-Boxes can be used for the key. For the data S-Box array, the number of S-Boxes can be 1-16. The advantage of this architecture is that the key operation is completely separated from the data operation. It needs fewer clock cycles than the shared architecture and is well suited to high-performance implementations. However, the utilization of the key S-Boxes is low, because key SubBytes executes far less often than data SubBytes.


Table 4.4 Comparison of two architectures.

                                 Shared S-Box Arch.           Unified S-Box Arch.
Configurable S-Box number        1 to 20                      Data part: 1-16
                                                              Key part: 1-4
Configurable MixColumns number   1 to 4                       1 to 4
Advantages                       S-Box utilization is high    Separate key and data operation
                                 Save hardware cost           Easy to control
Orientation                      Low-cost AES                 High-performance AES

4.3.3 Sub-Modules’ Design

As shown in Figure 4.5, there are three main sub-modules in the architecture: the ShiftRows module, the S-Box module and the MixColumns module. ShiftRows is very simple: it consists only of shifting operations, which can be implemented by hard wiring. The detailed implementations of the S-Box and MixColumns modules are presented as follows:

S-Box Design

The S-Box design in this work follows Canright’s work in [49]. The architecture of the S-Box is shown in Figure 4.8. This S-Box design uses a normal basis and replaces the GF(2^8) inverter with a GF(((2^2)^2)^2) inverter. The isomorphism operation δ and the affine transformation are matrix operations, which are easy to implement in hardware. According to the experimental results in Canright’s paper, this design achieves the lowest hardware cost compared with other works.


[Figure 4.8 block diagram; recoverable labels: 8-bit StateIn and StateOut, basis changes δ and δ^-1, affine and affine^-1 transformations, and a GF(((2^2)^2)^2) inversion built from Galois field additions and multiplications on 4-bit, 2-bit and 1-bit signals.]

Figure 4.8 S-Box structure.

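[Figure 4.9 panels: a) Factors of Inverse MixColumns, b) Dual-function module operating on a column of States. Panel a) gives the following decomposition of the InverseMixColumns matrix into the MixColumns matrix plus two additional matrices (entries in hexadecimal, addition in GF(2^8)).]

$$
\underbrace{\begin{bmatrix} 0E & 0B & 0D & 09 \\ 09 & 0E & 0B & 0D \\ 0D & 09 & 0E & 0B \\ 0B & 0D & 09 & 0E \end{bmatrix}}_{\text{InverseMixColumns}}
=
\underbrace{\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}}_{\text{MixColumns}}
+
\underbrace{\begin{bmatrix} 08 & 08 & 08 & 08 \\ 08 & 08 & 08 & 08 \\ 08 & 08 & 08 & 08 \\ 08 & 08 & 08 & 08 \end{bmatrix}
+
\begin{bmatrix} 04 & 00 & 04 & 00 \\ 00 & 04 & 00 & 04 \\ 04 & 00 & 04 & 00 \\ 00 & 04 & 00 & 04 \end{bmatrix}}_{\text{Additional Matrix}}
$$

Figure 4.9 MixColumns structure.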


MixColumns Design

There are many hardware reuse methods for the MixColumns module. We follow Satoh’s work in [48]. Figure 4.9 a) shows the reuse method for this module: the InverseMixColumns matrix is separated into the MixColumns matrix plus two additional matrices. Figure 4.9 b) shows the hardware architecture of the dual-function MixColumns module.
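The reuse idea can be checked numerically. The following Python sketch is only an illustration, not the hardware description used in this work: it assumes the two additional matrices are an all-08 matrix and a circulant matrix with first row (04, 00, 04, 00), as read from Figure 4.9 a), and the helper names (gmul, circulant, mat_vec, inv_mix_via_reuse) are hypothetical. Because addition in GF(2^8) is a bytewise XOR, the InverseMixColumns result should equal the XOR of the three partial products.

```python
def gmul(a, b):
    """Multiply two GF(2^8) elements modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def circulant(first_row):
    """Build a 4x4 matrix whose row r is first_row rotated right by r."""
    return [first_row[-r:] + first_row[:-r] for r in range(4)]


def mat_vec(matrix, column):
    """Multiply a 4x4 GF(2^8) matrix by a 4-byte state column."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gmul(coeff, byte)
        out.append(acc)
    return out


MIX = circulant([0x02, 0x03, 0x01, 0x01])
INV_MIX = circulant([0x0E, 0x0B, 0x0D, 0x09])
ADD_08 = [[0x08] * 4 for _ in range(4)]          # assumed first additional matrix
ADD_04 = circulant([0x04, 0x00, 0x04, 0x00])     # assumed second additional matrix


def inv_mix_via_reuse(column):
    """InverseMixColumns computed as MixColumns plus two additional terms."""
    parts = (mat_vec(m, column) for m in (MIX, ADD_08, ADD_04))
    return [a ^ b ^ c for a, b, c in zip(*parts)]


column = [0xDB, 0x13, 0x53, 0x45]
mixed = mat_vec(MIX, column)
print([hex(v) for v in mixed])
print(inv_mix_via_reuse(mixed) == column)
```

In this sketch the MixColumns product is computed once and shared by both directions, which mirrors the motivation for a dual-function module; how the additional terms are actually shared in hardware is not modelled here.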

4.4 Performance Analysis

4.4.1 Scalability

Our proposed scalable architecture provides great scalability. In total, there are 336 possible implementations based on the proposed architecture. As shown in Table 4.5, for the Shared S-Box architecture, 1-20 S-Boxes and 1-4 MixColumns modules can be used, which gives 20 × 4 = 80 possible implementations. For the Unified S-Box architecture, 1-16 S-Boxes for data, 1-4 S-Boxes for key and 1-4 MixColumns modules can be used, which gives 16 × 4 × 4 = 256 possibilities. Counting both architectures, there are 336 possible implementations of AES based on the proposed scalable architecture.

Table 4.5 Possible implementations of AES based on scalable architecture.

                          Shared S-Box Arch.              Unified S-Box Arch.
S-Box for Data            1-20 (shared by data and key)   1-16
S-Box for Key                                             1-4
MixColumns module         1-4                             1-4
Possible Configurations   80                              256
Total                     336
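The configuration counts can be reproduced by plain enumeration. The short Python sketch below is illustrative; the tuple layouts and variable names are assumptions made for this example only.

```python
from itertools import product

# Shared S-Box architecture: (number of S-Boxes, number of MixColumns modules)
shared_configs = list(product(range(1, 21), range(1, 5)))

# Unified S-Box architecture:
# (data S-Boxes, key S-Boxes, MixColumns modules)
unified_configs = list(product(range(1, 17), range(1, 5), range(1, 5)))

total = len(shared_configs) + len(unified_configs)
print(len(shared_configs), len(unified_configs), total)
```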


[Figure 4.10 dataflow diagrams with the stages First Round, Round i and Last Round. Notation: ks = Key SubBytes, A = AddRoundKey, R = ShiftRows, M = MixColumns, S = SubBytes.]

a) Dataflow of Unified S-Box Architecture b) Dataflow of Shared S-Box Architecture

Figure 4.10 Dataflows for scalable architecture.

4.4.2 Dataflows

Figure 4.10 shows the dataflows of the proposed scalable architecture. Each dataflow includes three parts: first round, round i and final round. The first round includes two sub-procedures: {A, S} and {ks}. The meaning of the notation is listed in the table in the figure. Note that A and S are executed in the same clock cycle. Round i is the loop body of AES; for AES with a 128-bit key, the number of loop iterations is 9. In this step, for the unified S-Box architecture, the dataflow includes two sub-procedures: {S, ks, M} and {A, R}. For the shared S-Box architecture, the dataflow includes three sub-procedures: {S}, {ks, M} and {A, R}. All operations within the same sub-procedure are executed in parallel. The final round is the last round of AES, and it includes two sub-procedures: {S} and {A}. For an AES-128 encryption, the total numbers of clock cycles needed by the two architectures are as follows.


Clock cycles for shared S-Box Architecture

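$$
\begin{aligned}
T_{clk} &= T_{first\_round} + T_{round\_i} \times 9 + T_{final\_round} \\
&= \left\lceil \frac{4}{x} \right\rceil + 1
 + \left( \max\left( \left\lceil \frac{4}{x} \right\rceil, \left\lceil \frac{4}{y} \right\rceil \right) + \left\lceil \frac{16}{x} \right\rceil + 1 \right) \times 9
 + \left\lceil \frac{16}{x} \right\rceil + 1 \\
&= \left\lceil \frac{4}{x} \right\rceil
 + \left( \max\left( \left\lceil \frac{4}{x} \right\rceil, \left\lceil \frac{4}{y} \right\rceil \right) + \left\lceil \frac{16}{x} \right\rceil + 1 \right) \times 9
 + \left\lceil \frac{16}{x} \right\rceil + 2
\end{aligned}
\tag{4.1}
$$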

Clock cycles for unified S-Box Architecture

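$$
\begin{aligned}
T_{clk} &= T_{first\_round} + T_{round\_i} \times 9 + T_{final\_round} \\
&= \left\lceil \frac{4}{x} \right\rceil + 1
 + \left( \max\left( \left\lceil \frac{16}{x} \right\rceil, \left\lceil \frac{4}{y} \right\rceil, \left\lceil \frac{4}{z} \right\rceil \right) + 1 \right) \times 9
 + \left\lceil \frac{16}{x} \right\rceil + 1 \\
&= \left\lceil \frac{4}{x} \right\rceil
 + \left( \max\left( \left\lceil \frac{16}{x} \right\rceil, \left\lceil \frac{4}{y} \right\rceil, \left\lceil \frac{4}{z} \right\rceil \right) + 1 \right) \times 9
 + \left\lceil \frac{16}{x} \right\rceil + 2
\end{aligned}
\tag{4.2}
$$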
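As a worked example of Equations 4.1 and 4.2, the Python sketch below evaluates both cycle counts and the resulting throughput for a 128-bit block. It rests on assumptions that the text does not state explicitly: in Equation 4.1, x is taken as the number of shared S-Boxes and y as the number of MixColumns modules; in Equation 4.2, x is taken as the number of data S-Boxes, while y and z are the key S-Box and MixColumns counts. The function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def cycles_shared(x, y):
    """Clock cycles per AES-128 encryption, following Equation 4.1."""
    first_round = ceil_div(4, x) + 1
    round_i = max(ceil_div(4, x), ceil_div(4, y)) + ceil_div(16, x) + 1
    final_round = ceil_div(16, x) + 1
    return first_round + 9 * round_i + final_round


def cycles_unified(x, y, z):
    """Clock cycles per AES-128 encryption, following Equation 4.2."""
    first_round = ceil_div(4, x) + 1
    round_i = max(ceil_div(16, x), ceil_div(4, y), ceil_div(4, z)) + 1
    final_round = ceil_div(16, x) + 1
    return first_round + 9 * round_i + final_round


def throughput_mbps(freq_mhz, cycles):
    """Throughput in Mbps for one 128-bit block every `cycles` clock cycles."""
    return 128 * freq_mhz / cycles


# Smallest shared configuration: 1 S-Box, 1 MixColumns module.
low_cost_cycles = cycles_shared(1, 1)
# Largest unified configuration: 16 data S-Boxes, 4 key S-Boxes, 4 MixColumns.
high_perf_cycles = cycles_unified(16, 4, 4)

print(low_cost_cycles, throughput_mbps(123, low_cost_cycles))
print(high_perf_cycles, throughput_mbps(416, high_perf_cycles))
```

Under these assumptions the two configurations give 211 and 22 clock cycles, which agree with the cycle counts reported in Table 4.8.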

4.4.3 Hardware Implementation

To reduce hardware cost, we use only one S-Box and one MixColumns module and constrain the design with a loose frequency target so that the synthesis tool yields the lowest hardware cost. The synthesis results are shown in Table 4.6. Here we use the TSMC 0.18 um standard cell library and Synopsys Design Compiler for synthesis. To measure the highest performance of the proposed architecture, we use 20 S-Boxes and 4 MixColumns modules and constrain the design with the maximum frequency. The synthesis results are shown in Table 4.7.

The results show that the lowest hardware cost for each component of the AES hardware is achieved when the constrained frequency is at or below 123 MHz. In other words, the hardware cost increases considerably when the constrained frequency is set above 123 MHz. Because the parallel data paths shorten the critical path, the maximum frequency of the proposed architecture reaches 416 MHz. Along with the performance, the hardware cost also increases considerably. The detailed hardware cost is listed in Table 4.7.

Table 4.6 Hardware cost of lowest cost AES @ 123 MHz, 0.18 um.

Components                      Gates
Data Registers                  1079
ShiftRows + AddRoundKey         307
S-Box                           358
MixColumns/InvMixColumns        376
Key Expander + Key Registers    1935
Controller                      247
Others                          318
Total                           4620

Table 4.7 Hardware cost of highest performance AES @ 416 MHz, 0.18 um.

Components                      Gates
Data Registers                  1079
ShiftRows + AddRoundKey         310
S-Box                           1138 * 20
MixColumns/InvMixColumns        349 * 4
Key Expander + Key Registers    2548
Controller                      264
Others                          1112
Total                           29469


Table 4.8 Scalability of hardware implementations.

Components                                 Lowest hardware cost      Highest performance
                                           implementation            implementation
Technology                                 TSMC 0.18 um              TSMC 0.18 um
Frequency                                  123 MHz                   416 MHz
Hardware cost                              1 S-Box, 1 MixColumns     20 S-Boxes, 4 MixColumns
Required clock cycles for AES encryption   211                       22
Throughput                                 75 Mbps                   2.4 Gbps

Table 4.9 Comparison with others’ architecture.

                    32-bit                8-bit                 Proposed Scalable Architecture
                    Architecture* [48]    Architecture [50]     Implementation        Implementation
                                                                example 1             example 2
Hardware cost       4 S-Box,              1 S-Box,              5 S-Box,              4 S-Box,
                    1 MixColumns          1/4 MixColumns        1 MixColumns          1 MixColumns
Gates               7226                  3595                  7344                  6986
Frequency           138 MHz @ 0.18 um     100 kHz @ 0.35 um     180 MHz @ 0.18 um     180 MHz @ 0.18 um
Clock cycles for    54                    >1000                 54                    64
AES encryption
Throughput          327 Mbps              12.6 kbps             427 Mbps              360 Mbps
Power consumption   18.3 mW               8.5 uW                22.3 mW               20 mW
Scalable            NO                    NO                    YES

*AES design using Satoh’s architecture, implemented by us.


[Figure 4.11 plot; recoverable axis label: Throughput.]

Figure 4.11 Comparison with others’ AES design.

Table 4.8 shows the scalability of the proposed AES hardware design. The lowest hardware cost implementation uses only 1 S-Box and 1 MixColumns module, and its throughput is 75 Mbps. The highest performance implementation uses 20 S-Boxes and 4 MixColumns modules, and its throughput reaches 2.4 Gbps.

Table 4.9 compares the proposed architecture with the 32-bit architecture and the 8-bit architecture. For a fair comparison with Satoh’s work, two example implementations with similar hardware cost were designed [66]. Figure 4.11 shows this comparison more clearly. It can be seen that our scalable architecture provides great scalability in both performance and hardware cost. At the same level of hardware cost, our design achieves better performance.


4.5 Conclusion

This chapter presents a scalable architecture for AES hardware design. The architecture provides high scalability for designing different AES implementations with various hardware costs and performance levels. Two subclass architectures are introduced: the Unified S-Box architecture and the Shared S-Box architecture. The comparison and advantages of these two architectures are discussed. The design and implementation of the sub-modules are also presented. Finally, the performance analysis and the hardware cost obtained from experiments are discussed. The experimental results show that the throughput of the lowest cost AES implementation, which uses 1 S-Box and 1 MixColumns module, is 75 Mbps, while the highest cost AES implementation, with 20 S-Boxes and 4 MixColumns modules, reaches 2.4 Gbps. Compared with other AES architectures, which are not scalable, our design provides high flexibility for various performance requirements and is well suited to video encryption systems.


5 DPA Attack on AES

Cryptographic devices, such as smart cards and USB keys, are widely used. These devices authenticate users or store secret information. The security of these devices is a very active research topic. Many researchers work in this field and have proposed many attack methods to crack cryptographic devices. Correspondingly, many countermeasures have been proposed to prevent these attacks.

In recent years, side-channel attacks have become very popular, because they take a different approach to achieving their goal. Traditional attack methods use mathematical analysis to reveal the weak points of a cryptographic algorithm. The researcher must be very experienced in cryptanalysis and have deep knowledge of cryptographic algorithms. A side-channel attack, however, uses only side-channel information, such as power consumption or execution time, to recover the secret information in a cryptographic device. The attacker does not need cryptanalysis experience or knowledge of cryptographic algorithms. Among the proposed side-channel attack methods, the DPA attack has proved to be the most efficient and the easiest to implement.

In this chapter, we briefly introduce the DPA attack method, with a focus on the DPA attack on AES. Some DPA countermeasure methods are also discussed at the end of this chapter.

5.1 Introduction of Differential Power Analysis attack

In recent years, several kinds of attacks on cryptographic devices have become public. The goal of these attacks is to reveal the secret keys of cryptographic devices. Attacks on cryptographic devices differ significantly in terms of cost, time, equipment and expertise needed. When Kocher et al. [52] showed in 1998 that power analysis attacks can efficiently reveal the secrets of cryptographic devices, the world was shocked. Since then, power analysis attacks have received the most attention because they are very powerful and can be conducted relatively easily. Consequently, they pose a serious threat to the security of cryptographic devices in practice. For the design and development of modern cryptographic devices, it is crucial to understand power analysis attacks and countermeasures.

The basic idea of a power analysis attack is to reveal the key by analyzing the power consumption of a cryptographic device. It exploits the fact that the power consumption of a cryptographic device depends on the data it processes and the operation it performs. By combining the measured power consumption with suitable mathematical methods, an attacker can recover the secret key.

Power analysis attacks can be classified into two categories: simple power analysis and differential power analysis. Simple power analysis (SPA) attacks are characterized by Kocher et al. in the following way: “SPA is a technique that involves directly interpreting power consumption measurements collected during cryptographic operations”, which means that the attacker tries to derive the key more or less directly from a given trace. In contrast to SPA, differential power analysis (DPA) attacks require a large number of power traces and exploit the data dependency of the power consumption of the cryptographic device.

A DPA attack exploits the fact that the power consumption of a cryptographic device depends on the intermediate values that are processed during the execution of a cryptographic algorithm. DPA attacks are the most popular type of power analysis attack because they do not require detailed knowledge about the attacked device. Furthermore, they can reveal the secret key of a device even if the recorded power traces are noisy.


5.1.1 Power Consumption of CMOS Circuit

The total power consumption of a circuit depends on two factors: the logic cells that make up the circuit and the activity of each logic cell. Logic cells can be considered at any level, from the system level through the architecture level down to the MOS transistor level. Currently, logic cells are usually implemented in CMOS. We use the CMOS inverter to describe the power consumption, because the inverter is representative of all other cells.

As shown in Figure 5.1, the inverter consists of two transistors, P1 and N1, and a load capacitance CL. The power consumption includes two parts: static power consumption and dynamic power consumption.

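[Figure 5.1 schematic; recoverable labels: transistors P1 and N1, load capacitance CL, supply VDD, ground GND, input a, output q.]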

Figure 5.1 CMOS Inverter.


Static Power Consumption

In this CMOS inverter, P1 is conducting and N1 is insulating if the input a is set to GND. Conversely, P1 is insulating and N1 is conducting if the input a is set to VDD. In both cases, there is no direct connection between VDD and GND. Therefore, only a small leakage current flows through the MOS transistors. This leakage current is denoted by Ileak. The static power consumption Ps can be calculated by the following equation:

$$P_s = I_{leak} \times VDD \tag{5.1}$$

Dynamic Power Consumption

Dynamic power consumption occurs when a logic cell switches. A logic cell can perform four transitions: 0->0, 1->1, 0->1 and 1->0. In the first two cases (0->0, 1->1), only static power is consumed. In the last two cases (1->0, 0->1), dynamic power is consumed as well. Table 5.1 lists the power consumption for each transition.

Table 5.1 Power consumption of four transitions in a circuit.

Transitions    Power consumption
0 -> 0         Static power consumption
0 -> 1         Static + Dynamic power consumption
1 -> 0         Static + Dynamic power consumption
1 -> 1         Static power consumption


The dynamic power consumption Pd consists of two parts: charging power consumption and short-circuit power consumption.

1) Charging Power Consumption

A CMOS inverter draws a charging current from the power supply to charge the output capacitance CL when the output q switches from 0 to 1. CL is the capacitance connected to the output port q, and it depends on the physical properties of the process technology, the fanout cells and the length of the connected wires.

The average charging power Pch consumed by a cell during the time T is given by Equation 5.2.

$$P_{ch} = \frac{1}{T} \int_{0}^{T} p_{ch}(t)\,dt = \alpha \cdot f \cdot C_L \cdot VDD^2 \tag{5.2}$$

In this equation, pch(t) denotes the instantaneous charging power, f is the clock frequency, and α is the activity factor of the cell, which corresponds to the average number of 0->1 transitions that occur at the output of the cell in every clock cycle.

2) Short-Circuit Power Consumption

A temporary short circuit occurs in a CMOS circuit while the output is switching. In the case of the CMOS inverter, there is a short period of time during which both P1 and N1 are conducting simultaneously. The average power consumption Psc caused by this short circuit can be calculated by Equation 5.3:

$$P_{sc} = \frac{1}{T} \int_{0}^{T} p_{sc}(t)\,dt = \alpha \cdot f \cdot VDD \cdot I_{peak} \cdot t_{sc} \tag{5.3}$$

In this equation, psc(t) denotes the instantaneous short-circuit power consumed by a cell, Ipeak is the current peak caused by the short circuit during a switching event, and tsc is the time for which the short circuit exists.
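The two closed-form expressions can be evaluated directly. The Python sketch below is illustrative only: the function names and every numeric value (activity factor, clock frequency, load capacitance, supply voltage, peak current and short-circuit time) are hypothetical and are not taken from any measurement in this work.

```python
def charging_power(alpha, f_hz, c_load_f, vdd_v):
    """Average charging power in watts, following Equation 5.2."""
    return alpha * f_hz * c_load_f * vdd_v ** 2


def short_circuit_power(alpha, f_hz, vdd_v, i_peak_a, t_sc_s):
    """Average short-circuit power in watts, following Equation 5.3."""
    return alpha * f_hz * vdd_v * i_peak_a * t_sc_s


# Hypothetical operating point for a single cell.
alpha = 0.1        # average 0->1 transitions per clock cycle
f_hz = 100e6       # 100 MHz clock
c_load_f = 10e-15  # 10 fF load capacitance
vdd_v = 1.8        # supply voltage in volts

p_ch = charging_power(alpha, f_hz, c_load_f, vdd_v)
p_sc = short_circuit_power(alpha, f_hz, vdd_v, i_peak_a=20e-6, t_sc_s=50e-12)
print(p_ch, p_sc, p_ch + p_sc)

# Halving the supply voltage reduces the charging power by a factor of four.
print(charging_power(alpha, f_hz, c_load_f, vdd_v / 2) / p_ch)
```

The quadratic dependence of the charging term on VDD is the reason a lower supply voltage is an effective way to reduce dynamic power.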


5.1.2 Power Model

A power analysis attack is a cryptographic attack that uses power consumption information and a hypothetical power model to reveal the secret keys in cryptographic devices. The power consumption can be easily measured while the cryptographic device is running. The most important task in a power analysis attack is to build an accurate hypothetical power model. Unlike the conventional power models used in other applications, the absolute values of power consumption are not relevant in a power analysis attack; only the relative differences between simulated power consumption values are important. For this reason, researchers use only the dynamic power to model circuit power.

The Hamming weight (HW) model and the Hamming distance (HD) model are two important hypothetical power models for power analysis attacks. The Hamming weight model is a simple power model that assumes that the power consumption is proportional to the number of bits that are set in the processed data value. The data values processed before and after this value are ignored. Therefore, this power model is not very well suited to describing the power consumption of a CMOS circuit. Equation 5.4 shows the power consumption based on the HW model.

$$P_{HW} = \lambda \cdot HW(S) + n \tag{5.4}$$

Here λ is a constant that models the ratio between noise power and circuit power, HW(S) is the Hamming weight of the internal state S, and n is the noise power.

The Hamming distance model is more accurate than the Hamming weight model. Its basic idea is to count the number of 1→0 and 0→1 transitions that occur in a digital circuit during a certain time interval. This number is then used to describe the power consumption of the circuit in this time interval. The Hamming distance model assumes that all 1→0 and 0→1 transitions in a digital circuit lead to the same power consumption. By dividing the entire simulation of a circuit into small intervals, a kind of power trace can be generated. This power trace does not contain actual power consumption values but the number of transitions that occur in each time interval. A formal definition of the Hamming distance is given in the following.

$$HD(S_1, S_2) = HW(S_1 \oplus S_2) \tag{5.5}$$

The Hamming distance of two values S1 and S2 corresponds to the Hamming weight of $S_1 \oplus S_2$. The Hamming weight corresponds to the number of bits that are set to one. Hence, $HW(S_1 \oplus S_2)$ corresponds to the number of bits that differ between S1 and S2.

The formal equation of power consumption in the HD model is as follows:

$$P_{HD} = \lambda \cdot HD(S_1, S_2) + n \tag{5.6}$$

λ and n are the same as in the HW model; HW is replaced by HD in this equation. S1 and S2 are two states (outputs of a circuit) in two clock cycles.
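The two models are easy to express in software. The Python sketch below is a minimal illustration; the function names and the default values of the hypothetical parameters lam (for λ) and noise (for n) are assumptions made for this example.

```python
def hamming_weight(value):
    """Number of bits set to one in a non-negative integer."""
    return bin(value).count("1")


def hamming_distance(s1, s2):
    """Number of bit positions in which s1 and s2 differ (Equation 5.5)."""
    return hamming_weight(s1 ^ s2)


def hw_power(state, lam=1.0, noise=0.0):
    """Hypothetical power under the HW model (Equation 5.4)."""
    return lam * hamming_weight(state) + noise


def hd_power(s1, s2, lam=1.0, noise=0.0):
    """Hypothetical power under the HD model (Equation 5.6)."""
    return lam * hamming_distance(s1, s2) + noise


previous_state, next_state = 0xFF, 0x0F
print(hw_power(next_state))                  # depends on next_state only
print(hd_power(previous_state, next_state))  # depends on the transition
```

The example makes the difference between the models visible: the HW value is the same whatever the register held before, while the HD value changes with the previous state.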

5.1.3 Hypothetical Power Consumption based on HD model: Case study

Based on the discussion in the last two subsections, the power consumption of two specific circuits is discussed in this section.

Figure 5.2 shows an RTL-level diagram of a circuit. This circuit includes two registers, REG0 and REG1, with a combinational circuit inserted between them. The left part of the figure represents the state of the circuit at a specific time (S0); after one clock cycle, the state of the circuit has changed (S1), as shown in the right part.

The power consumption of this circuit includes two parts: the sequential logic (register) power consumption PREG and the combinational logic power consumption PComb. PREG is given by

$$P_{REG} = \lambda_R \times HD(S0, S1) + n_{REG} \tag{5.7}$$

where $\lambda_R$ is the SNR ratio for the registers and $n_{REG}$ denotes the power noise produced by the registers. S0 and S1 are the states of REG in different clock cycles. PComb can be modeled in the same way as

$$P_{Comb} = \lambda_C \times HD(S0, S1) + n_{Comb} \tag{5.8}$$

where $\lambda_C$ is the SNR ratio for the combinational circuit and $n_{Comb}$ is the power noise produced by the combinational circuit. The total power consumption of this circuit in this clock cycle is:

$$
\begin{aligned}
P_{Total} &= P_{REG} + P_{Comb} \\
&= (\lambda_R + \lambda_C)\, HD(S0, S1) + n_{total} \\
&= \lambda_{total}\, HD(S0, S1) + n_{total}
\end{aligned}
\tag{5.9}
$$

Figure 5.3 shows another circuit. In this circuit, the results produced by the combinational circuit are fed back to the original register. As shown in the figure, S0’ is a function of S0: $S0' = f(S0)$. The total power consumption of this circuit is different because the architecture has changed. As discussed above, the power consumption includes:

$$P_{REG} = \lambda_R \times HD(S0, S0') + n_{REG} \tag{5.10}$$

$$P_{Comb} = \lambda_C \times HD(S0, S0') + n_{Comb} \tag{5.11}$$

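$$
\begin{aligned}
P_{Total} &= P_{REG} + P_{Comb} \\
&= \lambda_R\, HD(S0, S0') + \lambda_C\, HD(S0, S0') + n_{total} \\
&= (\lambda_R + \lambda_C)\, HD(S0, S0') + n_{total} \\
&= \lambda_{total}\, HD(S0, S0') + n_{total} \\
&= \lambda_{total}\, HD(S0, f(S0)) + n_{total}
\end{aligned}
\tag{5.12}
$$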

As a result, the power consumption of the circuit in Case II depends on only one variable, S0, while in Case I it depends on two variables, S0 and S1.


Figure 5.2 Power consumption of a circuit: Case I.

Figure 5.3 Power consumption of a circuit: Case II.


5.1.4 Differential Power Analysis Attacks

The Differential Power Analysis (DPA) attack was proposed by Kocher et al. in 1998. DPA is a side-channel attack method that measures power consumption and uses a hypothetical power model to recover secret information. In a successful attack, the hypothetical power consumption trace of the correctly guessed key displays a significantly higher correlation with the actual measurements of the cryptographic device than the traces of the other keys. The DPA attack has been proven to be practical and efficient. Therefore, it poses a serious threat to the security of cryptographic devices.

CPA (Correlation coefficient Power Analysis) was proposed by Brier, Clavier and Olivier in 2003 [54]. Compared with Kocher’s original proposal, CPA uses the Hamming distance model instead of the Hamming weight model, and it uses a correlation coefficient instead of a differential coefficient. CPA is an improvement on the original DPA that attacks a cryptographic device with much higher accuracy. The term DPA attack is commonly used to cover both DPA and CPA attacks. In this dissertation, DPA attack means a power analysis attack using DPA or other optimized DPA methods, such as CPA.

DPA uses the following equations to calculate the correlation coefficient:

DPA Algorithm:

$$Corr = \frac{Cov[W(t), P(k)]}{\sqrt{Var[W(t)]\, Var[P(k)]}} \tag{5.13}$$

$$Cov[W(t), P(k)] = \frac{1}{N} \sum_{i=1}^{N} [W(t,i) - \overline{W}(t)]\,[P(k,i) - \overline{P}(k)] \tag{5.14}$$

$$Var[W(t)] = \frac{1}{N} \sum_{i=1}^{N} [W(t,i) - \overline{W}(t)]^2 \tag{5.15}$$

$$Var[P(k)] = \frac{1}{N} \sum_{i=1}^{N} [P(k,i) - \overline{P}(k)]^2 \tag{5.16}$$

$$\overline{W}(t) = \frac{1}{N} \sum_{i=1}^{N} W(t,i) \tag{5.17}$$

$$\overline{P}(k) = \frac{1}{N} \sum_{i=1}^{N} P(k,i) \tag{5.18}$$

W(t) denotes the power traces and P(k) is the hypothetical power consumption, which is calculated using the Hamming distance. t is the time step, k is the hypothetical key and N is the number of power traces. Equations 5.17 and 5.18 calculate the means of the real and the hypothetical power consumption over the N traces. Equations 5.15 and 5.16 calculate the variances of the real and the hypothetical power consumption. Equation 5.14 calculates the covariance of the real and the hypothetical power. Finally, the correlation coefficients are calculated by Equation 5.13.
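The correlation computation for one time step t and one hypothetical key k can be sketched as follows. This is a plain-Python illustration with hypothetical function names, written under the assumption that Equation 5.13 is the usual correlation coefficient, that is, the covariance divided by the square root of the product of the two variances; the inputs are the N measured values W(t, i) and the N hypothetical values P(k, i).

```python
from math import sqrt


def mean(values):
    """Mean over the N traces (Equations 5.17 and 5.18)."""
    return sum(values) / len(values)


def covariance(w, p):
    """Covariance of measured and hypothetical power (Equation 5.14)."""
    w_bar, p_bar = mean(w), mean(p)
    return sum((wi - w_bar) * (pi - p_bar) for wi, pi in zip(w, p)) / len(w)


def variance(values):
    """Variance over the N traces (Equations 5.15 and 5.16)."""
    return covariance(values, values)


def correlation(w, p):
    """Correlation coefficient for one (t, k) pair (Equation 5.13)."""
    denominator = sqrt(variance(w) * variance(p))
    return covariance(w, p) / denominator if denominator else 0.0


measured = [3.1, 4.9, 2.2, 6.0, 4.1]   # hypothetical W(t, i) samples
hypothetical = [3, 5, 2, 6, 4]         # hypothetical P(k, i) values
print(correlation(measured, hypothetical))
```

Returning 0.0 when one input is constant is a choice made for this sketch, so that a degenerate key hypothesis does not raise a division error.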

5.2 DPA attack on AES

5.2.1 DPA attack on AES: An Example

Figure 5.4 Last round of AES module.


A successful DPA attack requires probing the power consumption, obtaining the ciphertext and modeling the hypothetical power consumption based on the hardware. Finally, the DPA algorithm is applied to calculate the correlation coefficients between the real and the hypothetical power consumption.

Figure 5.4 shows the last round of the AES module. The attacker knows only the ciphertext and the power consumption. The detailed DPA attack procedure is as follows:

Step 1. Measuring the power consumption of the AES encryption device. The power trace data W(t) can be easily obtained with an oscilloscope. The device executes the AES encryption algorithm with an unknown, constant key. The ciphertext is known to the attacker.

Step 2. Calculating hypothetical intermediate values. As shown in the figure, the corresponding intermediate value can be calculated from the known ciphertext and a hypothetical key.

$$InterValue(k) = InvOP(Ciphertext, key_0) \qquad (5.19)$$

InvOP represents the inverse operation of AddRoundKey, ShiftRows and SubBytes. $key_0$ is one byte of the hypothetical key. It has 256 possible values.

Step 3. Calculating hypothetical power consumption. The Hamming distance model is chosen in this step. The hypothetical power consumption is equal to the Hamming distance between the intermediate value and the ciphertext.

$$P(k) = HD(Ciphertext, InterValue(k)) \qquad (5.20)$$

Step 4. Calculating correlation coefficients from P(k) and W(t). The correctly guessed key is highly correlated with the real power consumption, and it can be identified from the significant peaks in the DPA curves.
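Steps 2 and 3 can be sketched for a single key byte as follows. The sketch assumes the standard AES S-box, generated here from the GF(2^8) inverse and the affine transform, and it ignores the byte repositioning caused by ShiftRows for simplicity; the names `hypothetical_power` and `power_table` are illustrative only.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8) computed as a^254; 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, s):
    """Rotate a byte left by s bits."""
    return ((x << s) | (x >> (8 - s))) & 0xFF


def sbox(a):
    """AES SubBytes for one byte: field inversion followed by the affine map."""
    x = gf_inv(a)
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63


SBOX = [sbox(a) for a in range(256)]
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def hypothetical_power(cipher_byte, key_guess):
    """Eq. 5.19 and 5.20 for one byte (ShiftRows position ignored)."""
    inter_value = INV_SBOX[cipher_byte ^ key_guess]    # undo AddRoundKey, SubBytes
    return hamming_distance(cipher_byte, inter_value)


def power_table(cipher_bytes):
    """One row of hypothetical power values per key guess (256 rows)."""
    return [[hypothetical_power(c, k) for c in cipher_bytes] for k in range(256)]
```

Each row of `power_table` would then be correlated with the measured traces in Step 4, one row per hypothetical key.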

5.2.2 DPA attack on AES: A successful attack and a failed attack

According to the discussion in Section 5.2.1, the final set of correlation coefficients has three dimensions: time, hypothetical key and correlation coefficient value. It can be denoted by $DPA_{ByteKey}(T, K, C)$, where T is time, K is the hypothetical key and C is the coefficient value. ByteKey denotes which byte of the key is attacked. A 128-bit AES key has 16 bytes in total. The results of a DPA attack can be shown clearly in a 2-D or a 3-D view.

The 2-D view shows 16 graphs, and each graph represents one byte of the key. The x-axis is time and the y-axis is the coefficient value. The 2-D view shows the final result as $DPA_{ByteKey}(T, K_{right}, C)$, that is, the coefficient curves under the right key. The curves change as the other variables change. A successful attack shows a significant peak in each graph (Figure 5.5). A failed attack shows flat, noisy curves without a distinct peak (Figure 5.7).

The 3-D view shows the coefficient mesh for one byte of the key. The x-axis is the hypothetical key, from 0 to 255, the y-axis is time, and the z-axis is the coefficient value. A successful attack shows a wall (Figure 5.6), while a failed attack shows a flat, noisy mesh (Figure 5.8).

Figure 5.5 2-D views of successful DPA attack.

(16 bytes of key; the peak in each graph indicates that the hypothetical key is the right key)

Figure 5.6 3-D views of successful DPA attack.

(4th byte of AES key; the wall in this 3-D view indicates that the hypothetical key is the right key)

Figure 5.7 2-D views of failed DPA attack.

(16 bytes of key; no significant peak can be found in the correlation graphs)

Figure 5.8 3-D views of failed DPA attack.

(4th byte of AES key; there is no wall in the correlation coefficient mesh)

5.3 Conventional Countermeasure Methods

A DPA attack works because the power consumption of a cryptographic device depends on the intermediate values of the executed cryptographic algorithm. The goal of a countermeasure is to avoid or reduce these dependencies. According to reference [53], techniques to prevent DPA attacks fall into two categories.

Hiding Method

The hiding method breaks the link between the power consumption of the device and the processed data values. This makes it difficult for an attacker to find exploitable information in the power traces. Two types of hiding methods are introduced by Stefan Mangard in [53]: time dimension hiding and amplitude dimension hiding.

Time dimension hiding usually shuffles the operations of the cryptographic algorithm to randomize their execution. This makes the power consumption appear more or less random to an attacker. As shown in Figure 5.9, a power trace consists of several operations, such as A, B and C in this figure. Each operation executes in different clock cycles and produces a different power consumption. Before hiding, the execution order of the operations is fixed: A->B->C. After hiding, the execution order is shuffled. The operations are executed at random positions in the time dimension, and the order changes in every power trace. As a result, the attacker cannot align the power consumption of each AES operation across traces, and the DPA attack cannot be performed directly.

Amplitude dimension hiding differs from time dimension hiding in that it adds a noise source to the original circuit. As shown in Figure 5.10, the hidden power consumption consists of two sources: the pure power produced by the original circuit and the noise power produced by the noise source. The idea of this method is to reduce the SNR so that the power consumption becomes too noisy for DPA analysis.

Figure 5.9 Time dimension hiding.

Figure 5.10 Amplitude dimension hiding.

Currently, hiding methods are most commonly used in software implementations of AES in embedded systems. The conventional techniques are the random insertion of dummy operations and the shuffling of operations. Hardware implementations of hiding methods have not been reported yet.

Masking Method

The masking method randomizes the intermediate values that are processed by the cryptographic device. This makes the power consumption independent of the intermediate values. The mask operation can be written as

$$S_M = S \otimes M \qquad (5.21)$$

S is the internal state of the circuit before masking, M is a random number used to mask S, and $S_M$ is the masked internal state. After masking, the attacker does not know the values of the internal state. The operation ⊗ is most often the Boolean exclusive-or, the modular addition, or the modular multiplication.
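For the Boolean exclusive-or case, Equation 5.21 can be illustrated with a short sketch. It assumes 8-bit states and masks; the helper names `mask`, `unmask` and `masked_values` are hypothetical and chosen only for this example.

```python
import secrets


def mask(state, m):
    """Boolean masking of one byte: S_M = S xor M."""
    return state ^ m


def unmask(masked_state, m):
    """Remove the mask again: S = S_M xor M."""
    return masked_state ^ m


def masked_values(state):
    """All masked values that one fixed state can take over the 256 masks."""
    return sorted(mask(state, m) for m in range(256))


state = 0x3C
m = secrets.randbelow(256)          # fresh random mask for this encryption
recovered = unmask(mask(state, m), m)
```

Because the 256 possible masks map any fixed state onto all 256 byte values exactly once, a uniformly random mask makes the masked value uniformly distributed, whatever the state is.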

Masking methods are widely used in both software and hardware implementations of AES to resist DPA attacks. The most frequently used masking method for AES hardware design was proposed by Akkar and Giraud in [55]. This method includes two parts:

1) Register Masking. Register masking, or internal state masking, masks the intermediate data while AES encryption or decryption is running. As shown in Figure 5.11, all of the intermediate values (A, B, C, D, E) are masked with a random number X.

2) S-Box Masking. Since the S-Box consumes much more power than the other blocks, attackers usually focus on the S-Box. As shown in Figure 5.12, S-Box masking applies a random mask to every intermediate value. One GF(256) inverter, four GF(256) multipliers and two GF(256) adders are added to the original S-Box design as the extra hardware cost of this masking method.

Figure 5.11 AES after masking.

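(Signal labels in the figure: $A_{i,j} \oplus X_{i,j}$; $Y_{i,j}$; $(A_{i,j} \oplus X_{i,j}) \otimes Y_{i,j}$; $X_{i,j} \otimes Y_{i,j}$; $A_{i,j} \otimes Y_{i,j}$; $(A_{i,j} \otimes Y_{i,j})^{-1}$; $X_{i,j} \otimes Y_{i,j}^{-1}$; $(A_{i,j} \otimes Y_{i,j})^{-1} \oplus (X_{i,j} \otimes Y_{i,j}^{-1})$; $A_{i,j}^{-1} \oplus X_{i,j}$)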

Figure 5.12 S-Box after masking.

5.4 Conclusion

This chapter introduced the DPA attack. Firstly, the basic concepts were introduced, such as the power consumption of CMOS circuits, power models for power estimation, and the basic DPA workflow. Secondly, the DPA attack on the AES algorithm was introduced. An attack procedure consists of four steps, and the final results of a successful attack and a failed attack were also discussed. Finally, two countermeasure methods, hiding and masking, were introduced.

6 AES Design with DPA Countermeasure

For hardware designs of AES that counter DPA attacks, only masking methods have been proposed so far, and almost all hardware implementations use masking alone as the countermeasure. However, masking has been shown in [56] to be insecure against high-order DPA attacks on software implementations. For hardware implementations, there are currently few articles about DPA attacks on masked AES. Nevertheless, the risk remains, and many new attack methods are still under research.

Attack methods change very quickly. A countermeasure method may be strong against some specific attack methods and yet remain weak against others. A basic approach to improving security is to combine several countermeasure methods. In this chapter, we propose several DPA countermeasure methods for AES hardware design and, finally, an ultra low-cost AES design with multiple DPA countermeasures, which combines masking, hiding, and our proposed Independent ARK and Data Sliding. The experiment environment (DPA attack system) and the experimental results are also provided in this chapter.

6.1 Proposed DPA Countermeasure methods for AES

6.1.1 Register Masking

As discussed in Chapter 5, a DPA attack uses the dynamic power consumption and a power model. The dynamic power consumption of a circuit includes the sequential logic power consumption (consumed by registers) and the combinational logic power consumption. To counter DPA attacks, both the registers and the combinational logic should be masked.

Figure 6.1 The ith round of AES without and with masking countermeasures.

Registers are updated in every clock cycle, so they consume a large amount of power. Moreover, all registers are refreshed at the same time, which makes it very easy for an attacker to locate the right positions in the power trace according to the hypothetical power model. Register masking masks the values stored in the registers. After masking, the power consumption of the registers $P_R$ is randomized by the random number X:

$$P_R = \lambda_R \, HD(S \otimes X, S' \otimes X) + n \qquad (6.1)$$

Figure 6.1 shows the round function of AES without and with masking countermeasures. In the left part, all of the values (A, B, C, D, E) stored in the registers are unmasked, so the internal states of AES are exposed to the attacker. After masking, all of the internal states are masked by a random number X, and this number changes for every encryption.

Figure 6.2 Proposed Registers Masking.

Figure 6.2 shows the registers masking method. Two exclusive-or gates are inserted

around the registers. The values stored in the registers are randomized.

6.1.2 S-Box Masking

The original S-box and the proposed masked S-box are shown in Figure 6.3. The original S-box consists of a $GF(2^8)$ inversion and an affine operation. The $GF(2^8)$ inversion is a non-linear Galois field operation. The affine operation is a simple matrix transformation that can be easily implemented in hard-wired logic.

In order to conceal the intermediate value, a simplified S-box masking method is proposed. The main idea is to use a random number to mask the input data of the S-box. In this way, the power consumption of the S-box depends only on the masked input data and is independent of the original data. Since the Galois field inversion is non-linear, the common exclusive-or operation cannot be used directly for masking. As a result, Galois field multiplication is used for masking.

As shown in Figure 6.3, two Galois field multipliers are added, and a random number Y is used as the masking pattern. Since Y is independent of A, A×Y is also independent of A. The power consumption is a linear function of the Hamming distance of A×Y; thus, the power consumption is independent of A. The power consumption of the masked combinational logic, $P_{Comb}$, is represented as:

$$P_{Comb} = \lambda \times HD(S_{C0}(Y), S_{C1}(Y)) + n \qquad (6.2)$$

Compared with the original masking method in [55], which masks both the S-box and the data bus, our proposed design saves one Galois field inverter, two Galois field multipliers and two Galois field adders in a hardware implementation.
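The algebra behind multiplicative masking of the inversion can be checked with a small sketch. It assumes one plausible arrangement of the two multipliers (one before and one after the inverter, both using the same non-zero Y); the exact wiring of Figure 6.3 may differ, and `masked_inverse` is a hypothetical name.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8) computed as a^254; 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


# Lookup table standing in for the hardware inverter.
INV = [gf_inv(a) for a in range(256)]


def masked_inverse(a, y):
    """Invert a while the inverter only ever sees the product a * y."""
    if y == 0:
        raise ValueError("mask Y must be non-zero")
    masked_in = gf_mul(a, y)        # first multiplier: A x Y
    masked_out = INV[masked_in]     # inverter output: A^-1 x Y^-1
    return gf_mul(masked_out, y)    # second multiplier cancels Y^-1
```

The identity used is $(A \times Y)^{-1} \times Y = A^{-1}$ for any non-zero Y. Note that an input of 0 stays 0 for every mask, since 0 × Y = 0.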

Figure 6.3 Proposed S-Box Masking.

6.1.3 Subbytes Hiding

Subbytes Hiding means that the sequence of Subbytes operations is shuffled. Subbytes Hiding breaks the correlation between the power consumption and the AES operations in the time domain. After shuffling, the attacker cannot know which operation corresponds to which position on the time axis of the power trace. Thus, the hypothetical power consumption cannot be correctly calculated for power analysis.

A power trace of AES hardware is shown in Figure 6.4. Four operations are executed one by one: ShiftRows, SubBytes, MixColumns and AddRoundKey. Each operation produces a different power consumption. However, attackers know that the same operation occurs at the same time in different power traces, as Figure 6.5 a) shows. This makes the power analysis attack feasible.

Figure 6.4 A power trace of AES.

a) Subbytes without hiding b) Subbytes with hiding

Figure 6.5 Subbytes without and with hiding.

In order to prevent power analysis attacks, the hiding method shuffles the execution of operations. As a result, in the collected power traces, the same operation is randomly distributed in the time domain, as shown in Figure 6.5 b). The power consumption of Subbytes, $P_{Sub}$, can be denoted as

$$P_{Sub} \in \{P_{t0}, P_{t1}, \ldots, P_{t(n-1)}\} \qquad (6.3)$$

$P_{ti}$ represents the power consumption in the ith clock cycle. There are n possibilities for $P_{Sub}$ in total, and n depends on the hiding method used.

Figure 6.6 Hardware design of Subbytes hiding.

The hardware design of Subbytes hiding is shown in Figure 6.6. One of the 16 bytes of data is selected for SubBytes. The SEL module is a multiplexer for 16-to-1 selection. The LFSR is a linear feedback shift register used to generate the 4-bit selection signals. The initial vector of the LFSR is generated by a random number generator (RNG). The RNG is a module outside of AES, and it normally exists in every cryptographic device.
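The selection signal generation can be sketched in software as follows. The feedback polynomial $x^4 + x^3 + 1$, the seed handling and the function names are assumptions for illustration; the taps used in the actual hardware are not stated here.

```python
def lfsr4_step(state):
    """One step of a 4-bit Fibonacci LFSR, assumed polynomial x^4 + x^3 + 1."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0xF


def select_sequence(seed, length):
    """Generate `length` 4-bit selection values from an RNG-provided seed."""
    state = (seed & 0xF) or 1       # assumption: avoid the all-zero lock-up state
    sequence = []
    for _ in range(length):
        sequence.append(state)
        state = lfsr4_step(state)
    return sequence


def sel(data16, select):
    """16-to-1 multiplexer: pick one byte of the 16-byte state."""
    return data16[select]


state_bytes = list(range(0x10, 0x20))           # example 16-byte state
order = select_sequence(seed=0x9, length=15)
picked = [sel(state_bytes, s) for s in order]
```

With this polynomial the LFSR cycles through all 15 non-zero 4-bit values before repeating. A plain LFSR never outputs 0, so how the remaining byte is scheduled is left open in this sketch.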

6.1.4 Independent ARK and Data Sliding

Independent ARK

The conventional AES hardware design integrates all AES operations into one data path to save clock cycles. As shown in Figure 6.7, in the last round of AES, Subbytes and AddRoundKey are executed within one clock cycle to produce the ciphertext. The power consumption of this procedure can be represented by

$$\begin{aligned} P(key) &= \lambda \, HD(C, S1) + n \\ &= \lambda \, HD(C, f(C, key)) + n \end{aligned} \qquad (6.4)$$

The function f() is the inverse operation of Subbytes and AddRoundKey. C is the ciphertext, which is known to the attacker. S1 is an internal state of the circuit, and it is the result of applying the inverse function to C. The key is unknown, so the attacker uses a hypothetical key in this equation to calculate the hypothetical power consumption. As discussed in the last chapter, DPA attacks compare the hypothetical power with the real power. The right key can be easily recognized in the correlation coefficient graph. In this equation, the attacker needs to guess one byte of the key (8 bits), which has 256 possible values.

Independent ARK means that the AddRoundKey operation is separated from the other operations. As shown in Figure 6.8, the last round of AES is separated into two sub-steps: Subbytes and AddRoundKey. These two steps are executed in different clock cycles. For a DPA attack, only the Subbytes operation can be used, because the S-Box consumes much more power than the other operations. The power consumption of Subbytes in this circuit is

$$\begin{aligned} P(key) &= \lambda \, HD(S2, S3) + n \\ &= \lambda \, HD(C \oplus key, Subbyte^{-1}(C \oplus key)) + n \end{aligned} \qquad (6.5)$$

S2 and S3 are two internal states of the circuit. S2 is the exclusive-or of the ciphertext and the key. S3 is the inverse Subbyte of S2.

Figure 6.7 Integrated Subbytes and AddRoundKey.

(Labels in the figure: S1, SubBytes, + Key, C (Ciphertext); Subbytes and AddRoundKey form a single step of the last round)

Figure 6.8 Separated Subbyte and AddRoundKey.

(Labels in the figure: Key, S2, SubBytes, +, C (Ciphertext), 8 bits, S3; Subbytes and AddRoundKey are separate steps of the last round)

Figure 6.9 Feedback structure and Data Sliding Structure.

Data Sliding

Data sliding makes the state of each register depend on its neighbouring registers. Figure 6.9 shows two kinds of circuit structures:

A) Feedback circuit structure

A feedback circuit means that the source and the destination of a data value are the same. As shown in this figure, R0 and R1 are two registers. R0 and Subbytes make up a feedback circuit. At time t0, the states of these two registers are S0 and S1. After one clock cycle (t1), the state of R0 changes to S0'. Recalling the discussion in Section 5.1.3, the power consumption can be represented as

$$\begin{aligned} P &= \lambda \, HD(S0, S0') + n \\ &= \lambda \, HD(S0, Subbytes(S0)) + n \end{aligned} \qquad (6.6)$$

B) Data Sliding circuit structure

In a Data Sliding circuit, there is no feedback path. The source and the destination of a combinational circuit are different registers. As shown in this figure, the input of Subbytes comes from R0, and the output of Subbytes goes to R1. The state changes of this circuit are also listed in the table of this figure. The power consumption of the Data Sliding circuit can be represented as

$$\begin{aligned} P &= \lambda_{REG} \, (HD(S0, S1) + HD(S1, S0')) + \lambda_{Comb} \, HD(S0', S1') + n \\ &= \lambda_{REG} \, (HD(S0, S1) + HD(S1, Subbyte(S0))) \\ &\quad + \lambda_{Comb} \, HD(Subbyte(S0), Subbyte(S1)) + n \end{aligned} \qquad (6.7)$$

Unlike the feedback circuit, the power consumption of this circuit depends on two registers. The states of both registers must be taken into account. As shown in this equation, the power consumption P depends on both S0 and S1.

When Independent ARK and Data Sliding are combined, the power consumption becomes:

$$\begin{aligned} P(key0, key1) &= \lambda_{REG} \, (HD(S0, S1) + HD(S1, S0')) + \lambda_{Comb} \, HD(S0', S1') + n \\ &= \lambda_{REG} \, (HD(Subbyte^{-1}(C0 \oplus key0), Subbyte^{-1}(C1 \oplus key1)) \\ &\quad + HD(Subbyte^{-1}(C1 \oplus key1), C0 \oplus key0)) \\ &\quad + \lambda_{Comb} \, HD(C0 \oplus key0, C1 \oplus key1) + n \end{aligned} \qquad (6.8)$$

key0 and key1 are two bytes of the key. C0 and C1 are ciphertext bytes, which are known to the attacker. In this power consumption equation, two key bytes must be hypothesized. Compared with the DPA attack on a conventional circuit, where only one key byte needs to be hypothesized, our proposed methods increase this to two bytes. As a result, the computational cost is increased by a factor of $2^8$ for every power trace.
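As a worked check of this factor, assuming that each 8-bit key byte is guessed exhaustively and independently, the number of key hypotheses per power trace grows as follows:

```latex
\[
\frac{\#\{(key0,\, key1)\}}{\#\{key0\}}
  = \frac{2^{8} \cdot 2^{8}}{2^{8}}
  = \frac{65536}{256}
  = 2^{8}
\]
```

In other words, the attacker must evaluate 65536 hypotheses instead of 256 for each pair of neighbouring bytes.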

6.1.5 Time Complexity Analysis

The proposed DPA countermeasure methods greatly increase the computational complexity of DPA attacks. This section analyzes the time complexity of a DPA attack on an AES design with countermeasure methods.

For a DPA attack, the time complexity includes three parts: 1) power trace measurement, 2) hypothetical power modeling, and 3) correlation coefficient calculation. Between a DPA attack on pure AES (without countermeasure) and one on secure AES (with countermeasure), the difference lies in parts 2 and 3. For pure AES, the hypothetical power trace is

$$P_{AES} = \lambda \, HD(C, f(C, key)) + n \sim f_0(key) \qquad (6.9)$$

In order to carry out a DPA attack, the value of the key must be hypothesized. Every hypothetical key corresponds to a set of hypothetical power traces. The correlation coefficients are calculated between every set of hypothetical power traces and the real power traces. We define the time complexity of a DPA attack on pure AES as $\sim o(DPA)$.

For AES with countermeasure methods, the power consumption changes according to the method used (as discussed in Sections 6.1.1 to 6.1.4), so the hypothetical power model has many more unknown variables than for pure AES. A summary of the hypothetical power consumption and the time complexity of the DPA attack follows:

AES with Masking method

As discussed for Equations 6.1 and 6.2, the hypothetical power consumption of AES with the masking method is

$$P_{AES+Masking} = \lambda \, HD(C, f(C, key, X)) + n \sim f_0(key, X) \qquad (6.10)$$

Compared with Equation 6.9, an additional 8-bit random number X is included in this equation. To obtain the hypothetical power trace, the attacker must additionally guess the random number X for every hypothetical key. For a single power trace, the number of possible power values increases by a factor of $2^8$. For N power traces, the number of possible power trace sets increases by a factor of $2^{8N}$. In other words, the time complexity also increases by a factor of $2^{8N}$.

AES with Hiding method

As shown in Equation 6.3, the hypothetical power consumption of AES with the hiding method is

$$P_{AES+hiding} \in \bigcup_{i=1}^{Y} \{\lambda \, HD(C_i, f(C_i, key)) + n\} \sim f_1(key, Y) \qquad (6.11)$$

Y is a random selection number that indicates which power value is the right one for a specific time slot. Normally, Y equals 16, since there are 16 Subbytes operations in each round of the AES algorithm. Similarly, for a single power trace, the number of possible power traces increases by a factor of $2^4$, and for N power traces, it increases by a factor of $2^{4N}$. The time complexity also increases by a factor of $2^{4N}$.

AES with Independent ARK + Data Sliding

As discussed in Section 6.1.4, the hypothetical power consumption of AES with Independent ARK and Data Sliding is

$$P_{AES+I.ARK\&D.S.} = \lambda \, HD(f(C0, key0), f(C1, key1)) + n \sim f_2(key0, key1) \qquad (6.12)$$

key0 and key1 are 8-bit hypothetical keys. This equation is similar to Equation 6.10 (considering key0 as the key and key1 as the random number X). The time complexity analysis is also similar to that of masking. The complexity increases by a factor of $2^{8N}$.

Table 6.1 summarizes the power consumption of the AES design with each countermeasure method. Table 6.2 summarizes the time complexity for AES without countermeasure, AES with masking, AES with hiding, and AES with Independent ARK and Data Sliding.

Our proposed countermeasure methods can also be combined to raise the security to a higher level. For example, when all of the methods (Masking, Hiding, Independent ARK and Data Sliding) are combined, the power consumption becomes

$$P_{AES+All} \in \bigcup_{i=1}^{Y} \{\lambda \, HD(f(C_i, key0, X), f(C_i, key1, X)) + n\} \sim f_3(key0, key1, X, Y) \qquad (6.13)$$

There are four unknown variables in this equation: the key bytes (key0, key1) and the random numbers (X, Y). The computational complexity of the DPA attack becomes $o(DPA) \times 2^{20N}$, which is $2^{12N}$ times that of the AES design with the masking method alone. In this way, even if the masking method is proved to be insecure, the other countermeasure methods can still protect the design.
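The exponent can be checked by multiplying the individual factors derived above. This assumes that the three contributions are independent, so that their factors simply multiply:

```latex
\[
\underbrace{2^{8N}}_{\text{masking } (X)}
\times \underbrace{2^{4N}}_{\text{hiding } (Y)}
\times \underbrace{2^{8N}}_{\text{I.ARK and D.S. } (key1)}
= 2^{(8+4+8)N} = 2^{20N},
\qquad
\frac{2^{20N}}{2^{8N}} = 2^{12N}
\]
```

For example, with only N = 2 traces the combined factor would already be $2^{40}$ under this model.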

Table 6.1 Summary of different countermeasure methods.

Method | Description | Effect on power consumption
Masking | Randomizes the internal data | $P(key, X) = \lambda \, HD(C, f(C, key, X)) + n$. Adds a random number X to the power consumption.
Subbytes Hiding | Shuffles the execution order of Subbytes | $P(key, Y) \in \bigcup_{i=1}^{16} \{\lambda \, HD(C_i, f(C_i, key)) + n\}$. The right power consumption is one member of a set of power consumptions.
Independent ARK with Data Sliding | Equivalent to masking the data with another key | $P(key0, key1) = \lambda \, HD(f(C0, key0), f(C1, key1)) + n$. Adds another key byte to the power consumption.

Table 6.2 Comparison of time complexity for each countermeasure method.

Method | Power consumption | Complexity
AES without DPA countermeasure | $P(key) = \lambda \, HD(C, f(C, key)) + n$ | $\sim o(DPA)$
Masking | $P(key, X) = \lambda \, HD(C, f(C, key, X)) + n$ | $\sim o(DPA) \times 2^{8N}$
Subbytes Hiding | $P(key, Y) \in \bigcup_{i=1}^{Y} \{\lambda \, HD(C_i, f(C_i, key)) + n\}$ | $\sim o(DPA) \times 2^{4N}$
Independent ARK with Data Sliding | $P(key0, key1) = \lambda \, HD(f(C0, key0), f(C1, key1)) + n$ | $\sim o(DPA) \times 2^{8N}$

6.2 Ultra Low-cost Design of AES with DPA Countermeasure

6.2.1 Specification

The data size of coded video depends strongly on the resolution, the frame rate and the coding methods. Resolution is the size of a picture in the video sequence. When the resolution becomes higher, every frame of the video sequence consists of more MBs, and the data size increases greatly. On the other hand, a high resolution lets the video contain more detail and makes it more attractive to the audience. Frame rate is the number of frames within one second. A higher frame rate means that more pictures must be displayed in one second. A high frame rate makes moving pictures appear smoother, especially for high-motion content. Normally, for low resolution video (less than 1920×1088), the frame rate is set to 30 fps, and for high resolution video (more than 1920×1088), the frame rate is set to 60 fps. The coding method is another important factor for the video data size. Some coding methods greatly affect the coded video data size, such as RDO (Rate-Distortion Optimization), QP Matrix, CAVLC, CABAC and so on. For more information about coding methods, please refer to [46].

Table 6.3 shows the maximum bit-rate of selected levels in H.264 [57]. Each level defines the maximum bit-rate, video resolution, frame rate and maximum number of frames stored in the buffer. Table 6.3 contains only part of the level definitions. The complete list of levels can be found in [57]. Some frequently used video resolutions are listed in this table. 176×144 (QCIF) and 352×288 (CIF) are usually used in mobile phones. Since the screen size and the battery power of a mobile phone are limited, small video sizes are acceptable to users. Currently, the mobile TV service "ONESEG" in Japan [51] uses QVGA (320×240) @ 15fps, 128kbps to broadcast TV to mobile phones. 720×480 (SD) is normally used in high-end portable media players. 1280×720 (HDTV 720p) and 1920×1080 (HDTV 1080p) are widely used in High Definition TV. For future use, 4096×2048 (4kx2k) and 8192×4096 (8kx4k) super-HDTV are under research.

Table 6.3 Max bit-rate and resolution of selected H.264 levels.

H.264 level | Max bit rate (bps): Baseline, Main, Extended Profile | Max bit rate (bps): High Profile | Max bit rate (bps): High 10 Profile | Max bit rate (bps): High 4:2:2, 4:4:4 Profile | Resolution @ frame rate
1 | 64 K | 80 K | 192 K | 256 K | 128×96@30
1.1 | 192 K | 240 K | 576 K | 768 K | 176×144@30
2 | 2 M | 2.5 M | 6 M | 8 M | 352×288@30
3 | 10 M | 12.5 M | 30 M | 40 M | 720×480@30
3.1 | 14 M | 17.5 M | 42 M | 56 M | 1280×720@30
3.2 | 20 M | 25 M | 60 M | 80 M | 1280×720@60
4 | 20 M | 25 M | 60 M | 80 M | 1920×1080@30
4.1 | 50 M | 62.5 M | 150 M | 200 M | 2048×1024@30
4.2 | 50 M | 62.5 M | 150 M | 200 M | 2048×1080@60; 1920×1080@64
5 | 135 M | 168.75 M | 405 M | 540 M | 1920×[email protected]; 2048×[email protected]; 2048×[email protected]
5.1 | 240 M | 300 M | 720 M | 960 M | 1920×[email protected]; 4096×2048@30

Currently, 1920×1080@60fps, High Profile is the highest configuration for commercial products. For video communication, such as video conferencing, the widely used resolution is VGA. In conclusion, the maximum bit rate for current video applications is at most 62.5 Mbps. A real-time video encryption module must therefore provide a throughput above this figure.

6.2.2 Hardware Architecture

As discussed in the last subsection, the maximum throughput for video applications is about 62.5 Mbps. The lowest hardware cost AES based on the scalable architecture proposed in Chapter 4 achieves a throughput of 75 Mbps. Therefore, for real-time video encryption, the lowest hardware cost architecture in Chapter 4 is the most suitable choice.

Figure 6.10 shows the hardware architecture of the ultra low-cost AES design with DPA countermeasure. This architecture is based on our proposed scalable architecture in Chapter 4, so most of the architecture is the same. The important points of this architecture include:

S-Box with masking: Only one S-Box and one MixColumns are used in this design to reduce the total hardware cost. The S-Box masking method proposed in Section 6.1.2 is used.

Subbytes Shuffling: A 17-to-1 multiplexer is used for SubBytes hiding. One of the 16 data bytes is randomly selected for Subbytes in every clock cycle. The selection signal is produced by the circuit proposed in Section 6.1.3.

Register Masking: All of the data registers are masked. There are two sets of XOR gate arrays, one before and one after each data register, as in Section 6.1.1.

Figure 6.10 Ultra low-cost AES with DPA countermeasure.

Figure 6.11 Data flow for ultra low-cost AES.

Independent ARK: Independent ARK is already used in the scalable architecture. Because the data path is separated into three, the operations AddRoundKey, SubBytes and MixColumns are independent of each other.

Data Sliding: Data Sliding is achieved by right-shifting the data registers in this architecture. Since only one S-Box is used, the right shift must be done in every clock cycle. This has the same effect as data sliding.

In hardware design, the masking and hiding methods incur extra hardware cost. In contrast, Independent ARK and Data Sliding are architecture-level design choices that do not cost any extra hardware.

6.2.3 Data Flow

The dataflow of the proposed ultra low-cost AES design with DPA countermeasure is similar to that of the unified architecture, which has already been discussed in Section 4.4.2. Figure 6.11 shows this dataflow in detail. The notations used in this dataflow are explained in the notations table. Every block in this dataflow represents a clock cycle. Operations in the same block are executed in parallel.

The total dataflow consists of three parts: the First round, Round i (i from 1 to 9), and the Last round. The First round costs 5 clock cycles and executes three operations: Addroundkey, ShiftRows and key Subbytes. Because Addroundkey and ShiftRows both operate on the data registers, they can be merged into one combined operation. Round i is a loop that is executed 9 times. It executes 6 operations in total: Key update, Data Subbytes, MixColumns, Key Subbytes, AddRoundKey and ShiftRows. Many operations are executed in parallel to save clock cycles. The Last round consists of only three operations: Key update, Data Subbytes, and Addroundkey. From this dataflow, it can be seen that Addroundkey is always executed independently, and that the data Subbytes can be executed in random order in every step.

6.2.4 Implementation

In order to compare the hardware cost of AES designs with different countermeasure methods, we implemented 4 AES designs. All of the designs are coded in Verilog HDL and synthesized with Synopsys Design Compiler. The TSMC 0.18 um standard cell library is used for circuit synthesis.

AES without countermeasure methods (AES 0)

This AES design was proposed in reference [58]. Its architecture is similar to the scalable architecture proposed in Chapter 4. To further reduce the hardware cost, Addroundkey, MixColumns and ShiftRows are integrated into a 32-bit data path, so there are only two parallel data paths. This design achieves the lowest hardware cost for a pure AES design (without DPA countermeasure). The detailed description can be found in [58], and the reconfigurable design of this architecture can be found in [66]. Table 6.4 shows the hardware cost of AES0 at an 80 MHz clock frequency.

AES with Independent ARK and Data Sliding (AES 1.0)

This AES design uses the architecture shown in Figure 6.10. Only Independent ARK and Data Sliding are used. In other words, the 17-to-1 multiplexer shown in this figure is not used. Independent ARK and Data Sliding are inherent in this architecture and dataflow. Table 6.6 shows the hardware cost of AES 1.0. The clock frequency reaches 125 MHz, which is much higher than that of AES0. The reason is that AES1.0 uses three datapaths, so its critical path is much shorter than that of AES0.

Table 6.4 AES0@80MHz, TSMC 0.18um (Pure AES)

Components | Gates
S-Box | 358
MixColumns | 376
Key Expander | 1935
Controller | 247
Data Registers + others (ARK, ShiftRow) | 1762
Total | 4678

Table 6.5 AES1.1@125MHz, TSMC 0.18um (AES + Subbytes Hiding)

Components | Gates
S-Box | 383
MixColumns | 313
KeyExpander | 2220
Controller | 235
Data Registers + Others (Multiplexer, ARK, ShiftRows) | 3093
Total | 6244

Table 6.6 AES1.0@125MHz, TSMC 0.18um (AES + Independent ARK, Data Sliding)

Components | Gates
S-Box | 423
MixColumns | 313
KeyExpander | 2223
Controller | 235
Data Registers + others (ARK, ShiftRow) | 2306
Total | 5500

Table 6.7 AES1.2@75MHz, TSMC 0.18um (AES + Masking)

Components | Gates
S-Box | 1124
MixColumns | 325
KeyExpander | 2220
Controller | 235
Data Registers + Others (Xor Gates, ARK, ShiftRows) | 2930
Total | 6834


AES with Subbytes hiding (AES 1.1)

AES with Subbytes hiding adds a 16-to-1 multiplexer compared to AES1.0. The same architecture and the same dataflow are used. Table 6.5 shows the hardware cost of AES1.1 at 125 MHz. The performance of AES1.1 is the same as that of AES1.0.

AES with masking (AES 1.2)

AES with masking consists of register masking and S-Box masking. Since masking adds a lot of circuitry to the original design, the performance is reduced considerably compared to AES1.0. Table 6.7 shows the hardware cost of AES1.2. The clock frequency of AES1.2 is reduced to 75 MHz.

From the hardware implementation results listed above, Independent ARK and Data Sliding have the smallest effect on hardware cost (5500 gates, against 4678 gates for AES0). Subbytes hiding has the second smallest effect (6244 gates). Masking is the weakest in both hardware cost (6834 gates) and performance (the clock frequency drops to 75 MHz). For AES designs that are sensitive to hardware cost, such as RFID, our proposed Independent ARK, Data Sliding and Subbytes hiding are therefore a much better choice than masking.
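The comparison above can be made concrete with a short calculation. The following Python sketch is only an illustration: the gate counts and clock frequencies are copied from Tables 6.4 to 6.7, while the names `DESIGNS`, `area_overhead_percent` and `summarize` are hypothetical helpers introduced here and are not part of the original design flow.

```python
# Gate counts and synthesis clock frequencies copied from Tables 6.4-6.7.
# The dictionary layout and function names are illustrative assumptions.
DESIGNS = {
    "AES0":   {"gates": 4678, "mhz": 80},    # pure AES
    "AES1.0": {"gates": 5500, "mhz": 125},   # Independent ARK + Data Sliding
    "AES1.1": {"gates": 6244, "mhz": 125},   # + Subbytes hiding
    "AES1.2": {"gates": 6834, "mhz": 75},    # + masking
}


def area_overhead_percent(design, baseline="AES0"):
    """Extra gates of `design` relative to `baseline`, in percent."""
    base = DESIGNS[baseline]["gates"]
    return 100.0 * (DESIGNS[design]["gates"] - base) / base


def summarize():
    """Return (name, gates, overhead %, MHz) rows for all four designs."""
    return [
        (name, d["gates"], round(area_overhead_percent(name), 1), d["mhz"])
        for name, d in DESIGNS.items()
    ]


if __name__ == "__main__":
    for name, gates, overhead, mhz in summarize():
        print(f"{name:7s} {gates:5d} gates  +{overhead:4.1f}%  {mhz:3d} MHz")
```

Relative to AES0, this gives an area overhead of about 17.6% for AES1.0, 33.5% for AES1.1 and 46.1% for AES1.2, which matches the ordering stated in the text.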

6.3 DPA Attack Evaluation Environment

6.3.1 DPA attack platform

To mount a DPA attack on the AES algorithm, an attack environment is needed first. We use the SASEBO board to run the power analysis tests. An oscilloscope is also needed to capture the power traces from the FPGA board. Finally, a PC processes the captured power trace data using the specified power model.

Figure 6.12 shows a photo of our DPA attack system, and Figure 6.13 shows the system architecture. We use the SASEBO board provided by AIST to perform the AES operation. The board is connected to an independent power supply. While the encryption is running, we use a digital oscilloscope to capture the power traces. The power trace data is recorded when a trigger signal occurs. After recording, the data is transmitted back to the PC. Two types of data are transferred: the cipher text encrypted by the SASEBO device is transmitted to the PC through RS232 serial port communication, and the digitized power trace waveform data is transmitted back through the LAN. The power analysis attack is based entirely on the power consumption data and the cipher text. A detailed description of our DPA attack system can also be found in [69].

6.3.2 Sasebo Board

Side-channel Attack Standard Evaluation Board (SASEBO) is a board specifically designed to develop standard evaluation schemes to secure cryptographic modules against physical attacks. This system was developed by AIST and Tohoku University [59][60]. It has an FPGA version and an ASIC version. The FPGA version uses a Xilinx FPGA Virtex-II XC2VP7 to implement AES designs on the board. The ASIC version uses an ASIC chip in which several AES designs have already been implemented.

In this dissertation, we only use the FPGA version, because the proposed AES hardware can be implemented in the FPGA. Figure 6.14 shows the architecture of the SASEBO board. There are two FPGA modules on this board: FPGA1 is used for the cryptographic operation, and FPGA2 is used for control logic. Two EEPROMs are used to configure the FPGAs, and the configuration file is downloaded through the JTAG port. The power supply and clock source of each FPGA are separate. An RS232 serial port is used for PC communication. An LED module indicates the internal status of FPGA1. A detailed description of the SASEBO-G board can be found on the websites [59][60].


Figure 6.12 DPA Attack Evaluation System (Photo).

Figure 6.13 DPA Attack Evaluation System (Architecture).


Figure 6.14 Sasebo Board.

6.3.3 Test Flow

This section describes the testing process. The complete test flow is shown in Figure 6.15.

First, we select AES encryption. Then the oscilloscope is initialized (for example, the sampling rate is set to 2 GSa/s), and the number of AES operations to execute is set. After that, we check whether the oscilloscope is in the ‘Run’ state; if not, the flow returns to the oscilloscope initialization phase. Next, the control signal is sent to the FPGA through the RS232C serial port, following the input data format. After receiving the control signal, the FPGA can perform encryption, decryption or reset. Here, we perform encryption in order to obtain power trace data. After the encryption, the data is transferred back to the PC. At the same time, we check whether a trigger occurred; if not, something is wrong with the FPGA. If the FPGA is running normally, the power trace data is recorded on the PC through the LAN as a CSV or text file. Once the required number of AES operations has been completed, the flow moves on to the DPA attack phase.
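The control loop of this flow can be sketched in Python. This is a minimal illustration under stated assumptions, not the software used in the experiments: `FakeScope` and `FakeBoard` are hypothetical stand-ins for the oscilloscope (reached over LAN) and the FPGA board (reached over RS232C), the placeholder `encrypt` is not AES, random plaintexts are an assumption, and traces are kept in memory instead of being written to a CSV or text file.

```python
import random


class FakeScope:
    """Hypothetical stand-in for the oscilloscope; not a real instrument API."""

    def __init__(self, rng, n_points=5000):
        self.rng = rng
        self.n_points = n_points
        self.running = False
        self.sample_rate = None

    def initialize(self, sample_rate):
        self.sample_rate = sample_rate
        self.running = True          # scope enters the 'Run' state

    def is_running(self):
        return self.running

    def triggered(self):
        return True                  # a real setup would check the trigger signal

    def read_trace(self):
        # Random numbers stand in for the digitized power waveform.
        return [self.rng.random() for _ in range(self.n_points)]


class FakeBoard:
    """Hypothetical stand-in for the FPGA board."""

    def encrypt(self, plaintext):
        # Placeholder transformation only; the real board runs AES.
        return bytes(b ^ 0xA5 for b in plaintext)


def acquire_traces(scope, board, num_ops, rng, sample_rate=2e9):
    """Collect (ciphertext, trace) pairs following the described test flow."""
    scope.initialize(sample_rate)
    records = []
    while len(records) < num_ops:
        if not scope.is_running():
            scope.initialize(sample_rate)   # go back to the initialization phase
            continue
        plaintext = bytes(rng.randrange(256) for _ in range(16))
        ciphertext = board.encrypt(plaintext)
        if not scope.triggered():
            raise RuntimeError("no trigger seen: the FPGA is not running normally")
        records.append((ciphertext, scope.read_trace()))
    return records   # the DPA attack phase would start from these records
```

A call such as `acquire_traces(FakeScope(rng), FakeBoard(), 5000, rng)` with `rng = random.Random(0)` returns one ciphertext and one trace per AES operation, which is the input the attack phase needs.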


Figure 6.15 DPA attack test flow.


In the DPA attack step, we use the Hamming distance model to relate the power traces to the processed data. We then use the correlation coefficient to measure the strength of this relationship. Throughout the flow, data must be transferred between different pieces of equipment: an RS232C serial port connects the host PC and the FPGA board, and the LAN is used to control data reading and writing between the PC and the oscilloscope.
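The combination of a Hamming distance model and a correlation coefficient can be illustrated with a small simulation. The Python sketch below is a toy example under several assumptions and does not reproduce the measurement setup: the power values are simulated (one sample per trace, Hamming distance plus Gaussian noise), only one key byte of the last round is attacked, ShiftRows is ignored, and all function names are hypothetical. The S-box is generated from its FIPS 197 definition rather than typed in as a table.

```python
import math
import random


def build_sbox():
    """Generate the AES S-box (GF(2^8) inverse followed by the affine map)."""
    exp, log = [0] * 256, [0] * 256
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        # multiply x by the generator 0x03 modulo the AES polynomial 0x11B
        x ^= ((x << 1) ^ (0x11B if x & 0x80 else 0)) & 0xFF
    sbox = []
    for a in range(256):
        inv = 0 if a == 0 else exp[(255 - log[a]) % 255]
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for i, s in enumerate(SBOX):
    INV_SBOX[s] = i


def hamming_distance(a, b):
    return bin(a ^ b).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def simulate(key_byte, n_traces, noise_sigma, rng):
    """Toy leakage: HD of a last-round register update plus Gaussian noise."""
    cts, power = [], []
    for _ in range(n_traces):
        c = rng.randrange(256)               # observed ciphertext byte
        before = INV_SBOX[c ^ key_byte]      # register value before the last round
        cts.append(c)
        power.append(hamming_distance(before, c) + rng.gauss(0.0, noise_sigma))
    return cts, power


def recover_key_byte(cts, power):
    """Return the key guess whose HD hypothesis correlates best with the power."""
    best_guess, best_corr = 0, -1.0
    for guess in range(256):
        hyp = [hamming_distance(INV_SBOX[c ^ guess], c) for c in cts]
        corr = abs(pearson(hyp, power))
        if corr > best_corr:
            best_guess, best_corr = guess, corr
    return best_guess, best_corr
```

For example, `recover_key_byte(*simulate(0x3C, 1000, 1.0, random.Random(1)))` ranks all 256 guesses for one key byte. On real traces the same correlation is computed at every sample point of the trace, which is what the 2-D and 3-D views of the attack results visualize.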

6.4 Experiment Results of DPA Attack

Figure 6.16 shows a power trace measured by the oscilloscope. One AES encryption takes 211 clock cycles in total. We sample about 5000 points from the start to the end of the encryption. For each DPA attack, we collect 5000 power traces for the analysis. Figure 6.17 and Figure 6.18 show the 2-D and 3-D results of the DPA attack on pure AES. Figure 6.19 and Figure 6.20 show the DPA attack results for the AES design with only the hiding method. Figure 6.21 and Figure 6.22 show the DPA attack results for the AES design with only the masking methods. Figure 6.23 and Figure 6.24 show the DPA attack results for the AES design with only Independent ARK and Data Sliding. These figures show that all of the proposed DPA countermeasure methods resist the DPA attack very well.
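The acquisition figures quoted above translate into a few useful derived numbers. The Python sketch below is a rough calculation only: the cycle, point and trace counts come from this section, while the 2 GSa/s sampling rate is the example value given in Section 6.3.3 and is assumed here, since the rate actually used for these measurements is not stated. The function names are hypothetical.

```python
CLOCK_CYCLES = 211        # clock cycles per AES encryption (this section)
POINTS_PER_TRACE = 5000   # approximate sample points per trace (this section)
NUM_TRACES = 5000         # power traces collected per attack (this section)
SAMPLE_RATE = 2e9         # example rate from Section 6.3.3; an assumption here


def samples_per_cycle(points=POINTS_PER_TRACE, cycles=CLOCK_CYCLES):
    """Average number of sample points available per clock cycle."""
    return points / cycles


def capture_window_us(points=POINTS_PER_TRACE, rate=SAMPLE_RATE):
    """Duration of one trace in microseconds at the given sampling rate."""
    return points / rate * 1e6


def total_samples(traces=NUM_TRACES, points=POINTS_PER_TRACE):
    """Number of samples the PC has to process for one attack."""
    return traces * points


if __name__ == "__main__":
    print(f"{samples_per_cycle():.1f} samples per clock cycle")
    print(f"{capture_window_us():.2f} us per trace at the assumed rate")
    print(f"{total_samples():,} samples per attack")
```

With these inputs, each clock cycle is covered by roughly 24 sample points and one attack processes 25 million samples.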

Figure 6.16 Power trace from oscilloscope


Figure 6.17 2-D view of DPA attack on Pure AES.

Figure 6.18 3-D view of DPA attack on Pure AES.

(4th byte of key, the other 15 bytes of key are similar)


Figure 6.19 2-D view of DPA attack on AES with Subbytes hiding.

Figure 6.20 3-D view of DPA attack on AES with Subbytes hiding.

(4th byte of key, the other 15 bytes of key are similar)


Figure 6.21 2-D view of DPA attack on AES with masking.

Figure 6.22 3-D view of DPA attack on AES with masking.

(4th byte of key, the other 15 bytes of key are similar)


Figure 6.23 2-D view of DPA attack on AES with Independent ARK and Data Sliding.

Figure 6.24 3-D view of DPA attack on AES with Independent ARK and Data Sliding.

(4th byte of key, the other 15 bytes of key are similar)


6.5 Chip Design

In order to evaluate the proposed countermeasure methods in ASIC, a test chip was designed. This chip was designed for the VDEC project [70], using the ROHM 0.18 um standard cell library. The chip size is 2.5mm×2.5mm, according to the VDEC project’s constraints. The chip contains the four AES designs discussed in Section 6.2.4. The top module consists of multiplexers and a UART module. A select signal chooses which of the four AES designs is running. The architecture of this chip is shown in Figure 6.25.

Figure 6.25 Test Chip Architecture.
(Blocks: AES0, AES1.0, AES1.1, AES1.2, and a UART with Sender and Receiver; signals: Status, TXD, RXD, clock, reset.)


Figure 6.26 Chip design of AES
(Layout regions: AES0, AES1.0, AES1.1, AES1.2, and the top module with UART.)

Table 6.8 VDEC Test Chip.

Technology   ROHM 0.18 um
Chip Size    2.5mm×2.5mm
PAD Number   128
Voltage      1.8V
Metal        5
Frequency    ~100 MHz
Designs      AES0, AES1.0, AES1.1, AES1.2


6.6 Conclusion

This chapter presented five DPA countermeasure methods for AES hardware design: Register Masking, S-Box Masking, Subbytes Hiding, Independent ARK and Data Sliding. The theoretical analysis shows that the complexity of a DPA attack on an AES that uses the hybrid countermeasure solution is increased to 212N times. In this way, even if one or two countermeasure methods are broken, the remaining countermeasure methods can still prevent a successful attack. For hardware design, an ultra low-cost AES design with these countermeasure methods is proposed. This AES is designed for real-time video encryption. Only one S-box and one Mixcolumns are used in the architecture. The effect of the different countermeasure methods on hardware cost is discussed. Finally, in order to evaluate the effectiveness of the proposed countermeasure methods, a DPA attack evaluation system and a test chip that includes four AES cores were implemented. The DPA attack experimental results show that the proposed countermeasure methods successfully prevent the DPA attack.


Conclusion

7 Conclusion

In this dissertation, a new video encryption scheme and the hardware design of the encryption module are proposed. The dissertation consists of three parts: 1) at the algorithm level, a new video encryption scheme is proposed; 2) at the hardware level, a scalable hardware architecture for the AES algorithm is proposed; 3) at the security level, DPA countermeasure methods for AES hardware design are proposed.

Conventional selective video encryption schemes have many problems, such as low security, high computational cost and difficulty of implementation. In order to improve the security and reduce the computational cost of video encryption, we proposed an Unequal Secure Encryption (USE) scheme for video encryption, targeting the H.264/AVC video coding standard in particular. This scheme mainly includes two parts: data classification and unequal secure encryption. For data classification, we proposed three data classification methods based on H.264/AVC. After data classification, the video bit stream is separated into two parts: an important data partition and an unimportant data partition. Four security levels are defined in the USE scheme. These security levels are used to balance security strength against computational complexity. For unequal secure encryption, we use two encryption methods: the AES encryption algorithm for the important data partition, and the FLEX encryption algorithm for the unimportant data partition. The FLEX algorithm is based on AES and is 5 times as fast as AES. In this way, only AES needs to be implemented in the encryption module.

For the hardware design of the AES algorithm, a scalable architecture is proposed. Since the video data size varies greatly across video levels, a fixed architecture with a specific performance is not a good solution. The number of S-Boxes and MixColumns is configurable in the proposed architecture: in total, 1-20 S-Boxes and 1-4 MixColumns can be used. The experimental results show that the lowest-cost implementation uses only one S-Box and one MixColumns and achieves a throughput of 75 Mbps, while the highest-performance implementation, using 20 S-Boxes and 4 MixColumns, achieves a throughput of 2.4 Gbps.

In order to enhance the security of the AES encryption module, especially against DPA attacks, we proposed five DPA attack countermeasure methods: Register Masking, S-Box Masking, Subbytes Hiding, Independent ARK and Data Sliding. Combining these methods, an ultra low-cost AES design with multiple DPA countermeasure methods is proposed. The DPA attack experimental results show that our proposed methods successfully prevent the DPA attack.

In conclusion, an efficient video encryption scheme for the H.264/AVC video coding standard and the hardware implementation of the encryption module are presented in this dissertation. The design proposed in this dissertation is very useful for secure video communication systems.

Reference

[1] ISO/IEC 11172, Information technology – coding of moving pictures and associated

audio for digital storage media at up to about 1.5Mbit/s, 1993 (MPEG-1).

[2] ISO/IEC 13818, Information technology: generic coding of moving pictures and

associated audio information, 1995 (MPEG-2).

[3] ISO/IEC 14496-2, Coding of audio-visual objects – Part 2: visual, 2001.

[4] ISO/IEC 15938, Information technology – multimedia content description interface

(MPEG-7), 2002.

[5] ISO/IEC 21000, Information technology – multimedia framework (MPEG-21), 2003.

[6] ITU-T Recommendation H.261, Video CODEC for audiovisual services at px64

kbit/s, 1993

[7] ITU-T Recommendation H.263, Video coding for low bit rate communication,

Version 2, 1998.

[8] ISO/IEC 14496-10 and ITU-T Rec. H.264, Advanced Video Coding, 2003.

[9] X. Liu and A.M. Eskicioglu "Selective Encryption of Multimedia Content in

Distribution Networks: Challenges and New Directions,” IASTED International

Conference on Communications, Internet and Information Technology (CIIT 2003),

Scottsdale, AZ, November 17-19, 2003.

[10] L. Qiao and K. Nahrstedt, “Comparison of MPEG Encryption Algorithms,”

International Journal on Computer and Graphics, Special Issue on Data Security in

Image Communication and Network, 22(3), 1998.

[11] B. Furht, D. Socek, A. M. Eskicioglu, “Fundamentals of multimedia encryption

techniques,” Multimedia Security Handbook. CRC Press, LLC, Ch. 3, pp. 93-131.

December 2004.

[12] B. Furht and D. Socek, “Multimedia Security: Encryption Techniques,” IEC


Comprehensive Report on Information Security, International Engineering

Consortium, Chicago, IL, 2003.

[13] T. Lookabaugh, D. C. Sicker, D. M. Keaton, W. Y. Guo and I. Vedula, “Security

Analysis of Selectively Encrypted MPEG-2 Streams,” Multimedia Systems and

Applications VI Conference, Orlando, FL, September 7-11, 2003.

[14] J. Meyer and F. Gadegast, “Security Mechanisms for Multimedia Data with the

Example MPEG-1 Video,” Project Description of SECMPEG, Technical University of

Berlin, Germany, May 1995.

[15] T.B. Maples and G.A. Spanos, "Performance study of selective encryption scheme

for the security of networked real-time video," in Proceedings of the 4th International

Conference on Computer and Communications, Las Vegas, NV, 1995.

[16] G.A. Spanos and T.B. Maples, "Security for Real-Time MPEG Compressed Video in

Distributed Multimedia Applications," in Conference on Computers and

Communications, 1996, pp. 72-78.

[17] L. Tang, “Methods for Encrypting and Decrypting MPEG Video Data Efficiently,”

Proceedings of the 4th ACM International Multimedia Conference, Boston, MA,

November 18-22, 1996, pp. 219-230.

[18] L. Qiao and K. Nahrstedt, “A New Algorithm for MPEG Video Encryption,”

Proceedings of the 1st International Conference on Imaging Science, Systems and

Technology (CISST ’97), Las Vegas, NV, July 1997, pp. 21-29.

[19] C. Shi and B. Bhargava, “A Fast MPEG Video Encryption Algorithm,” Proceedings

of the 6th International Multimedia Conference, Bristol, UK, September 12-16, 1998.

[20] C. Shi, S.-Y. Wang and B. Bhargava, “MPEG Video Encryption in Real-Time Using

Secret key Cryptography,” 1999 International Conference on Parallel and Distributed

Processing Techniques and Applications (PDPTA'99), Las Vegas, NV, June 28 - July

1, 1999.


[21] A. M. Alattar, G. I. Al-Regib and S. A. Al-Semari, “Improved Selective Encryption

techniques for Secure Transmission of MPEG Video Bit-Streams,” Proceedings of

the 1999 International Conference on Image Processing (ICIP '99), Vol. 4, Kobe,

Japan, October 24-28, 1999, pp. 256-260.

[22] S. U. Shin, K. S. Sim and K. H. Rhee, “A Secrecy Scheme for MPEG Video Data

Using the Joint of Compression and Encryption,” Second International Workshop on

Information Security (ISW’99), Kuala Lumpur, Malaysia, November 1999, Lecture

Notes in Computer Science, Vol. 1729, pp. 191-201, 1999.

[23] H. Cheng and X. Li, “Partial Encryption of Compressed Images and Video,” IEEE

Transactions on Signal Processing, 48(8), 2000, pp. 2439-2451.

[24] A.S. Tosun, W. C. Feng, “Efficient multi-layer coding and encryption of MPEG

video streams,” IEEE International Conference on Multimedia and Expo, New York,

July 2000, pp. 119 – 122.

[25] J. Wen, M. Severa, W. Zeng, M. Luttrell, and W. Jin, “A Format-Compliant

Configurable Encryption Framework for Access Control of Video,” IEEE

Transactions of Circuits and Systems for Video Technology, Vol. 12, No. 6, June

2002, pp. 545-557.

[26] W. Zeng and S. Lei, “Efficient Frequency Domain Selective Scrambling of Digital

Video,” IEEE Transactions on Multimedia, Vol. 5, No. 1, March 2002, pp. 118-129.

[27] M. Wu and Y. Mao, “Communication-Friendly Encryption of Multimedia,” 2002

International Workshop on Multimedia Signal Processing, St. Thomas, US Virgin

Islands, December 9-11, 2002.

[28] L.S. Choon, A. Samsudin, R. Budiarto, “Lightweight and cost-effective MPEG video

encryption,” 2004 International Conference on Information and Communication

Technologies: From Theory to Applications, 19-23 April 2004 pp.525 – 526.


[29] Z. Liu, X. Li, Z. Dong, “Enhancing security of frequency domain video encryption,”

Proceedings of the 12th annual ACM international conference on Multimedia, New

York, USA, October 10-16, 2004, pp.304-307.

[30] G. Liu, T. Ikenaga, S. Goto, T. Baba, “A Selective Video Encryption Scheme for

MPEG Compression Standard,” IEICE Transactions on Fundamentals of Electronics,

Communications and Computer Sciences, Volume E89-A, Issue 1, January 2006,

pp.194-202.

[31] G. Liu, S. Goto, T. Baba, T. Ikenaga, “No bit overhead MPEG video scrambling

based on event shuffle in frequency domain,” The 2004 IEEE Asia-Pacific

Conference on Circuits and Systems, Volume 2, 6-9 Dec. 2004, pp.761-764.

[32] J. D. Wang, Y. B. Fan, T. Ikenaga, S. Goto, “No compression ratio reduction H.264

video scrambling,” Symposium on cryptography and information security,

Huistenbosch, Japan, 23-25 January, 2007, 3B3-1.

[33] C.-P. Wu and C.-C. J. Kuo, “Fast Encryption Methods for Audiovisual Data

Confidentiality,” SPIE International Symposia on Information Technologies 2000,

Boston, MA, November 2000, pp. 284-295.

[34] C.-P. Wu and C.-C. J. Kuo, “Efficient Multimedia Encryption via Entropy Codec

Design,” Proceedings of SPIE Security and Watermarking of Multimedia Content III,

Volume 4314, San Jose, CA, January 2001.

[35] I. K. Cheong, Y. C. Hung, Y. S. Tung, S. R. Ke, W. C. Chen, “An Efficient

Encryption Scheme for MPEG Video,” International conference on consumer

electronics, 8-12 January 2005, pp.61-62.

[36] National Institute of Standards and Technology (U.S.). Data Encryption Standard

(DES). FIPS Publication 46-3, NIST, 1999.

[37] National Institute of Standards and Technology (U.S.). Advanced Encryption

Standards (AES). FIPS Publication 197, 2001.

[38] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public key cryptosystems,” Communications of the ACM, 21 (1978), pp. 120-126.

[39] V. Miller, “Use of elliptic curves in cryptography”, CRYPTO 85, 1985.

[40] X. Lai, J. L. Massey and S. Murphy, “Markov ciphers and differential cryptanalysis,” Advances in Cryptology, Eurocrypt 91, Lecture Notes in Computer Science, 1991, pp.17-38.

[41] I. Agi and L. Gong, “An Empirical Study of Secure MPEG Video Transmission,”

Proceedings of the Symposium on Network and Distributed Systems Security, IEEE,

1996.

[42] L. Qiao, K. Nahrstedt, and I. Tam, "Is MPEG Encryption by Using Random List

Instead of Zigzag Order Secure?" IEEE International Symposium on Consumer

Electronics, December 1997. Singapore.

[43] B. Bhargava, C. Shi, and Y. Wang, “MPEG Video Encryption Algorithms”, August

2002, available at http://raidlab.cs.purdue.edu/papers/mm.ps.

[44] . Seidel, D. Socek, and M. Sramka, “Cryptanalysis of Video Encryption

Algorithms ,” to appear in Proceedings of The 3rd Central European Conference on

Cryptology TATRACRYPT 2003, Bratislava, Slovak Republic, 2003.

[45] A. Alattar and G. Al-Regib, “Evaluation of selective encryption techniques for

secure transmission of MPEG video bit-streams,” in Proceedings of the IEEE

International Symposium on Circuits and Systems, vol. 4, pp IV-340-IV-343, 1999.

[46] Iain E.G. Richardson, “H.264 and MPEG-4 Video Compression, Video coding for

next-generation multimedia,” John Wiley & Sons Ltd, 2003, pp.159-223.

[47] T. Wiegand, G.J. Sullivan, G. Bjøntegaard, A. Luthra, “Overview of the H.264/AVC

video coding standard,” IEEE Transactions on Circuits and Systems for Video

Technology, Volume 13, Issue 7, July 2003, pp.560 - 576.

[48] A. Satoh, S. Morioka, K. Takano, S. Munetoh, “A Compact Rijndael Hardware

Architecture with S-Box Optimization,” Advances in Cryptology - ASIACRYPT


2001, 7th International Conference on the Theory and Application of Cryptology and

Information Security, Gold Coast, Australia, December 9-13, 2001, pp.239 – 254.

[49] D. Canright, “A Very Compact S-Box for AES,” Cryptographic Hardware and

Embedded Systems – CHES, September 2005, pp.441 – 455.

[50] M. Feldhofer, S. Dominikus, J. Wolkerstorfer, “ Strong Authentication for RFID

Systems Using the AES Algorithm,” Cryptographic Hardware and Embedded

Systems - CHES 2004, Volume 3156, 2004, pp.357-370.

[51] OneSeg in Japan. http://en.wikipedia.org/wiki/Oneseg

[52] P. Kocher, J. Jaffe, and B. Jun. Differential power analysis. In M. Wiener, editor,

Advances in Cryptology: Proceedings of CRYPTO’99, number 1666 in Lecture Notes

in Computer Science, pages 388–397, Santa Barbara, CA, USA, August 15-19 1999.

Springer-Verlag.

[53] S. Mangard, E. Oswald, and T. Popp, “Power Analysis Attacks: Revealing the Secrets of Smart Cards,” published by Springer, 2007.

[54] Eric Brier, Christophe Clavier, and Francis Olivier, "Optimal Statistical Power

Analysis", Cryptology ePrint Archive, http://eprint.iacr.org/2003/152.pdf

[55] M. L. Akkar, C. Giraud, “An Implementation of DES and AES, Secure against Some

Attacks,” Proceedings International Workshop on Cryptographic Hardware and

Embedded Systems (CHES 2001), pp.309-318, 2001.

[56] M. Joye, P. Paillier, B. Schoenmakers, “On second-order differential power analysis,”

Proceedings International Workshop on Cryptographic Hardware and Embedded

Systems (CHES 2005), pp.293-308, 2005.

[57] H.264 in wikipedia. http://en.wikipedia.org/wiki/H.264

[58] Yibo Fan, Jidong Wang, T. Ikenaga, S. Goto, "Mixed bus width architecture for low

cost AES VLSI design", 7th International Conference on ASIC (ASICON), 2007,

22-25 Oct. 2007 Page(s):854 – 857.

[59] SASEBO project in Research Center for Information Security(RCIS),


www.rcis.aist.go.jp/special/SASEBO/

[60] Cryptographic hardware project in TOHOKU University,

http://www.aoki.ecei.tohoku.ac.jp/crypto/

[61] Yibo FAN, Jidong Wang, Takeshi Ikenaga, Yukiyasu TSUNOO, Satoshi Goto, “An

Unequal Secure Encryption Scheme For H.264/AVC Video Compression Standard

Date of Evaluation”, IEICE Transaction on Fundamentals, Vol.E91-A, No.1, pp.12-21,

Jan 2008.

[62] Yibo FAN, Jidong Wang, Takeshi Ikenaga, Satoshi Goto, “A New Video Encryption

Scheme for H.264/AVC”, Pacific-Rim conference on multimedia (PCM 2007), 2007.

[63] Jidong Wang, Yibo FAN, Takeshi Ikenaga, Satoshi Goto, “A Partial Scramble

Scheme for H.264 Video”, The 7th international conference on ASIC (ASICON 2007),

2007.

[64] Jidong Wang, Yibo FAN, Takeshi Ikenaga, Satoshi Goto, “An Efficient Encryption

Scheme for H.264 Format Video Streams,” The 20th workshop on circuits and

systems in karuizawa, 23-24 April, 2007.

[65] Yibo FAN, Jidong WANG, Takeshi IKENAGA, Satoshi GOTO, “A Survey of Video

Encryption Methods”, Proc. of the 2nd International Ph.D. Student Workshop on SOC

(IPS), pp. 17-20, Taipei, Taiwan, July 2007.

[66] Yibo FAN, Takeshi Ikenaga, Satoshi Goto, "A Low-cost Reconfigurable Architecture

for AES Algorithm", International Conference on Information and Communications

Security (ICICS 2008), Prague, Czech Republic, July 25-27, 2008.

[67] Yibo FAN, Takeshi IKENAGA, Yukiyasu TSUNOO, Satoshi GOTO, "A Low-cost

LSI design of AES against DPA attack by hiding power information", The 21st

workshop on circuits and systems in karuizawa, 2008.

[68] Yibo FAN, Takeshi Ikenaga, Satoshi Goto, “A High-speed Design of Montgomery

Multiplier”, IEICE Transaction on Fundamentals, Vol.E91-A, No.4, pp.971-977, April,

2008.


[69] Guoyu QIAN, Yibo FAN, Yukiyasu Tsunoo, Takeshi Ikenaga, Satoshi Goto, "FPGA

& ASIC Implementation of Differential Power Analysis Attack on AES", the 4th

International Conferences on Information Security and Cryptology, Dec. 14-17, 2008

(to be published)

[70] VDEC. http://www.vdec.u-tokyo.ac.jp/

[71] Andreas Uhl, Andreas Pommer, "Image and video encryption, from digital rights

management to secured personal communication", Springer, 1 edition November 4,

2004.

Publications

International Journal

[1] Yibo FAN, Takeshi Ikenaga, Satoshi Goto, “Reconfigurable Variable Block Size

Motion Estimation Architecture for Search Range Reduction Algorithm”, IEICE

Transaction on Electronics, Vol.E91-C, No.4, pp.440-448, April, 2008.

[2] Yibo FAN, Takeshi Ikenaga, Satoshi Goto, “A High-speed Design of Montgomery

Multiplier”, IEICE Transaction on Fundamentals, Vol.E91-A, No.4, pp.971-977, April,

2008.

[3] Yibo FAN, Jidong Wang, Takeshi Ikenaga, Yukiyasu TSUNOO, Satoshi Goto, “An

Unequal Secure Encryption Scheme For H.264/AVC Video Compression Standard

Date of Evaluation”, IEICE Transaction on Fundamentals, Vol.E91-A, No.1, pp.12-21,

Jan 2008.

International Conference (with review)

[1] Guoyu QIAN, Yibo FAN, Yukiyasu Tsunoo, Takeshi Ikenaga, Satoshi Goto, "FPGA

& ASIC Implementation of Differential Power Analysis Attack on AES", the 4th

International Conferences on Information Security and Cryptology, Dec. 14-17, 2008

(to be published)

[2] Yibo FAN, Takeshi Ikenaga, Satoshi Goto, "Optimized 2-D SAD Tree Architecture of

Integer Motion Estimation for H.264/AVC", 16th IFIP/IEEE international conference

on very large scale integration (VLSI-SoC 2008), Rhodes Island, Greece, Oct. 13-15,

2008.

[3] Yibo FAN, Takeshi Ikenaga, Satoshi Goto, "Fast VBSME design using reconfigurable hardware architecture and search range reduction algorithm", The 10th IASTED International Conference on Signal and Image Processing (SIP 2008), Kailua-Kona, Hawaii, August 18–20, 2008.

[4] Yibo FAN, Takeshi Ikenaga, Satoshi Goto, "A Low-cost Reconfigurable Architecture

for AES Algorithm", International Conference on Information and Communications

Security (ICICS 2008), Prague, Czech Republic, July 25-27, 2008.

[5] Yibo FAN, Jidong Wang, Takeshi Ikenaga, Satoshi Goto, “A New Video Encryption

Scheme for H.264/AVC”, Pacific-Rim conference on multimedia (PCM 2007), 2007.

[6] Yibo FAN, Jidong Wang, Takeshi Ikenaga, Satoshi Goto, “Mixed Bus Width

Architecture for Low Cost AES VLSI Design”, The 7th international conference on

ASIC (ASICON 2007), 2007.

[7] Jidong Wang, Yibo FAN, Takeshi Ikenaga, Satoshi Goto, “A Partial Scramble

Scheme for H.264 Video”, The 7th international conference on ASIC (ASICON 2007),

2007.

[8] Yibo FAN, Jidong WANG, Takeshi IKENAGA, Satoshi GOTO, “A Survey of Video

Encryption Methods”, Proc. of the 2nd International Ph.D. Student Workshop on SOC

(IPS), pp. 17-20, Taipei, Taiwan, July 2007.

Domestic Conference (with review)

[1] Yibo FAN, Takeshi IKENAGA, Yukiyasu TSUNOO, Satoshi GOTO, "A Low-cost

LSI design of AES against DPA attack by hiding power information", The 21st

workshop on circuits and systems in karuizawa, 2008.

[2] Yibo FAN, Takeshi Ikenaga, Satoshi Goto, “A High-Speed Design of Montgomery

Multiplier,” The 20th workshop on circuits and systems in karuizawa, 23-24 April,

2007.

[3] Jidong Wang, Yibo FAN, Takeshi Ikenaga, Satoshi Goto, “An Efficient Encryption

Scheme for H.264 Format Video Streams,” The 20th workshop on circuits and

systems in karuizawa, 23-24 April, 2007.


Domestic Conference (without review)

[1] Yibo FAN, Jidong WANG, Takeshi IKENAGA, Yukiyasu TSUNOO, Satoshi GOTO,

"Hardware Evaluation of eSTREAM Stream Cipher Candidates in Phase 3 Profile 2:

Moustique, Pomaranch and Decim v2", Symposium on Cryptography and

Information Security (SCIS), 2008.

[2] Yibo FAN, Xiaoyang Zeng, Takeshi Ikenaga, Satoshi Goto, "Hardware Reuse

Architecture for High-Radix Scalable Montgomery Multiplier", 2E2-1, Symposium

on Cryptography and Information Security (SCIS2007), Jan. 2007.

[3] Jidong Wang, Yibo FAN, Xiaoyang Zeng, Takeshi Ikenaga, Satoshi Goto, "No

Compression Ratio Reduction H.264 Video Scrambling", 3B3-1, Symposium on

Cryptography and Information Security (SCIS2007), Jan. 2007.