Tamper proof certification system based on secure non-volatile FPGAs

Diogo Alcoforado da Gama de Oliveira Parrinha

Thesis to obtain the Master of Science Degree in

Electrical and Computer Engineering

Supervisor(s): Prof. Ricardo Jorge Fernandes Chaves

Prof. Leonel Augusto Pires Seabra de Sousa

Examination Committee

Chairperson: Prof. Gonçalo Nuno Gomes Tavares

Supervisor: Prof. Ricardo Jorge Fernandes Chaves

Member of the Committee: Prof. Fernando Manuel Duarte Gonçalves

November 2017

Acknowledgments

I would like to start by thanking my family for their constant support and for everything they did for me, which allowed me to close this chapter of my life. Without them, this would have been much harder. A special thanks to my mother Marina and my father Ricardo.

Throughout the years I spent at IST, I have enjoyed working with a lot of people, from colleagues to professors. I have made some great friends and I am happy to realize that we have spent amazing moments together. In particular, I would like to thank Diogo Prata for being a good friend throughout the degree and for facing many shared adversities together with me.

Finally, I would like to extend my sincere thanks to my supervisor, Prof. Ricardo Chaves, for his continuous support and guidance throughout this project. His technical expertise and constant motivation have helped me to conclude this thesis.

May this be the start of a new beginning.

Thank you!

Resumo

Embedded systems supported by FPGAs play an increasingly important role in critical and security systems. A particular example of such systems is the Hardware Security Module (HSM), which provides management and usage of private keys in a secure and reliable way. However, commercially available systems are too expensive and limited in the functionality they provide. On the other hand, the solutions based on volatile FPGAs that exist to date are not suitable for creating a Hardware Security Module, as they lack the necessary security characteristics, such as anti-tampering features, secure internal key management and the ability to prevent cloning. In this work, an open-source, low-cost, reconfigurable and highly flexible HSM is proposed. The system is supported by a System-on-Chip that contains a non-volatile FPGA with several security services and characteristics. The presented solution operates as a versatile certification system, capable of providing secure key management and digital signatures and of issuing trustworthy digital certificates, supporting a PKCS#11 interface with additional functions. To better illustrate the flexibility of the proposed solution, a use case named Log-Chain is also proposed and implemented. The Log-Chain consists of a chain of logs that can be incremented and verified, but cannot be modified or repudiated. The experimental results suggest that the system is able to compute up to 2 signature/certification operations per second, with a low-cost, adaptable and secure approach.

Keywords: Non-volatile FPGA, Hardware Security Module, Certification System, Microsemi Smartfusion2 SoC

Abstract

Embedded systems supported by FPGAs are playing an increasingly important role in safety-critical areas. A particular example of such safety-critical systems is the Hardware Security Module (HSM), which provides private key management and usage in a secure and reliable way. However, commercially available systems are too expensive and limited in the functionality they provide. On the other hand, existing volatile FPGA solutions do not adequately provide the needed security characteristics, such as anti-tampering features, secure internal key management and anti-cloning capabilities. Herein, an open-source, low-cost and highly flexible reconfigurable HSM is proposed, supported by a System-on-Chip with a non-volatile FPGA that contains several security characteristics and services. The presented solution operates as a versatile certification system that provides secure key management and digital signature services and is able to issue trustworthy certificates, using an extended PKCS#11 interface. To further illustrate the flexibility of the proposed solution, a Log-Chain certification use case is also presented, which consists of a chain-of-logs that can be incremented and verified, but cannot be repudiated or modified. Experimental results suggest that the system is able to compute up to 2 sign/certification operations per second with a low-cost, adaptable and secure approach.

Keywords: Non-volatile FPGA, Hardware Security Module, Certification System, Microsemi Smartfusion2 SoC

Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
List of Acronyms

1 Introduction
1.1 Objectives and Requirements
1.2 Main contributions
1.3 Thesis Outline

2 Background
2.1 Cryptographic Services and Mechanisms
2.1.1 Symmetric Key Cryptography
2.1.2 Asymmetric Key Cryptography
2.1.3 Hashing Function
2.1.4 Secret Key Establishment
2.1.5 Digital Signatures
2.1.6 Key Certification and PKI
2.1.7 Physically Unclonable Function
2.2 Secure Computing Platforms
2.3 Implementation Technologies
2.4 Smartfusion2 SoC
2.4.1 Device Description
2.4.2 Security Features
2.5 Summary

3 State of the Art
3.1 FPGA as Secure Platform
3.2 Key Generation and Storage
3.3 Full Security Systems
3.4 Discussion

4 Proposed Solution
4.1 Users and Key Management
4.2 Communication and Session Establishment
4.3 Log-Chain
4.4 Conclusions

5 Implementation
5.1 Device Configuration and Setup
5.2 Cryptographic Operations
5.3 Key Generation and Management
5.4 Memory
5.5 Log-Chain
5.6 Communication Channel
5.7 Middleware
5.8 Simple Time Service
5.9 Conclusions

6 Results
6.1 Cryptographic Operations
6.1.1 SHA-256
6.1.2 AES-256
6.1.3 EC Scalar Multiplication
6.2 System Operations
6.3 Communication Channel
6.4 Comparison with the State of the Art
6.5 Conclusions

7 Conclusions
7.1 Future Work

Bibliography

A Communication Protocol

List of Tables

2.1 X.509v3 certificate fields.
2.2 Single-threaded performance (signatures/second) for different HSMs [12].
2.3 HSM Key Storage capacity [12].
2.4 Protection mechanisms for FPGA configuration data.
2.5 Key Features for Secure Hardware [4].
3.1 Comparison of Security Features of the different system proposals.
4.1 Key generation and storage.
4.2 Secure session establishment.
4.3 Available Device commands.
5.1 Non-volatile memory usage requirements for the implemented system.
5.2 Additional API functions.
5.3 Supported official PKCS#11 API functions.
6.1 Operation times for the three SHA-256 implementations.
6.2 Operation times for the two AES-256 implementations.
6.3 Operation times for the three versions conceived.
6.4 Operation times for the three versions conceived.

List of Figures

2.1 An example of an elliptic curve. Example equation: $y^2 = x^3 + ax + b$
2.2 SmartFusion2 SoC FPGA Block Diagram [39].
2.3 Detailed security and settings model diagram [41]. The green segments in the middle are stored in non-volatile memory. The COMBLK performs the communication between the MSS (software) and security services (System Controller).
3.1 FPGA as a Trusted Machine [7].
3.2 Setup: TA generates a public/private RSA key pair and transfers the private key privatekf into the bootstrapping binary (in blue: data is encrypted so it doesn't need to go through a secure channel) [11].
3.3 Block diagram of the Amuet architecture [10]. The embedded application is actually the user application, protected by the proposed wrapping system.
4.1 The proposed overall system architecture. The light blue rectangle represents the secure System-on-Chip. The contents of the external Flash are encrypted.
4.2 Log file example. Each horizontal line represents a line break. Hashes and signatures are Base64-encoded.
4.3 Structure of a Log-Chain. Each hash is computed using the previous hash of the log. The first hash is defined by the device administrator (UID=0) and set as the root hash value.
4.4 The structure of a log chain with grouped log entries.
4.5 The structure of log folders and their files. The current year and month folders are highlighted in dark grey and the current day log file is highlighted in yellow.
5.1 The first system architecture, which uses the mbedTLS algorithms to perform cryptographic operations. The unused modules are greyed out.
5.2 The second system architecture, which uses the SoC embedded cores for additional security and possible performance. The unused modules are greyed out.
5.3 The third system architecture, which uses the FPGA to accelerate the SHA-256 algorithm. The unused modules are greyed out.
5.4 SRAM-PUF core example for Key Code 2 and 3.
5.5 Development internal mode memory map.
5.6 Production mode internal memory map.
5.7 The scheme for time synchronization via STS.
6.1 SHA-256 throughput for the three tested implementations.
6.2 AES-256 throughput for the two tested implementations.
6.3 System operation times for the three tested implementations.
6.4 Throughputs for open-channel and secure-channel communications.
A.1 Flowchart describing the process of receiving a message through the created communication protocol.

Acronyms

AES Advanced Encryption Standard.

ASIC Application-Specific Integrated Circuit.

DPA Differential Power Analysis.

EC Elliptic Curve.

ECC Elliptic Curve Cryptography.

ECDH Elliptic Curve Diffie-Hellman.

ECDSA Elliptic Curve Digital Signature Algorithm.

ECIES Elliptic Curve Integrated Encryption Scheme.

eNVM embedded Non-Volatile Memory.

eSRAM embedded Static Random Access Memory.

FPGA Field-Programmable Gate Array.

HMAC Hash-based Message Authentication Code.

HSM Hardware Security Module.

IV Initialization Vector.

MSS Microcontroller Subsystem.

NTP Network Time Protocol.

PC Personal Computer.

PKI Public Key Infrastructure.

PUF Physically Unclonable Function.

RSA Rivest-Shamir-Adleman.

SoC System-on-Chip.

STS Simple Time Service.

TPM Trusted Platform Module.

TRNG True Random Number Generator.

UID User Identification.

Chapter 1

Introduction

Modern reconfigurable systems, such as Field-Programmable Gate Arrays (FPGAs), provide increasing programming possibilities, high flexibility and growing hardware capabilities. For these reasons, these devices are used in an expanding variety of application areas, such as data centers, medical, aerospace, defense, security, transportation and automotive [1]. In addition, the increasing need for data protection and system reliability, especially in safety-critical systems, has pushed FPGA manufacturers to develop more secure and reliable devices, rather than focusing solely on power consumption and system performance.

FPGAs are low-cost general-purpose devices that provide high flexibility and performance. They are composed of an array of configurable logic blocks connected through programmable interconnections. Their configuration is usually described using a hardware description language, such as VHDL or Verilog, and they can be configured for the desired application after manufacturing.

FPGAs can be grouped into four categories depending on how their configuration is stored: SRAM-based, SRAM-based with internal flash, Flash-based and Antifuse-based. SRAM-based FPGAs store the configuration data of their logic cells in static memory cells. Since SRAM is volatile, these FPGAs must be reprogrammed on each start: they read the configuration from an external source (e.g. Flash memory) when the device is booted. When an internal flash memory is present (SRAM-based with internal flash), the bitstream is stored internally, which prevents unauthorized bitstream copying. Most modern volatile FPGAs come with a secure boot process, in which the device attempts to load the binary from the configuration memory when powered on. The binary is decrypted and authenticated using the dedicated on-chip decryption logic and the AES key programmed by the hardware manufacturer. This key can only be read by the internal decryption logic and cannot be accessed from the outside. If the configuration bitstream fails authentication, the device enters an error state and will not function until it is provided with a valid bitstream.

On the other hand, Flash-based FPGAs use an internal flash memory for configuration storage, rather than static memory cells. Non-volatile FPGAs provide higher security, faster logic availability after power-on and, of course, non-volatile storage, which is of key importance for safety-critical applications [2, 3]. Furthermore, non-volatile FPGAs tend to consume less power and are more tolerant to radiation effects. Because they are non-volatile and the configuration never has to be loaded from outside the device, the bitstream is not at risk of being probed during start-up [3]. Finally, Antifuse-based FPGAs are configured by permanently "burning" antifuses, which means they can only be programmed once.

Existing commercial security-oriented devices provide cryptographic operations and secure key management either with adequate performance but at a high cost and with low flexibility (e.g. Hardware Security Modules), or at a lower price and with higher flexibility but with low performance, due to limited computational power and memory storage (e.g. Smart Cards). In contrast, new FPGA technologies are starting to provide great flexibility and security at a low price [3, 4, 5], allowing for the creation of cheaper systems that resemble a Hardware Security Module (HSM) while being much more flexible and easily reconfigurable. Moreover, FPGAs are being integrated in System-on-Chip (SoC) designs, merging embedded CPUs, memories and security modules with the FPGA fabric, allowing for the creation of more robust and security-oriented systems.

Over the last years, several authors have proposed FPGA-based architectures as secure computing platforms [6, 7], as well as PUF-based key generation and re-keying mechanisms on FPGAs [8, 9]. Moreover, full systems have been proposed that use FPGAs to perform security operations for safety-critical applications, such as a Secure Application Wrapper, which performs secure system authentication and data transfer with an external memory [10], and an FPGA-based architecture that allows users to offload sensitive computations to the cloud [11].

However, the State of the Art solutions cannot be used as Hardware Security Modules, as they solve very specific problems and do not meet certain requirements that are mandatory for an HSM [12], such as anti-cloning mechanisms (e.g. PUF-based key generation), secure communication channels, internal non-volatile memories for master key storage, anti-tamper mechanisms, internal clock freshness (e.g. through a Timestamping Authority) and a common developer interface, such as PKCS#11. Furthermore, these works use SRAM-based FPGAs, which are subject to several attacks, such as probing while the configuration bitstream is loaded at boot time [3, 5]. Non-volatile FPGA technologies, in contrast, provide lower power consumption and faster boot times, and do not need to be reconfigured on each power-on.

1.1 Objectives and Requirements

The main objective of this work is to create a reconfigurable and flexible HSM supported by a low-cost, non-volatile, security-oriented FPGA, in contrast to the volatile FPGAs used by the State of the Art. The low price implies certain hardware limitations, such as reduced internal memory storage and short endurance, meaning that it is not possible to install an Operating System that is stored and executed only inside the device. Moreover, the expected low CPU processing speed suggests a relatively low performance, while the lack of an internal battery implies that an internal Real Time Clock cannot be relied on unless it is securely initialized. Therefore, this work aims to overcome these limitations, providing a fully working solution that tries to compete with existing commercial ones in terms of security features and standards, while maintaining the same flexibility and low cost as the academic reconfigurable proposals.

To achieve the aforementioned objectives, the considered solution must meet the following requirements:

1. Secure key management and storage should be guaranteed through the use of a device-dependent key generation mechanism, enhancing the anti-cloning characteristics. Additionally, internal key storage should be supported.

2. The selected device should guarantee a secure boot process.

3. As the system may be used in insecure environments, it should be capable of establishing a secure channel with the PC, guaranteeing confidentiality, integrity, freshness and authentication of the exchanged messages.

4. Internal clock freshness and synchronization with the outside world should be maintained through the use of a reliable external time provider.

5. Externally stored data should be properly protected.

6. The selected device should have several tamper detection mechanisms and anti-tamper protection features.

7. The developed system should be flexible, open-source and available at a low cost.

1.2 Main contributions

In order to achieve the objectives of this work, a solution that satisfies the requirements highlighted above was proposed and implemented. The proposed solution, which uses the Smartfusion2 SoC as the supporting non-volatile technology, consists of an open-source Hardware Security Module built on a reconfigurable technology, as opposed to existing commercial solutions. The system itself is a low-cost, tamper-proof and unclonable secure certification system capable of generating and managing keys securely, while still providing high flexibility and adaptability.

The system is able to issue digital certificates and generate key pairs for its users, as well as generate digital signatures upon request by an authenticated user. As the system provides high flexibility, and because of the existing need for a secure logging system, a complementary novel feature is proposed, which consists of creating a non-repudiable and certified chain-of-logs with the secure computation system (e.g. for Linux Syslog messages, transaction logs or medical receipts [13]).
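To make the chain-of-logs idea more concrete, the following minimal Python sketch links each log entry to the hash of the previous one, so that changing any earlier entry invalidates every later hash. It is an illustration only: the function names `entry_hash`, `append_entry` and `verify_chain`, the entry layout, the root value and the use of plain SHA-256 are assumptions made for this example, and the digital signatures that a certification system would add on top of the hashes are omitted.

```python
import hashlib

def entry_hash(prev_hash: bytes, message: str) -> bytes:
    """Hash of one log entry, bound to the hash of the previous entry."""
    return hashlib.sha256(prev_hash + message.encode("utf-8")).digest()

def append_entry(chain: list, message: str, root_hash: bytes) -> None:
    """Append a (message, hash) pair; the first entry chains onto the root hash."""
    prev_hash = chain[-1][1] if chain else root_hash
    chain.append((message, entry_hash(prev_hash, message)))

def verify_chain(chain: list, root_hash: bytes) -> bool:
    """Recompute every hash from the root and compare with the stored values."""
    prev_hash = root_hash
    for message, stored_hash in chain:
        if entry_hash(prev_hash, message) != stored_hash:
            return False
        prev_hash = stored_hash
    return True

# Hypothetical root value and log messages, for illustration only.
ROOT = hashlib.sha256(b"example root value").digest()
log = []
for line in ["service started", "user 7 logged in", "transaction 42 committed"]:
    append_entry(log, line, ROOT)

chain_ok = verify_chain(log, ROOT)

# Tampering with an earlier message breaks verification of the chain.
tampered = list(log)
tampered[1] = ("user 8 logged in", tampered[1][1])
tampered_ok = verify_chain(tampered, ROOT)
```

In this sketch, `chain_ok` is true for the untouched chain, while `tampered_ok` is false because the stored hash of the modified entry no longer matches its recomputed value.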

Compared with the existing State of the Art [7, 8, 9, 10, 11], the proposed solution contributes improved key management supported by a PUF-based mechanism, secure and authenticated external data storage, a secure communication channel that assures confidentiality, integrity and authentication of exchanged data, and the ability for developers to integrate applications with the system through an extended PKCS#11 interface. Moreover, the proposed solution uses a non-volatile device as opposed to a volatile one and, more specifically, a security-oriented device with several characteristics that make it possible to create a flexible and reconfigurable Hardware Security Module at a low cost.

Additionally, this work contributes a thorough analysis of the cryptographic operations performed by the device's embedded cores, such as AES-256, SHA-256 and Elliptic Curve scalar multiplication. The results show that the performance of the system is primarily determined by the Elliptic Curve scalar multiplication, with 70% to 95% of the operation time being spent on scalar multiplications. Furthermore, a SHA-256 core was deployed in the FPGA fabric to assess its performance relative to the existing embedded core and the software implementation. The conducted tests suggest that the FPGA-accelerated version of SHA-256 is faster than both the embedded SHA-256 core and the software-based implementation, while consuming only 5% of the FPGA fabric. On the other hand, the software-based version of AES-256 is faster than the embedded AES core provided by the device. Overall, the system is able to perform up to 2 signature/certification operations per second on a non-volatile device, at a much lower cost than existing commercial HSMs, while providing the needed security and reliability features.

An article discussing the different implementations of the proposed solution and their results has been submitted to and accepted at the International Conference on Reconfigurable Computing and FPGAs (ReConFig 2017).

• Diogo Parrinha and Ricardo Chaves, "Flexible and Low-Cost HSM based on Non-Volatile FPGAs", International Conference on Reconfigurable Computing and FPGAs (ReConFig'17), September 2017.

1.3 Thesis Outline

The thesis is organized as follows. Chapter 2 provides a background study on cryptographic services, along with a review of existing secure computing platforms and their implementation technologies. Chapter 3 then presents the relevant State of the Art, with a major focus on secure FPGA computing platforms, including a comparative analysis of the various solutions. Chapters 4 and 5 detail the proposed solution and the resulting implementation, respectively. The analysis of the results and the performance comparison are presented in Chapter 6. Finally, Chapter 7 concludes this document with some final remarks and future work directions.

Chapter 2

Background

In this chapter, a brief introduction to a variety of concepts used throughout the dissertation is provided. This includes cryptographic services and mechanisms, such as symmetric and asymmetric cryptography, Physically Unclonable Functions (PUFs), Public Key Infrastructures (PKI), key exchange protocols (such as ECDH) and data signing mechanisms (such as ECDSA). Furthermore, several Secure Computing Platforms (Smart Card, TPM, HSM) and Implementation Technologies (ASIC, FPGA) are introduced and compared, along with a thorough description of the Microsemi Smartfusion2 SoC.

2.1 Cryptographic Services and Mechanisms

Cryptographic services and mechanisms include symmetric cryptography (AES), asymmetric cryptography (RSA, ECC-based), hashing functions (SHA-256), key exchange protocols (ECDH) and data signing algorithms (ECDSA), as well as Public Key Infrastructures. Symmetric cryptography involves the use of a common key shared between multiple parties, and is faster than asymmetric cryptography. The latter involves a pair of keys (public and private) and is usually used to provide authentication or encryption between two parties, with the ability to ensure non-repudiation¹ if used correctly. These keys can be used in key exchange protocols, which let two parties establish a symmetric key for secure communication, or by data signing algorithms, which allow a user to sign a piece of data using a private key and another user to validate it using the signer's public key. Since public keys must be published and verified in a trusted manner, the PKI, a framework for managing the digital certificates that bind public keys to users, is also addressed. The PKI provides mechanisms for distributing and verifying public keys, and for revoking them when their private counterpart is compromised.

2.1.1 Symmetric Key Cryptography

Symmetric Key Cryptography is composed of symmetric-key algorithms that cipher and decipher data using the same cryptographic key. This key represents a shared secret between two or more parties, which is used to maintain a secure and private link. Although faster than Asymmetric Key Cryptography algorithms, these algorithms require that the parties share the same secret.

¹ The ability to ensure that a party to a contract cannot deny the authorship of a document.

Currently, the main symmetric encryption standard is the Advanced Encryption Standard (AES), an iterative symmetric block cipher with a 128-bit block size that supports key sizes of 128, 192 or 256 bits, using 10, 12 or 14 rounds, respectively [14]. A round consists of multiple processing steps, including substitution, transposition and mixing of the plaintext, which transform it into the final output, i.e. the ciphertext [14]. AES can be used with different block cipher modes of operation, which include ECB, CBC, OFB, OCB and CTR [15]. Since modes such as OFB and CTR generate a key-dependent keystream that is combined with the plaintext bit by bit, they can be used as stream ciphers. Usually, the cipher modes require an Initialization Vector (IV) that is mixed with the data to achieve semantic security². The IV is a fixed-size value that should be non-repeating and randomly generated.
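The role of the IV in a counter-style mode can be illustrated with a small sketch. The code below is not AES: to stay within the Python standard library, SHA-256 stands in for the block cipher, and the names `keystream` and `xor_stream` are hypothetical. It only shows the structure of such a mode: a keystream derived from the key, a fresh IV and a block counter is XORed with the data, so encryption and decryption are the same operation, and a different IV yields a different ciphertext for the same plaintext.

```python
import hashlib
import secrets

def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Counter-style keystream: hash(key || iv || counter) for counter = 0, 1, ...
    SHA-256 is used here only as a stand-in for the AES block function."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])

def xor_stream(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Encrypts or decrypts by XORing the data with the keystream."""
    ks = keystream(key, iv, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

key = secrets.token_bytes(32)    # 256-bit shared secret
iv_a = secrets.token_bytes(16)   # fresh random IV for the first message
iv_b = secrets.token_bytes(16)   # fresh random IV for the second message
plaintext = b"the same message, sent twice"

ct_a = xor_stream(key, iv_a, plaintext)
ct_b = xor_stream(key, iv_b, plaintext)
recovered = xor_stream(key, iv_a, ct_a)
```

Because the two IVs differ, `ct_a` and `ct_b` differ even though the key and plaintext are identical, which is the property the IV is meant to provide. Reusing an IV with the same key would repeat the keystream and leak the XOR of the two plaintexts.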

2.1.2 Asymmetric Key Cryptography

Asymmetric Key Cryptography, also known as Public Key Cryptography, includes any system that uses a pair of keys: a public key and a private key. The private key is only known to, or usable by, its owner, while the public key can be known to everyone. This provides two possible features: authentication, in which anyone can use the public key to verify the sender of a message, and confidentiality, in which anyone can use the public key to ensure that only the owner of the private key is able to decipher the message.

Rivest-Shamir-Adleman (RSA) is one of the oldest and most widely used public key cryptography algorithms. It is based on the assumption that factoring the product of two large prime numbers is a computationally hard task, meaning that even an attacker with substantial computational resources and time is not able to obtain the private key in practice.

To create a public and a private key, it is first necessary to generate two different random prime numbers p and q. Then, compute n such that n = p · q. Afterwards, compute Euler’s totient φ(n) = (p − 1) · (q − 1). With it, choose a random integer e that satisfies 1 < e < φ(n) and gcd(e, φ(n)) = 1, and finally compute the integer d which satisfies e · d ≡ 1 mod φ(n). The public key is (e, n) and the private key is (d, n). With these, it is possible to compute the ciphertext c of the message m using the public key (2.1), and to decrypt it with the private key (2.2):

$c \equiv m^{e} \bmod n$ (2.1)

$m \equiv c^{d} \bmod n$ (2.2)
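The key generation and the two modular exponentiations can be sketched in a few lines of Python (3.8 or later, for the modular inverse via `pow`). This is a toy illustration only: the primes 61 and 53, the exponent e = 17 and the function names are assumptions chosen for readability, whereas real deployments use much larger primes together with a padding scheme.

```python
from math import gcd


def rsa_keygen(p, q, e):
    """Build an RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)          # Euler's totient of n
    if not (1 < e < phi) or gcd(e, phi) != 1:
        raise ValueError("e must satisfy 1 < e < phi(n) and gcd(e, phi(n)) = 1")
    d = pow(e, -1, phi)              # d such that e * d = 1 (mod phi(n))
    return (e, n), (d, n)


def rsa_encrypt(m, public_key):
    e, n = public_key
    return pow(m, e, n)              # c = m^e mod n, as in (2.1)


def rsa_decrypt(c, private_key):
    d, n = private_key
    return pow(c, d, n)              # m = c^d mod n, as in (2.2)


# Toy parameters (illustrative assumption, far too small to be secure).
public_key, private_key = rsa_keygen(61, 53, 17)
ciphertext = rsa_encrypt(65, public_key)
recovered = rsa_decrypt(ciphertext, private_key)
```

The message must be an integer smaller than n, which is why practical systems encode and pad the data before applying the exponentiation.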

An alternative to RSA is Elliptic Curve Cryptography (ECC), based on the mathematics behind elliptic

curves (see Figure 2.1) over finite fields, that can be applied to encryption, digital signatures and pseudo-

random generators. An elliptic curve is represented as a looping line intersecting two axes, and ECC

hinges on a particular type of equation created from a mathematical group derived from points where

2 If an attacker possesses the ciphertext of a message and the message’s length, it cannot determine any partial information on the message with higher probability than if it only possesses the message length.


the line intersects those axes [16] [17]. By multiplying a point on the curve by a number, another point on the curve is obtained. However, it is computationally infeasible to find which number was used, even if the original point and the result are known. An example equation of a possible curve is $y^{2} = x^{3} + ax + b$.

Figure 2.1: An example of an elliptic curve. Example equation: $y^{2} = x^{3} + ax + b$

When using elliptic curves for public key encryption, a public and private key pair must be generated:

d and Q, in which d (random integer chosen from {1, ..., n − 1} where n is the order of the subgroup)

represents the private key and Q its public counterpart, that is generated from Q = dG (where G is the

base point of the subgroup). The definition and selection of the curve parameters, such as the base point G, the elliptic curve coefficients (a, b) and the order of the subgroup n, go beyond the scope of this work, but each curve has its own parameters and, for many curves, they are defined in FIPS 186-4 [18]. Most importantly,

ECC uses two main operations: point addition and point multiplication, in which the former involves the

public key on most algorithms while the latter involves mostly the private key. These can then be used

to establish a common shared secret between two parties and to sign data using the private key.
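To make the key pair generation Q = dG concrete, the following sketch implements point addition and point multiplication on a toy curve. The curve $y^{2} = x^{3} + 2x + 2$ over the integers modulo 17, with base point G = (5, 1) of order 19, is a textbook-sized assumption chosen for illustration; it is not one of the FIPS 186-4 curves and offers no security. All function names are hypothetical.

```python
import secrets

# Toy domain parameters (assumed for illustration): y^2 = x^3 + 2x + 2 mod 17.
P_MOD, A, B = 17, 2, 2
G, N = (5, 1), 19          # base point and order of the subgroup it generates


def on_curve(point):
    """Check the curve equation; None stands for the point at infinity."""
    if point is None:
        return True
    x, y = point
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def point_add(p1, p2):
    """Add two curve points."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # p2 is the inverse of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Point multiplication k * point using double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def generate_keypair():
    d = secrets.randbelow(N - 1) + 1      # private key in {1, ..., n - 1}
    return d, scalar_mult(d, G)           # public key Q = dG


d, Q = generate_keypair()
```

Recovering d from Q and G requires solving the elliptic curve discrete logarithm problem, which is trivial for this 19-element subgroup but infeasible for standardised curves.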

Moreover, ECC is considered to be faster than RSA and its keys have a shorter length than its RSA

equivalent for the same security level [16] [17]. This also makes ECC more suitable for embedded

systems and systems with lower performance and memory capacity.

2.1.3 Hashing Function

A hashing function maps an arbitrarily long message to a fixed-size hash value. Ideally, it has four main properties: hashes are quickly generated; it is infeasible to generate a message from its hash value

except by trying all possibilities; a small change in the message should change the hash extensively; it is

infeasible to find two different messages with the same hash value (collision resistant). Digital signatures

and message authentication codes (MAC) make use of hashing functions.

With SHA-256, the message is first padded so that its length is a multiple of 512 bits and then parsed


into 512-bit message blocks, $M_1, M_2, \ldots, M_N$. Then each block is processed one at a time, beginning with a fixed initial hash value $H_0$:

$H_i = H_{i-1} + C_{M_i}(H_{i-1})$, (2.3)

where C is the SHA-256 compression function and + represents the word-wise addition mod $2^{32}$. $H_N$ is the hash of M. Hashes can typically be used to provide integrity and authentication, such as in a

Hash-based Message Authentication Code (HMAC), which involves a hash function and a secret key.
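The padding and block-parsing step, as well as the use of SHA-256 inside an HMAC, can be illustrated with the Python standard library. The helper names `sha256_pad` and `split_blocks`, the sample message and the key are assumptions made for this sketch; the compression function itself is left to `hashlib`.

```python
import hashlib
import hmac


def sha256_pad(message: bytes) -> bytes:
    """Append a 1 bit, zero bits and the 64-bit message length,
    so that the result is a multiple of 512 bits (64 bytes)."""
    bit_length = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + bit_length.to_bytes(8, "big")


def split_blocks(padded: bytes):
    """Parse the padded message into 512-bit blocks M_1, ..., M_N."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]


message = b"abc"
blocks = split_blocks(sha256_pad(message))

# The library performs the same padding internally and iterates (2.3).
digest = hashlib.sha256(message).hexdigest()

# HMAC combines the hash function with a secret key (hypothetical key).
tag = hmac.new(b"secret-key", message, hashlib.sha256).hexdigest()
```

A receiver holding the same key recomputes the tag and compares it with `hmac.compare_digest`, which avoids leaking information through comparison timing.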

2.1.4 Secret Key Establishment

A secure communication can be established with several techniques, with the goal of creating a commu-

nication channel that provides confidentiality and authentication of information, as well as authentication

of parties. The following presents the Diffie-Hellman (DH) Key Exchange, followed by an ECC-based

DH protocol (ECDH) and finally ECIES, which uses ECDH as part of its hybrid scheme.

Diffie-Hellman

One of the most common secret key establishment protocols is the Diffie-Hellman Key Exchange (D-H), which establishes a cryptographic secret over a public channel. The idea behind this method is that the public information exchanged between two parties is used to create a secret key without compromising it. The algorithm relies on the difficulty of solving the discrete logarithm problem. First, Alice and Bob agree to use a certain modulus p (prime) and a base g (a primitive root modulo p), which can be public. Afterwards, Alice and Bob each generate a random value, a and b respectively, calculate A and B, and finally exchange A and B between them:

$A = g^{a} \bmod p$ (2.4)

$B = g^{b} \bmod p$ (2.5)

Finally, they compute:

$s = B^{a} \bmod p = A^{b} \bmod p$ (2.6)

where s is the resulting shared secret. With ephemeral private values, this exchange can provide perfect forward secrecy (see footnote 3).
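Equations (2.4) to (2.6) translate directly into modular exponentiations. The sketch below uses the toy parameters p = 23 and g = 5 (5 is a primitive root modulo 23); these values and the function names are assumptions for illustration, and real exchanges use primes of 2048 bits or more.

```python
import secrets


def dh_public(g, private, p):
    """Public value g^private mod p, as in (2.4) and (2.5)."""
    return pow(g, private, p)


def dh_shared(other_public, private, p):
    """Shared secret other_public^private mod p, as in (2.6)."""
    return pow(other_public, private, p)


p, g = 23, 5                           # toy public parameters

a = secrets.randbelow(p - 2) + 1       # Alice's ephemeral private value
b = secrets.randbelow(p - 2) + 1       # Bob's ephemeral private value

A = dh_public(g, a, p)                 # sent from Alice to Bob
B = dh_public(g, b, p)                 # sent from Bob to Alice

s_alice = dh_shared(B, a, p)
s_bob = dh_shared(A, b, p)
```

Both sides obtain the same value because $(g^{b})^{a} = (g^{a})^{b} \bmod p$, while an eavesdropper only sees p, g, A and B.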

When the key pairs are ephemeral, the exchange does not provide authentication of either party, and it is therefore subject to Man-in-the-Middle attacks, where an attacker intercepts the exchanged public keys, generates its own key pair, and exchanges its public key with both parties, pretending to be the other party. As a result, the attacker ends up with two shared secrets, one to communicate with each party.
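The attack can be reproduced with the same toy parameters (p = 23, g = 5). The private values below are fixed only to make the example reproducible, and the attacker name Mallory is an illustrative assumption.

```python
p, g = 23, 5                  # toy public parameters
a, b, m = 4, 3, 7             # private values of Alice, Bob and Mallory

A = pow(g, a, p)              # Alice's public value, intercepted by Mallory
B = pow(g, b, p)              # Bob's public value, intercepted by Mallory
M = pow(g, m, p)              # Mallory's public value, forwarded to both sides

# Each victim unknowingly combines its private value with Mallory's public value.
alice_secret = pow(M, a, p)
bob_secret = pow(M, b, p)

# Mallory derives one secret per victim from the intercepted public values.
mallory_with_alice = pow(A, m, p)
mallory_with_bob = pow(B, m, p)

# Secret that Alice and Bob would have shared without interference.
honest_secret = pow(B, a, p)
```

Alice and Bob end up with different secrets, each of which is also known to Mallory, who can therefore decrypt, read and re-encrypt all traffic in both directions.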

In some cases, it is possible to define a previously shared password that is used to cipher the exchanged public keys, guaranteeing that only those parties will be able to use them to generate the shared secret, therefore providing authentication. This may not be feasible in all cases, as the password needs to be known, and securely stored, before establishing the communication.

3 If any long-term key is compromised, it does not compromise all past session keys. In the D-H key exchange, if the private values are obtained randomly for each session (thus being ephemeral), compromising one of them will not compromise any of the other previously exchanged secrets.

However, in some situations, it is only required for one of the parties, such as a server, to be authen-

ticated before starting a secure connection. In that case, the server can have a non-ephemeral key pair,

whose public key has been distributed correctly via a PKI (more details in Section 2.1.6). The connecting

party, or client, can then authenticate the other one because it knows its public key already and trusts it.

As long as the client generates a new key pair for each session, perfect forward secrecy is still ensured.

Because the server can be authenticated, Man-in-the-Middle attacks can no longer be performed, as

the client rejects any shared secrets that do not match the one calculated using the known server public

key.

ECDH

The Elliptic Curve Diffie-Hellman (ECDH) protocol, a derivation of the DH protocol supported by elliptic

curves, allows for two parties to establish a shared secret over an insecure channel by simply exchanging

their Elliptic Curve public keys. First, both parties have to agree on the same domain parameters (i.e.

the same curve) and then generate a key pair (d,Q) accordingly, in which d and Q represent the private

and public keys respectively.

Once both parties have generated their private and public keys, they can exchange their public elliptic

curve keys. Party A calculates $S = d_A Q_B$ and party B calculates $S = d_B Q_A$, in which S is the shared secret, which cannot be obtained by an attacker that only knows the public keys. The shared secret is hashed using SHA-256 to obtain a 256-bit key.
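A minimal ECDH sketch is given below, again on the toy curve $y^{2} = x^{3} + 2x + 2$ over the integers modulo 17 with base point G = (5, 1) of order 19, which is an assumption chosen for illustration and not a secure curve. Hashing the x coordinate of S, encoded as big-endian bytes, is also an assumption, since the text does not fix an encoding.

```python
import hashlib
import secrets

# Toy domain parameters (assumed for illustration): y^2 = x^3 + 2x + 2 mod 17.
P_MOD, A = 17, 2
G, N = (5, 1), 19


def point_add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def scalar_mult(k, point):
    """Point multiplication k * point using double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def derive_key(shared_point):
    """Hash the x coordinate of the shared point into a 256-bit key."""
    x, _ = shared_point
    return hashlib.sha256(x.to_bytes((P_MOD.bit_length() + 7) // 8, "big")).digest()


d_a = secrets.randbelow(N - 1) + 1        # private key of party A
d_b = secrets.randbelow(N - 1) + 1        # private key of party B
Q_a, Q_b = scalar_mult(d_a, G), scalar_mult(d_b, G)

S_a = scalar_mult(d_a, Q_b)               # computed by party A
S_b = scalar_mult(d_b, Q_a)               # computed by party B
key = derive_key(S_a)
```

Both parties reach the same point because $d_A (d_B G) = d_B (d_A G)$.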

ECIES

Elliptic Curve Integrated Encryption Scheme (ECIES) is a hybrid encryption scheme that provides se-

mantic security [19], which uses the following functions: key agreement, key derivation function, hashing,

encryption and message authentication. According to [20], there are several versions of ECIES. A simple

one basically consists of a key agreement protocol, such as ECDH, followed by a key derivation function

(KDF), whose resulting key is used in a symmetric encryption scheme (AES). The key derivation function can be PBKDF2, defined in PKCS#5 (see footnote 4), which supports key expansion, which can be required when using large keys.
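The key derivation and message authentication steps of such a scheme can be sketched with the Python standard library. The salt, the iteration count, the split of the derived material into an encryption key and a MAC key, and the function names are all assumptions of this sketch; the AES encryption step is omitted because the standard library does not provide AES.

```python
import hashlib
import hmac


def derive_keys(shared_secret: bytes, salt: bytes, iterations: int = 10_000):
    """Expand the key agreement output into 64 bytes with PBKDF2-HMAC-SHA256
    and split them into an encryption key and a MAC key."""
    material = hashlib.pbkdf2_hmac("sha256", shared_secret, salt, iterations, dklen=64)
    return material[:32], material[32:]


def authenticate(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Tag the ciphertext so that the receiver can detect modifications."""
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest()


# Placeholder inputs: in ECIES the secret would come from the ECDH step
# and the ciphertext from the symmetric encryption step.
enc_key, mac_key = derive_keys(b"ecdh-shared-secret", b"session-salt")
tag = authenticate(mac_key, b"ciphertext placeholder")
```

Requesting 64 bytes from a 32-byte hash is an instance of the key expansion mentioned above: PBKDF2 concatenates as many output blocks as needed.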

2.1.5 Digital Signatures

A Digital Signature consists of a mathematical technique used to validate the authenticity and integrity of a digital message or document, assuring the recipient that the sender cannot deny its authorship (non-repudiation) and that the message was not altered while in transit (integrity).

4 PKCS#5 is a password-based cryptography specification [21] which covers key derivation functions, encryption schemes and message authentication schemes.


An example of a Digital Signature algorithm is the Elliptic Curve Digital Signature Algorithm (ECDSA),

which allows an entity to digitally sign a piece of data using its private elliptic curve key. The data is first

hashed using a hashing function, such as SHA-256. The resulting message digest must be truncated,

so that the length of the hash is the same as the bit length of n (the order of the EC subgroup). The

truncated digest value is an integer denoted as z. The algorithm works as follows:

1. Choose a random integer k from {1, ..., n − 1}.

2. Calculate the point P = kG, where G is the base point.

3. Calculate the number $r = x_P \bmod n$ (where $x_P$ is the x coordinate of the point P).

4. If r = 0, choose another k and try again.

5. Calculate $s = k^{-1}(z + r d_A) \bmod n$, where $d_A$ is the private key of the signer (A).

6. If s = 0, choose another k and try again.

If successful, the pair (r, s) is the signature. Other parties can verify the signature by using the signer’s public key ($Q_A$) to:

1. Calculate the integer $u_1 = s^{-1} z \bmod n$.

2. Calculate the integer $u_2 = s^{-1} r \bmod n$.

3. Calculate the point $P = u_1 G + u_2 Q_A$, and obtain $x_P$.

4. The signature is valid if $r = x_P \bmod n$.
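The signing and verification steps can be sketched as follows, once more on the toy curve $y^{2} = x^{3} + 2x + 2$ over the integers modulo 17 with base point G = (5, 1) of order n = 19. The curve, the function names and the optional fixed k (provided only so that the example is reproducible) are assumptions of this sketch; in practice k must be unpredictable and never reused.

```python
import hashlib
import secrets

# Toy domain parameters (assumed for illustration): y^2 = x^3 + 2x + 2 mod 17.
P_MOD, A = 17, 2
G, N = (5, 1), 19


def point_add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def scalar_mult(k, point):
    """Point multiplication k * point using double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def truncated_digest(message: bytes) -> int:
    """SHA-256 digest truncated to the bit length of n, as the integer z."""
    e = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return e >> max(0, 256 - N.bit_length())


def sign(z, d, k=None):
    """Signing steps 1 to 6; a fixed k is accepted for reproducible examples."""
    while True:
        k_try = k if k is not None else secrets.randbelow(N - 1) + 1
        x_p, _ = scalar_mult(k_try, G)
        r = x_p % N
        s = pow(k_try, -1, N) * (z + r * d) % N
        if r != 0 and s != 0:
            return r, s
        if k is not None:
            raise ValueError("the fixed k yields r = 0 or s = 0")


def verify(z, signature, Q):
    """Verification steps 1 to 4 with the signer's public key Q."""
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    point = point_add(scalar_mult(z * w % N, G), scalar_mult(r * w % N, Q))
    return point is not None and point[0] % N == r


d_a = 7                                   # signer's private key (toy value)
Q_a = scalar_mult(d_a, G)                 # signer's public key
z = truncated_digest(b"message to sign")
signature = sign(z, d_a)
is_valid = verify(z, signature, Q_a)
```

Verification succeeds because $u_1 G + u_2 Q_A = s^{-1}(z + r d_A) G = kG$, whose x coordinate reduces to r.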

2.1.6 Key Certification and PKI

Public Key Cryptography provides non-repudiation to secure communications. When Alice sends data signed with her private key, Bob knows that only Alice was able to sign it, because only Alice has her private key. However, Bob must trust that the received public key is in fact the public key of Alice. To solve this problem, Digital Certificates are used. Digital certificates are electronic credentials used to verify the identities of individuals and machines, by guaranteeing that the identification of a user and data is bound to a certain public key. In general, digital certificates consist of three main parts: user/device information; public key; digital signature. A frequently used certificate format is the X.509v3 profile. Some of the fields present in an X.509v3 certificate include [22]:

Table 2.1: X.509v3 certificate fields.

| Field | Field |
|---|---|
| Version Number | Serial Number |
| Signature Algorithm ID | Issuer Name |
| Validity period | Subject name |
| Subject Public Key Info | Issuer Unique Identifier (optional) |
| Subject Unique Identifier (optional) | Extensions (optional) |
| Certificate Signature Algorithm | Certificate Signature |


Because there must be services to issue, validate and revoke these certificates, the Public Key Infrastructure (PKI) was created. A PKI is a framework of roles, policies and procedures that allow the generation, management, distribution, revocation and storage of digital certificates [23, 24, 25].

A PKI normally has the following components: Security Policy, Certification Authority (CA), Registration Authority (RA) and Certificate Repository. The Security Policy is essential to state how the organisation handles keys and valuable information. The CA is the entity which issues (binds the identity of a user to a public key with a digital signature) and revokes certificates. To ensure a certain level of trust, users that wish to receive a certificate for a public key must first register with an RA, which is the interface between the user and the CA, and which authenticates the user and submits the certificate request to the CA. Finally, a Certificate Repository is required in order to store the issued certificates and the Certificate Revocation Lists (CRLs), which contain certificates that have been revoked. Certificates can be revoked for several reasons, e.g. validity date expired, private key compromised, failure to comply with policy requirements and misrepresentation.

2.1.7 Physically Unclonable Function

A Physically Unclonable Function (PUF) is a challenge-response mechanism in which the response to a given challenge depends on a variable physical material [8]. A PUF receives an input challenge (or stimulus) $C_i \in C$, where C is the set of all possible challenges, and outputs a response $R_i \in R$, where R is the set of all possible responses. PUFs are based on the natural randomness that exists in the integrated circuit (IC) used to generate the response, which cannot be controlled. This randomness is due to the random variations in the IC fabrication process, i.e. two PUFs with the same layout result in two different functions, so it is impossible to make two PUFs behave equally.

A PUF has four main characteristics. A given challenge always produces approximately the same response (error correction codes are used to remove noise). Given a response, it must be difficult to find its challenge (input). Two different challenges must produce two different responses. Two different PUFs must produce two different responses for the same challenge.

2.2 Secure Computing Platforms

Several platforms have been developed over the years to perform secure computing, which include

Hardware Security Modules, Trusted Platform Modules and Smart Cards. A brief introduction to some

of these platforms is given in this section.

HSMs are application-specific devices which provide secure cryptographic key management and accelerated cryptographic operations with those keys. They have the following main characteristics [12]:

1. Secure key management.

2. Secure internal and external data storage.

3. Support cryptographic operations with internal keys (such as ciphering/deciphering data and gen-

erating digital signatures).


4. Include anti-tampering and anti-cloning features at the physical level.

5. May include side-channel analysis protection mechanisms.

6. Contain True Random Number Generators.

7. Guarantee internal clock freshness.

8. Support secure communication with the outside world.

9. Standard developer API that allows for an easy integration with software applications.

However, their cost is usually higher than that of general-purpose devices (the price for a regular HSM can go up to 35,000 € [26, 27]) and they do not provide the same flexibility for the programmer. HSMs are usually connected to a network through TCP/IP or to a computer via USB, which makes it easy to remove or add them back.

Smart Cards are security tokens that have an embedded chip. They are designed to be tamper resis-

tant and provide security services. Although considered secure, smart cards possess slow input/output

communication and low computational processing power and memory storage. Some advantages of

using Smart Cards as Secure Computing Platforms include their low price when compared to other al-

ternatives, their portability and their flexibility (being often used as credit cards, ID cards or repositories

for personal information).

A Trusted Platform Module (TPM) is a secure cryptographic chip that integrates a secure micro-

processor with cryptographic keys and functionalities, which is normally embedded on a computer’s

motherboard. It includes capabilities like Binding, Sealing and Attestation. Binding allows data encryp-

tion using the TPM’s unique RSA key. Sealing works similarly to Binding, except that it requires the

TPM to be in a certain state in order to decrypt the data. Attestation allows a third party to verify that

the software has not been changed, by comparing the unforgeable authenticated digest of the hardware

and software configuration. The most recent version of the TPM specification (2.0) provides more cryp-

tography algorithms, such as ECC, AES, SHA-256 and HMAC. However, this version is still relatively

new (approved in 2015) and therefore not many vendors support it.

Recently, ARM and Intel have developed two different technologies that provide system-wide hard-

ware isolation for trusted software. The ARM TrustZone creates an isolated area that can be used to

guarantee confidentiality and integrity to the system, by providing code and data isolation [28]. Most

ARM platforms today have this security technology implemented. On the other hand, the Intel SGX is

a technology which allows developers to protect certain code and data from disclosure or modification

[29]. Both technologies are susceptible to several attacks as described in [30, 31].

By comparing the advantages and disadvantages of each platform described above with the objectives and requirements of this work (Section 1.1), it is possible to identify that, ideally, the desired system would consist of a low-cost and flexible Hardware Security Module.

There are several Hardware Security Modules in the market for a variety of goals, coming with different prices and characteristics. The criteria for choosing the right HSM for a given task include performance, scalability, redundancy, API support, security, supported algorithms, authentication options and cost.


A review on HSMs [12] shows that models like the AEP Keyper v2, SafeNet Luna SA 4.4, Thales nShield Connect 6000 and Utimaco CryptoServer Se1000 support several algorithms, such as AES, RSA, ECDSA, ECDH and SHA-2, which are available through a PKCS#11 interface. While only two of them support elliptic curve operations, all support RSA. The single-threaded performance results for RSA signature generation can be found in Table 2.2.

Table 2.2: Single-threaded performance (signatures/second) for different HSMs [12].

| Key Size (bits) | Keyper v2 | Luna SA 4.4 | nShield | CryptoServer |
|---|---|---|---|---|
| 1,024 | 310 | 800 | 950 | 1160 |
| 2,048 | 110 | 420 | 570 | 710 |
| 4,096 | 13 | 35 | 150 | 230 |

Table 2.3: HSM Key Storage capacity [12].

| HSM | Key storage capacity |
|---|---|
| AEP Keyper v2 | 8000 1024-bit RSA keys |
| SafeNet Luna SA 4.4 | 1200 2048-bit RSA keys |
| Thales nShield | Limited on board NVRAM storage. |
| Utimaco CryptoServer | 5000 1024-bit key pairs |

Concerning key storage capacity, Table 2.3 depicts the capacity for each HSM listed previously. In regard to backups, the listed HSMs either provide backups to dedicated external cards or remote backup functionality. Furthermore, they all support time synchronization and administrator authentication via a PKCS#11 interface.

2.3 Implementation Technologies

There are three major implementation technologies that can be used to create HSM-like systems: CPUs, ASICs and FPGAs. A Central Processing Unit (CPU) is a general-purpose electronic chip (such as the one inside Smart Cards) that performs basic arithmetic, logical, control and input/output operations specified by program instructions. However, CPUs do not contain internal memory (apart from possible cache memories) and are not closed systems, providing no security to the application.

An Application-Specific Integrated Circuit (ASIC) is an integrated circuit built for a specific application (in this case, secure computing). ASICs usually include microprocessors, memory blocks (ROM, RAM, Flash Memory) and other building blocks. Because of their specific use, their manufacturing cost is quite high when compared to general-purpose devices. They strive to meet the EAL 4+ assurance level regarding Common Criteria / FIPS 140-2 certification [32] [33] [34].

On the other hand, as described in Section 1, FPGAs are low-cost general-purpose devices that

provide high flexibility and performance, which can be categorized into four different categories depend-

ing on their configuration storage: SRAM-based, SRAM-based with internal flash, Flash-based and

Antifuse-based.


Many FPGA vendors (e.g. Achronix, Altera, Lattice, Microsemi and Xilinx) compete with each other to provide the best FPGAs to the market. Each vendor provides different FPGA devices with distinct technologies and equips them with its own security mechanisms against attacks such as Bitstream Probing, Decryption Key Stealing, Readback attacks and Side-channel attacks, as well as FPGA counterfeiting and cloning [35] [36]. Table 2.4 lists the protection mechanisms available for the configuration data of the major FPGA models in the market.

Table 2.4: Protection mechanisms for FPGA configuration data.

| Manufacturer | Device | Bitstream Encryption | Authentication | Technology | Key Storage |
|---|---|---|---|---|---|
| Microsemi [4] | IGLOO2, Smartfusion2 SoC | AES-256 | SHA-256 | Flash | PUF |
| Xilinx [37] | Spartan-6 | AES-256 | Device-DNA (footnote 5) | SRAM | eFUSE (footnote 6), volatile |
| Xilinx [37] | Virtex-6 | AES-256 | SHA-256 HMAC | SRAM | eFUSE, volatile |
| Xilinx [37] | Virtex-7 | AES-256 | SHA-256 | SRAM | eFUSE, volatile |
| Xilinx [37] | Zynq-7000 | AES-256 | SHA-256 | SRAM | eFUSE, volatile |
| Altera [38] | Stratix II/II GX | AES-256 CBC | - | SRAM | NVM |
| Altera [38] | Stratix III/IV/V | AES-256 CBC | - | SRAM | volatile, NVM |
| Altera [38] | Cyclone III LS | AES-256 CBC | - | SRAM | volatile |

As detailed previously, non-volatile FPGAs are better suited for safety-critical applications. In par-

ticular, the IGLOO2 and Smartfusion2 SoC (which integrates a processor and FPGA logic) provide the

best variety of design and data security features when compared to other FPGA models from different

vendors [3, 39]. In fact, among the devices highlighted in Table 2.4, the IGLOO2 and the Smartfusion2

are the only devices capable of generating and storing keys through a PUF-based mechanism, allowing

for greater anti-cloning capabilities. Additionally, they are the only ones which include embedded cryptographic cores that are protected against Differential Power Analysis (DPA; see footnote 7). Regarding the volatile

FPGAs, while several ad-hoc solutions have been proposed [5] to strengthen the native technology,

these devices are still vulnerable to several attacks, particularly when loading the initial configuration.

On the other hand, the IGLOO2 and the Smartfusion2 SoC non-volatile bitstream storage makes them

less susceptible to probing attacks on device boot [3].

2.4 Smartfusion2 SoC

Given that the Smartfusion2 SoC from Microsemi is considered to be the best option available, the following focuses on this device. Although the IGLOO2 provides similar security, it does not contain a microprocessor, which is necessary to run the software that controls the system. In this section, a brief description of the device is given, followed by an introduction to its main hardware components and its design and data security features, which together form a Root-of-Trust that is essential to ensure a Secure Boot.

7 A type of side-channel attack, in which the attacker studies the power consumption of a cryptographic hardware device in order to extract cryptographic secrets. A side-channel attack is any attack based on information retrieved from the physical level of a cryptographic system.


2.4.1 Device Description

The Microsemi Smartfusion2 SoC contains a non-volatile FPGA, integrated with an ARM Cortex-M3 in a Microcontroller Subsystem (MSS), including an embedded Non-Volatile Memory (eNVM) and an embedded Static Random Access Memory (eSRAM), along with several cryptographic services [39], as depicted in Figure 2.2. The board on which the SoC is mounted provides an external SPI flash, a 10/100 Ethernet PHY and one external RAM memory.

Figure 2.2: SmartFusion2 SoC FPGA Block Diagram [39].

There are two available power modes: full-power (normal), in which the MSS and FPGA fabric are

fully operational and the Cortex-M3 is running application code with all memory controllers enabled;

low-power, in which the Smartfusion2 is considered to be in an idle state but ready to respond to an

interrupt sourced from the MSS and the FPGA - this mode disables the majority of the Cortex-M3 logic.

2.4.2 Security Features

Microsemi defines three different abstraction layers: Secure Hardware, Design Security and Data Secu-

rity. Data Security builds on top of Design Security which is built on top of Secure Hardware. While a full

list of secure hardware features is described in [4], the most important features for secure hardware are

listed in Table 2.5, namely key management and the validation of digital signatures, which include the

encrypted loading of user and factory keys, as well as device certificates (which are bound to a specific

device) and a revocation list of stolen or scrapped devices.


Table 2.5: Key Features for Secure Hardware [4].

| Services | Features |
|---|---|
| Key Management | Encrypted loading of user secret key material (both Symmetric and Asymmetric encryption supported). |
| | Authenticated/encrypted loading of all factory keys. |
| | Factory keys and passcodes generated and loaded by Hardware Security Modules (HSMs). |
| Digital Signature Validation | X.509 certificate bound to device serial number, device grading information, and device secret keys. |
| | Certificates digitally signed by factory HSMs. |
| | Certificate revocation list for scrapped or stolen devices. |

Design Security is defined as protecting the intent of the design owner, i.e. keeping the design

and bitstream keys confidential and protecting against design changes. It can also be referred to as

Intellectual Property (IP) protection. The list of available Design Security features can be found in [4].

The Smartfusion2 SoC provides a True Random Number Generator (TRNG) for nonces and private ECC key generation, and all bitstreams are encrypted with AES-256 based encryption and fully authenticated with a 256-bit tag. To prevent back-tracking attacks, versioning is provided, disallowing the loading of obsolete bitstreams.

For DPA protection, all security keys, protocols and ECC point multiplication services have countermeasures with technology from Cryptography Research Inc. Microsemi states that the remaining cryptographic services are not protected against DPA (although they are safe from timing analysis and simple power analysis), and therefore does not recommend the use of repeated keys when the adversary can choose the ciphertext [40].

The Smartfusion2 SoC has several security and access control policies which can be configured by

setting flash-lock bits, which are control bits that enable or disable certain features (see the circles with

crosses in Figure 2.3). The security segments in green, depicted in the middle section of Figure 2.3,

are stored on the internal non-volatile memory and are the heart of the system security as they contain

the set of keys (e.g. UEK1, UPK1, DPK) used to encrypt the bitstream configuration, unlock operations

such as read, write, and verify eNVM, enter Factory Test Mode, erase, write and verify fabric, enable

versioning updates, and restrict JTAG and SPI access.

Additionally, readback of the bitstream is always disabled. For tamper protection, the device comes

with configurable zeroization options to clear and verify volatile and non-volatile memories. It also pro-

vides redundancy in the security flash array to allow detection and reporting of faults.

As for Data Security, which Microsemi defines as protecting the information that is stored, processed or communicated in the application executing on the FPGA, a list of features is available in [4], which includes SHA-256, HMAC-SHA-256, SRAM-PUF, ECC and AES operations.

Another security feature of these FPGAs is the Root-of-Trust. It is described as an entity that can be


Figure 2.3: Detailed security and settings model diagram [41]. The green segments in the middle are stored in non-volatile memory. The COMBLK performs the communication between the MSS (software) and security services (System Controller).

trusted to always behave in the expected manner. It provides the verification of the system, software and

data integrity and confidentiality, as well as the extension of trust to internal and external entities. It is

the foundation upon which all security layers are built. In an embedded system, the Root-of-Trust works

with other system elements to ensure the main processor boots securely using only authorized code -

which extends the trusted zone to the processor and its applications. By providing the aforementioned

hardware, design and data security features, Microsemi considers the Smartfusion2 SoC to provide a

Root-of-Trust, which is essential to the Secure Boot.

A Secure Boot process (controlled by the System Controller displayed on the left of Figure 2.3)

initializes an embedded system from rest and it does that by executing trusted code, free from tampering

by an attacker. If this level of trust does not exist, another boot image could replace the original one and

allow an attacker to hijack the whole system. The validation of each stage must be performed by the

previous successful phase to ensure a chain-of-trust up to the application layer. The first phase (Phase 0), or Immutable Boot Loader, is embedded within the Smartfusion2 SoC and validated by the Root-of-Trust, which ensures the integrity and authenticity of the code. Then, each phase is validated by the previously trusted phase, before execution is transferred to it.

The first pages of the eNVM are reserved for the System Controller and cannot be accessed by

the user. They store, among other things, the Device Certificate and the digest of the User portion of

the eNVM. This digest allows the Immutable Boot Loader, in Phase 0, to know whether or not Phase 1

contents have been modified. If not modified, the booting process proceeds to the next phase, otherwise

it gets halted. All the aforementioned features, when configured and used properly, allow us to deploy


the Smartfusion2 SoC as a Secure Computing Platform.
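The digest-based validation of each boot phase can be illustrated with a short sketch. It only models the logic of the chain-of-trust: the stage names, the image contents and the `secure_boot` function are hypothetical, and the actual check on the device is performed by the System Controller and the Immutable Boot Loader rather than by software of this kind.

```python
import hashlib
import hmac


def digest(image: bytes) -> bytes:
    """SHA-256 digest of a boot image."""
    return hashlib.sha256(image).digest()


def secure_boot(stages, trusted_digests):
    """Validate each stage against its reference digest before running it.

    Returns the names of the stages that were allowed to execute;
    the boot is halted at the first stage whose contents were modified."""
    booted = []
    for (name, image), expected in zip(stages, trusted_digests):
        if not hmac.compare_digest(digest(image), expected):
            break                      # halt: this stage was tampered with
        booted.append(name)            # execution is transferred to the stage
    return booted


# Reference digests are computed once, at provisioning time, and kept in
# memory that the user application cannot modify (hypothetical images).
stages = [("phase1", b"user boot code"), ("phase2", b"application code")]
trusted_digests = [digest(image) for _, image in stages]

clean_boot = secure_boot(stages, trusted_digests)
tampered_boot = secure_boot(
    [("phase1", b"user boot code"), ("phase2", b"malicious code")],
    trusted_digests,
)
```

In the tampered case only the first phase runs, which mirrors the behaviour described above, where the booting process is halted as soon as a modified phase is detected.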

2.5 Summary

In this chapter, a range of concepts used throughout the dissertation were introduced. More specifically,

several cryptographic services and mechanisms were discussed, such as Symmetric and Asymmetric

Key Cryptography, which include AES, RSA and ECC. Additionally, an overview of the SHA-256 hash-

ing function was given, followed by a description of secret key establishment protocols (DH, ECDH and

ECIES). Afterwards, the concepts of digital signatures, Public Key Infrastructures and Physically Unclonable Functions were introduced.

Moreover, the existing types of Secure Computing Platforms were discussed, including Hardware Security Modules, Trusted Platform Modules and Smart Cards. After analysing the characteristics of each platform, it was concluded that the platform that fulfils most of the requirements of this dissertation is the HSM, although HSMs are usually expensive and have rigid designs. To understand how an HSM can be built, a variety of implementation technologies were introduced, including CPUs, ASICs and FPGAs, with more detail being given to non-volatile FPGAs due to their low price and their more suitable characteristics for security applications.

Finally, the Smartfusion2 SoC (which integrates a microprocessor, internal memories and non-volatile FPGA fabric) was described in detail, with emphasis on its design and data security features, which differentiate it from its competitors. The next chapter presents the State of the Art proposals which attempt to create secure computation systems supported by FPGA technologies.


Chapter 3

State of the Art

In this chapter, the State of the Art regarding secure systems based on FPGAs is discussed by

analysing how the different solutions configure their devices to act as secure modules, how their systems

establish communication channels with outside parties and how key management and data storage

are performed. The chapter is divided into four sections: FPGA as Secure Platform, Key Generation

and Storage, Full Security Systems and Discussion. The first section presents the systems

which consider FPGAs to be secure platforms, discussing the works that propose schemes for

dynamic reconfiguration of volatile FPGAs and the implementation of the Cipherbase secure hardware

on a volatile FPGA. The second section presents works that perform key generation and storage with FPGAs,

such as a PUF-based approach and a rekeying management scheme for Storage Area Networks which

uses a master enveloping key that never changes. The third section discusses two architectures

whose requirements are very similar to ours: the first one uses volatile FPGAs to perform trusted cloud

computing and the second one creates a secure wrapper that embeds an FPGA application. Finally, a

comparison of the major works is performed, summarising the discussed architectures.

3.1 FPGA as Secure Platform

Gaj et al. [6] consider the use of embedded microprocessor cores within the FPGA to achieve bitstream

security, specifically for reconfiguration of a Xilinx Virtex-II Pro on a Xilinx ML310 board. Nonetheless,

they consider the entire board as a secure device and not just the FPGA, meaning the path between the

FPGA and the external memory is susceptible to tampering.

Arasu et al. [7] present the design of the Cipherbase secure hardware and its implementation using

FPGAs. The Cipherbase system incorporates customized trusted hardware, extending Microsoft’s SQL

Server for efficient execution of queries using both secure hardware and commodity servers, allowing

for the secure storage of data.

They choose a volatile FPGA to implement the Trusted Machine (TM). Since the logic is built from

volatile configuration memories and the binary which defines the computation is loaded at power-on


from external non-volatile memories, the bitstream configuration can be intercepted.

Figure 3.1: FPGA as a Trusted Machine [7].

Figure 3.1 depicts how the setup of an FPGA as a TM is performed in [7]. A Trusted Authority

(TA), trusted by both the clients and the cloud operator, is responsible for generating and maintaining

FPGA binary encryption keys. Additionally, it vets and compiles the hardware code associated with the

TM and creates the encrypted and signed binaries for each device.

They assume that the FPGA is secure and that an adversary does not have access to its internal

state, because they believe that the TM is only vulnerable to side-channel analysis. The session establishment

algorithm is not mentioned and the secure bootstrapping and operation of FPGAs are not

described in their work.

3.2 Key Generation and Storage

Arasu et al. [7] present two scenarios for the generation of encryption keys for database operations in

the Cipherbase system. The simplest one consists of embedding a master key into a Trusted Machine

(TM) binary (the TM being the programmable region of the FPGA) and distributing it to the database client. The TM would use

this key to encrypt the client's data with AES. The drawbacks of this approach include the need for the Trusted

Authority to generate a separate binary (each loaded onto the device) for every potential database client

and the fact that the client is locked into using this key for all their databases.

A more sophisticated approach is suggested by the authors, in which the TM embeds an RSA pub-

lic/private master key pair. The public key is published via standard public key infrastructure techniques,

allowing clients to uniquely identify a certain FPGA. This would allow clients to negotiate AES session

keys for different database fields or different database applications. In case the keys are cached by the

TM, in a key vault, they would be encrypted using the master key (or another key defined by the TA), to

guarantee that only the TM can recover the contents of the vault.

Nabeel et al. [8] propose an approach based on Physically Unclonable Function technology to provide

strong hardware authentication of smart meters and efficient key management for Advanced Metering

Infrastructures. They utilize the PUF on the devices to generate and re-generate the symmetric


keys and access level passwords for smart meters. The PUF based secret generation provides strong

protection against key leakage since the master key is never stored in memory. They implemented the

PUF feedback loop using a Xilinx Spartan-6 FPGA board, which is connected to a PC through a

serial port. The error correction of the PUF mechanism and cryptographic operations are done on the

PC.

Wang et al. [9] propose a flexible and low-cost FPGA-based rekeying management scheme to

improve the security and reduce the processing time of rekeying processes. They state that their system

must not only provide a secure, high-performance, flexible, open and standards-based storage infrastructure

but also protect the data against all kinds of attacks, such as side-channel attacks, eavesdropping and

man-in-the-middle attacks.

To protect the system against attacks, they perform key management at the hardware level (FPGA).

The software simply sends commands, which include key backup, key recovery, key revocation and

key generation, to the key management module. This module, implemented on an FPGA, communicates

with the software through a PCIe interface. The internal memory stores the currently “active” key pairs, while the

external flash memory backs up the keys. All keys involved are encrypted before being stored. The

required keys are generated by an RNG and hashed with SHA-256 so that they have a length of 256 bits.
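The key generation step described above can be sketched as follows. This is a minimal illustration, not the implementation from [9]: `os.urandom` stands in for the hardware RNG, and the function name `generate_key` and the 64-byte seed length are assumptions made for the example.

```python
import hashlib
import os


def generate_key(rng=os.urandom, seed_len=64):
    """Draw raw RNG output and hash it with SHA-256 to obtain a 256-bit key.

    `rng` is any callable returning `seed_len` random bytes; os.urandom is
    only a software stand-in for the hardware RNG of the original design.
    """
    raw = rng(seed_len)
    return hashlib.sha256(raw).digest()


key = generate_key()
print(len(key) * 8)  # 256
```

Hashing the RNG output fixes the key length at 256 bits regardless of how many bytes the RNG delivers.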

In a typical scheme, when re-keying takes place, all the encrypted data must be decrypted using the

old key and encrypted using the new key. Their flexible and low-cost FPGA-based re-keying

process avoids this decryption and re-encryption of the stored data. The

scheme uses a long-term enveloping key that encrypts the user’s access key, which is in turn

used to encrypt the data encryption/decryption key, known as the LUN key. When re-keying occurs,

a new access key is generated, and the LUN key is decrypted using the old access key and re-encrypted

with the new access key. Finally, the new access key is encrypted using the enveloping key.

This ensures that the stored data does not need to be decrypted and encrypted again, because the LUN

key remains the same. Only the following values are stored in memory: $E_{K_{access}}(k_{LUN})$ and $E_{K_{env}}(k_{access})$.
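The three-level key hierarchy and the re-keying step can be sketched as below. This is only an illustration under stated assumptions: `toy_encrypt` is an insecure XOR keystream used as a stand-in for the real cipher, and the class and function names (`KeyStore`, `new_key`, `rekey`) are hypothetical and do not come from [9].

```python
import hashlib
import os


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Insecure stand-in for a real cipher: XOR with a SHA-256 keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


toy_decrypt = toy_encrypt  # XOR with the same keystream undoes itself


def new_key() -> bytes:
    return hashlib.sha256(os.urandom(64)).digest()


class KeyStore:
    def __init__(self):
        self.k_env = new_key()  # long-term enveloping key, never changes
        k_access = new_key()
        k_lun = new_key()
        self.wrapped_lun = toy_encrypt(k_access, k_lun)          # E_Kaccess(k_LUN)
        self.wrapped_access = toy_encrypt(self.k_env, k_access)  # E_Kenv(k_access)

    def lun_key(self) -> bytes:
        k_access = toy_decrypt(self.k_env, self.wrapped_access)
        return toy_decrypt(k_access, self.wrapped_lun)

    def rekey(self) -> None:
        """Replace the access key; the LUN key and the stored data are untouched."""
        old_access = toy_decrypt(self.k_env, self.wrapped_access)
        k_lun = toy_decrypt(old_access, self.wrapped_lun)
        new_access = new_key()
        self.wrapped_lun = toy_encrypt(new_access, k_lun)
        self.wrapped_access = toy_encrypt(self.k_env, new_access)


store = KeyStore()
data = b"block stored on the LUN"
ciphertext = toy_encrypt(store.lun_key(), data)
store.rekey()
# The old ciphertext is still readable because the LUN key did not change.
assert toy_decrypt(store.lun_key(), ciphertext) == data
```

Only the two wrapped values change during re-keying, which is why the cost is independent of the amount of stored data.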

The software only stores 32-bit indexes for maintenance, which are extracted at hardware level from

the encrypted private key and the user’s access keys, and are sent out to the software by the FPGA.

Even though the authors claim that their system prevents physical attacks, their explanation is quite

vague. They state that, because only a 32-bit index (instead of the key) is used at the software level, their

design is protected against physical attacks, since the attacker would need to obtain the contents of the

internal memories. However, if an attacker has physical access to the devices, side-channel

analysis remains possible unless the devices are specifically protected against it.

3.3 Full Security Systems

This section presents two works that fulfil requirements similar to the ones proposed by this thesis. The

first one [11] consists of using volatile FPGAs to perform trusted cloud computing, in which protected

bitstreams are used to create a Root-of-Trust for cloud computing clients. The second one [10] proposes


a system architecture that wraps an embedded application on a volatile FPGA. The wrapper includes

a secure user authentication interface and cryptographic services which secure all of the embedded

application’s data transfer interfaces.

Eguro et al. describe [11] how protected bitstreams can be used to create a Root-of-Trust for the

clients of cloud computing servers. Their hardware-based approach solves the following problem: how

to secure client data and computation from both potential external attackers and an untrusted system

administrator. The system which addresses this problem uses volatile FPGAs. They are programmed

to form a flexible, independent trusted third party computing platform within the cloud infrastructure.

Their proposed system allows clients to upload their configuration data to the cloud and since cloud

administrators do not have low-level access to computation within the FPGA, it allows clients to offload

sensitive parts of their applications to these devices, avoiding potential vulnerabilities in the software

stack.

The deployment of the trusted computing nodes begins with a trusted authority (TA), which is trusted

by all clients and the cloud operator. The TA generates a random symmetric encryption key symkfb and

copies it into the onboard key memory of the FPGA before the platform is delivered to the cloud operator.

After the key has been written, the FPGA can be delivered to the cloud operator and installed. Since

the FPGA comes with a secure boot process (which uses the symmetric key symkfb to decrypt and

authenticate the bitstream configuration), the authors believe the FPGA can be used as a “virtual” HSM.

The authors, however, strive to support a more sophisticated operational model that does not require

direct TA involvement for each and every bitstream. The idea is that the TA provides a single generic

bootstrapping binary for each FPGA that acts as an onboard infrastructure which receives and loads

client applications. Figure 3.2 depicts how the TA generates a private/public RSA key pair and places

the private key into the bootstrapping bitstream. The public key is published via a standard PKI. Once

the TA encrypts the bootstrapping bitstream with AES using symkfb, it transfers the configuration into

the flash memory on the FPGA.

Once the bootstrap configuration is running on an FPGA in the cloud, the client can create an application

for the FPGA to handle sensitive data. The client connects to this device to load their application

securely using standard PKI, like an SSH session, in which the client uses the public key of the device

to exchange a symmetric session key sessionkf .

The attack model proposed by the authors assumes that the following operations are sufficiently

difficult and that they are effectively impossible in practice: breaking the cryptography used; loading a

binary that cannot be decrypted and authenticated properly; retrieving binary or state information on the

device from outside; altering the behaviour of the loaded binary; altering data currently on the device.

Furthermore, the authors also explain the problems that would arise from keys being compromised or

lost. They also emphasize that the immutable bootstrapping logic forms the initial Root-of-Trust for the

clients of cloud computing servers.

Graf proposes a system [10] that acts as a secure wrapper around an embedded application on an

FPGA (depicted in Figure 3.3). This wrapper (known as Amuet) creates a secure user authentication


Figure 3.2: Setup: TA generates a public/private RSA key pair and transfers the private key privatekf into the bootstrapping binary (in blue: data is encrypted so it doesn’t need to go through a secure channel) [11].

interface and cryptographically secures the data interfaces accessible to the embedded application,

effectively rendering the FPGA as a black box capable of performing the task for which it was designed.

The architecture introduces a secure token-based authentication scheme (using a Java-powered iButton [42]) and

an FPGA-based encrypted memory controller. It is important to note that the user application which runs

within the FPGA (protected by the wrapper application proposed) can only be re-programmed at the

factory as there is no interface in the proposed system to perform reconfiguration of user applications.

For this thesis, the relevant part of this work is the way Amuet performs the secure embedding of the

application, which includes the process of retrieving the user identification information (UID) from the

iButton through a secure user interface, the modification of the UID to form a DES key and finally to use

that DES key as the key source for the encrypted memory controller (EMC).

The Authentication Control Unit (ACU) is responsible for establishing a secure communication channel

for the UID transfer from the iButton. It has at its disposal an $x^e \bmod n$ calculator, a set of RSA

secret keys, a SHA-1 unit and a table of authorized certificates. The protocol used to negotiate a secure

channel between the device and the iButton is similar to a RSA key exchange. However, the public key is

never made public and instead, it is used as the encryption key (e), while the private RSA key is used as

the decryption key (d). The two key pairs and the two moduli (n) used in the RSA-based authentication

scheme, are generated prior to the programming of the iButton and the creation of the FPGA bitstream.

The FPGA stores $n_i$, $n_f$, $e_i$, $d_f$ and the iButton stores $n_i$, $n_f$, $e_f$, $d_i$.
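The split of key material between the two devices can be illustrated with textbook RSA. The message flow below is an assumption: [10] is only summarised here as being "similar to a RSA key exchange", so the nonce-based challenge-response, the tiny primes and the names `make_pair` and `challenge_response` are hypothetical and chosen purely for readability.

```python
import secrets


def make_pair(p, q, e):
    """Build a toy RSA key (n, e, d) from two small primes. Not secure."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return n, e, d


# Both pairs are generated before the iButton is programmed and the
# bitstream is created; neither exponent e is ever published.
n_f, e_f, d_f = make_pair(61, 53, 17)  # FPGA pair
n_i, e_i, d_i = make_pair(89, 97, 5)   # iButton pair

fpga = {"n_i": n_i, "n_f": n_f, "e_i": e_i, "d_f": d_f}
ibutton = {"n_i": n_i, "n_f": n_f, "e_f": e_f, "d_i": d_i}


def challenge_response(verifier_e, prover_d, n):
    """Verifier encrypts a nonce with e; only the holder of d can return it."""
    nonce = secrets.randbelow(n - 2) + 2
    challenge = pow(nonce, verifier_e, n)    # x^e mod n on the verifier
    response = pow(challenge, prover_d, n)   # x^d mod n on the prover
    return response == nonce


# The FPGA checks the iButton, then the iButton checks the FPGA.
ibutton_ok = challenge_response(fpga["e_i"], ibutton["d_i"], n_i)
fpga_ok = challenge_response(ibutton["e_f"], fpga["d_f"], n_f)
print(ibutton_ok and fpga_ok)  # True
```

Each side holds the other's encryption exponent and its own decryption exponent, so each can challenge the other without any exponent being made public.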

Additionally, the stored certificates are hashes of the iButtons' UIDs, which makes the original UIDs

computationally difficult to recover. Since the table resides inside the FPGA, every time a certificate is added

or revoked, the bitstream must be modified. To prevent man-in-the-middle attacks, the iButton authenticates

itself as an authorized user to the FPGA and the FPGA also authenticates itself as an authorized

host to the iButton. After a successful authentication, the FPGA sends the UID to the EMC to create the

final key for the DES engine.

Figure 3.3: Block diagram of the Amuet architecture [10]. The embedded application is actually the user application, protected by the proposed wrapping system.

The Encrypted Memory Controller (EMC) encrypts and decrypts every transaction between the embedded

application within the FPGA and the external memory. On startup, the EMC uses a

secret DES key, which is unique to and known only by the FPGA, to create the final DES key (the secret key itself is never

used for ciphering/deciphering data). This final key is formed by passing the UID from the iButton

through the DES engine using the secret DES key. From the 64-bit result, the first 56 bits are the final

DES key.
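The derivation of the final key can be sketched as follows. Python's standard library has no DES, so `des_engine_stand_in` is a keyed hash truncated to 64 bits that only mimics the block size of the real engine; it is not DES. The 64-bit UID length and all names are assumptions made for the example.

```python
import hashlib


def des_engine_stand_in(secret_key: bytes, block: bytes) -> bytes:
    """Placeholder for one DES encryption: 64-bit block in, 64-bit block out."""
    if len(block) != 8:
        raise ValueError("expected a 64-bit block")
    return hashlib.sha256(secret_key + block).digest()[:8]


def derive_final_key(secret_key: bytes, uid: bytes) -> bytes:
    """Pass the UID through the engine and keep the first 56 bits."""
    result = des_engine_stand_in(secret_key, uid)  # 64-bit result
    return result[:7]                              # first 56 bits = 7 bytes


final_key = derive_final_key(b"fpga-secret", b"\x01\x02\x03\x04\x05\x06\x07\x08")
print(len(final_key) * 8)  # 56
```

Because the final key depends on both the FPGA's secret key and the UID, the external memory can only be decrypted by that FPGA with that iButton present.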

3.4 Discussion

In this chapter, the State of the Art was presented in regard to existing solutions that strive to create

secure systems on FPGAs, each with different motivations, as seen in the previous sections. The main

focus of these works is the ability to perform secure key management, establish secure

communication channels and store data securely (internally and externally). Due to the lack of non-volatile

solutions, the works herein presented focused on volatile FPGAs, which are subject to several

attacks [3, 5].

To create a Hardware Security Module, a system must have a series of characteristics [12], such as


anti-cloning mechanisms (e.g. PUF-based key generation), secure communication channels, internal

non-volatile memories for master key storage, anti-tamper mechanisms, internal clock freshness (e.g.

through a Timestamping Authority) and a common developer interface, such as PKCS#11.

However, none of the works provides a robust solution that is able to address all of the above characteristics.

In fact, the proposed solutions do not consider security-oriented devices, which contain

anti-cloning, anti-tamper and side-channel analysis protection mechanisms, nor do they consider

the necessity of internal clock freshness. Moreover, they do not consider the performance and security

of the encryption software/hardware used. Our study finds that only Nabeel et al. [8] consider the use

of a PUF-based mechanism to increase the security and authentication of the overall system.

Additionally, the freshness of data exchanged with external parties is overlooked by all solutions,

which makes them susceptible to replay attacks. They also lack any kind of developer interface (e.g.

PKCS#11) to allow for an easy integration of their systems with external applications.

Table 3.1 summarises the key characteristics of each solution and the main features they provide, with

regard to device security, the ability to establish a secure communication channel and how they perform

key management.

Table 3.1: Comparison of Security Features of the different system proposals.

Category                 Arasu et al. [7]  Wang et al. [9]    Nabeel et al. [8]  Graf [10]                 Eguro et al. [11]

Device Security
Device                   N/D¹              Xilinx Virtex-6    Xilinx Spartan-6   Xilinx Virtex-E, iButton  Xilinx Virtex-6
Bitstream Encryption     AES               AES-256            AES-256            3DES                      AES-256
Bitstream Storage        External          External           External           External                  External

Secure Channel
Algorithm                RSA               N/A²               N/A                RSA-based                 RSA

Key Management
Master Keys Generation   Factory           Factory            PUF                Factory                   Factory
Master Keys              RSA               AES                N/D                DES                       RSA, AES
Master Keys Encryption   No                No                 N/D                No                        No
Master Keys Storage      External          Internal           N/D                Internal                  Internal
Session Keys Generation  RSA-KE³           RNG                PUF                RSA-KE                    RSA-KE
Session Keys             AES               AES                AES                DES                       AES
Session Keys Storage     Internal          Internal, External N/D                Internal                  Internal

¹ Not Disclosed
² Not Available
³ RSA-based Key Exchange protocol

Considering the above discussion, it can be concluded that the State of the Art works cannot be

used to create a Hardware Security Module or a system that resembles one. Considering that HSMs are

expensive and non-reconfigurable, as mentioned in Section 2.2, the next chapter proposes a solution,

supported by the SmartFusion2 SoC, which creates a multi-user, flexible HSM, capable of performing

secure key management, communicating securely with external parties, guaranteeing internal clock

freshness, signing data and issuing digital certificates. To demonstrate the flexibility

of the HSM, a use-case called Log-Chain is presented and integrated in the HSM.


Chapter 4

Proposed Solution

The main goal of this work is to create a re-configurable and flexible HSM, supported by a low-cost

non-volatile security-oriented FPGA, as opposed to the State of the Art. As stated previously, existing

commercial HSMs are expensive and fixed in terms of design. On the other hand, the State of the Art

proposals that attempt to create secure systems on FPGAs focus purely on volatile devices that lack

security-oriented characteristics, such as anti-tampering, anti-cloning and side-channel analysis protection.

Additionally, they lack most of the requirements of an HSM, which were described in Section 2.2.

The solution herein proposed, supported by the SmartFusion2 SoC, creates a multi-user, low-cost

and highly flexible secure computation system that performs secure key management, stores data se-

curely internally and externally, can establish a secure communication with outside parties, can maintain

internal clock freshness and is capable of computing digital signatures as well as issuing digital certifi-

cates.

To demonstrate the flexibility of the proposed architecture, the implementation of a novel log certification

scheme is also proposed and developed. This scheme consists of creating a signed Log-Chain

over records such as Linux Syslog messages, transaction logs or even medical receipt logs. Each

message/command signature guarantees the authenticity of that message and all previously logged

messages/commands, therefore creating a chain-of-logs that can be verified and cannot be repudiated

or modified.
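The chaining idea can be sketched as follows. This is a minimal illustration, not the scheme developed in this thesis: it assumes that each signature covers the previous signature together with the new message, and it uses an HMAC with a hypothetical key as a stand-in for the digital signature that the HSM would compute. The names `sign`, `append` and `verify` are chosen for the example.

```python
import hashlib
import hmac

SIGNING_KEY = b"hypothetical-hsm-key"  # stand-in for the HSM's private key
GENESIS = b"\x00" * 32                 # assumed anchor for the first entry


def sign(data: bytes) -> bytes:
    """HMAC used as a stand-in for the HSM's digital signature."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).digest()


def append(chain: list, message: str) -> None:
    """Sign the new message together with the previous entry's signature."""
    prev = chain[-1][1] if chain else GENESIS
    chain.append((message, sign(prev + message.encode())))


def verify(chain: list) -> bool:
    """Recompute every link; any modified or removed entry breaks the chain."""
    prev = GENESIS
    for message, signature in chain:
        if not hmac.compare_digest(signature, sign(prev + message.encode())):
            return False
        prev = signature
    return True


log = []
append(log, "user alice logged in")
append(log, "transfer 100 EUR to bob")
print(verify(log))  # True

log[0] = ("user mallory logged in", log[0][1])
print(verify(log))  # False
```

Because each signature depends on the one before it, validating the latest entry implicitly validates all earlier entries.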

The integration of the provided features with applications is done through an extended PKCS#11

middleware (device driver), abstracting the users from the inner workings of the system. The generated
