establishingsoftware root of trust unconditionally€¦ · ok => malware-free devic e specs....
TRANSCRIPT
Establishing Software Root of Trust Unconditionally
(or, a First Rest Stop on the Never-Ending Road to Provable Security)
Virgil D. Gligor Maverick S.-L. Woo
CyLabCarnegie Mellon University
Pittsburgh, PA 15213
NDSS 2019San Diego, CA
February 27, 20192/27/19 1
Outline
2/27/19 2
I. What is it? - Definition & relationships
- Unconditional solution
II. Why is it hard?- 3 Problems
- RoT ≠ software-based, crypto attestation
III. How to do it?- randomized polynomials
- k-independent (almost) universal hash families; and- space-time optimal in cWRAM; and- scalable optimal bounds
IV. Q & A
Full Paper is the CMU-CyLab TR 18-003https://www.cylab.cmu.edu/_files/pdfs/tech_reports/CMUCyLab18003.pdf
2/27/19 3
I. What is it?
Device State: content of processor registers (R) and (persistent) memories (M)
2/27/19 4
Memory2
NIC
Memory4
Disk controller
CPU4
CPU3
CPU2
Memory3
Memory1
CPU1
Memory0
CPU0Sys Mgt
USB controller
GPU
RAM
System State:union of all device states at time T
CPU RM
Bus System
persistent malware
is unknownUntrusted Device State:
Don’tCare
Don’tCare
Controlled by a Powerful Adversary
Don’tCare
PC
Init I/O P D Don’tCareR
Initialize
A Verifier: initializes an untrusted system state to chosen content
Verifier
2/27/19 5
Check
checks that the state has all & only chosen content & PC values
USB controller
M
Memory2
NIC
Memory4
Disk controller
CPU4
CPU3
CPU2
Memory3
Memory1
CPU1
Don’tCare
Sys Mgt
Memory1
CPU1 GPU
InitI/O
PD
CPU R
M
Verifier
Bus System
Root of Trust (RoT) Establishment
Verifiable Boot
Verifiable Boot
Verifiable boot: either boot code in a secure stateor detect unknown content
Verifiable Boot
2/27/19 6
Verifiable boot => Secure State => RoT StateTrusted Recovery => . . .
Access Control Models => . . .. . .
Secure State: RoT state (chosen content) satisfies security predicate P
2/27/19 7
Unconditional Solution
- no Secrets, no Trusted HW Modules, no Bounds on Adversary’s Power
Importance?
- no dependencies on the unknown & unknowable- a defender has a provable advantage over any adversary- outlives technology advances.
- need only - random bits- device specifications.
________________*I know of no other unconditional solution to any software security problem
*
2/27/19 8
II. Why is it hard? I. What is it?
2/27/19 9
Complexity theory?- non-asymptotic bounds? Very few - on Device Specs? None
Trusted Verifier:- doesn’t disclose random bits ✓- knows correct outcomes ✓- measures time securely à Device
- non-asymptotic bounds - on Device Specs; e.g., ISA ++
(a realistic model of computation?)
≤ malware-free DeviceCm,tTrustedVerifier
1. space-time optimal
e.g., Horner’s rule for polynomial evaluationuniquely optimal in infinite fields: 2d (×,+)
not optimal in finite fields, nor on any Device ISA++
m
CPUregisters R
M
Device
Initialize
LocalVerifier
space-time optimal Cm,t
nonce
random bits
Cnonceß
t?Cnonce(M,R)?
OK => malware-free
DeviceSpecs
2/27/19 10
m
CPUregisters R
M
Device
Initializespace-time optimal
Cm,t
tCnonce(M,R)
≤ malware-free DeviceCm,tTrustedVerifier
1. space-time optimal
LocalVerifier
DeviceSpecs
nonce
random bits
Cnonceß
- non-asymptotic bounds - on Device Specs
2/27/19
Dishonest execution:e.g., changes code and/or data
outputs a guessuses a powerful remote proxy
- adversary execution?
- how could it help?
e.g., malware beats m-t bounds => Cnonce(v) becomes unpredictable
M
CPU
Device
registers R
C’nonce’(v’)ßC’m’,t’(v’)
Initialize
nonce
time(v) Cnonce(v)
Input
Unusedmemory
Output
DeviceInitialization
11
v’≠v
≤ malware-free DeviceCm,tTrustedVerifier
1. space-time optimal
LocalVerifier
DeviceSpecs
random bits
- non-asymptotic bounds - on Device Specs
Engineering Solution?
e.g., see - segmented memory
Complexity Theory? - no help.
2/27/19
M
CPU
Device
registers R
C’nonce’(v’)ßC’m’,t’(v’)
- adversary execution
Initialize
nonce
time(v) Cnonce(v)
Input
Unusedmemory
Output
DeviceInitialization
v’≠v
12
≤ malware-free DeviceCm,tTrustedVerifier
1. space-time optimal
LocalVerifier
DeviceSpecs
random bits
- non-asymptotic bounds - on Device Specs
2/27/19 13
Reduction is insufficient !
- adversary execution
CPU
Device
Initialize
registers R
DeviceInitialization
Input
Unusedmemory
Output
Cnonce(v)ßCm,t(v)
nonce
time(v) Cnonce(v)
future-postedInterrupt& reboot
v
≤ malware-free DeviceCm,tTrustedVerifier
1. space-time optimal
LocalVerifier
DeviceSpecs
random bits
- non-asymptotic bounds - on Device Specs
2/27/19 14
CPU
Device
Initialize
registers R
disable interrupts
Solution? control flow integrity after Cnonce ends
Reduction is insufficient !
=> control flow integrity before Cnonce starts!
- adversary execution
DeviceInitialization
Input
Unusedmemory
Output
Cnonce(v)ßCm,t(v)
nonce
time(v) Cnonce(v)
v
≤ malware-free DeviceCm,tTrustedVerifier
1. space-time optimal
LocalVerifier
DeviceSpecs
random bits
- non-asymptotic bounds - on Device Specs
✓
2/27/19 15
2. Verifiable Control Flow ✓ Cnonce-jßCm,t
Mj
Device j
CPU jregisters Rj
Cnonce-ißCm,t
Mi
Device i
CPU iregisters Ri
3. Two Devices, or more?
- adversary execution
≤ malware-free Device Cm,tTrustedVerifier
1. space-time optimal
LocalVerifier
✓
DeviceSpecs
random bits
- non-asymptotic bounds - on Device Specs
noncej
time gap
Device i corruptsverified Device j
restoresCmi,ti
Device jCmj,tj
- sequential verificationfails
noncei
Cmi,ti Device i
2/27/19 16
Device j
Device i
restoresCmi,ti
Cmi,ti
nonceiCmj,tj
Slow Fast
noncej
endsearly
- ordinary concurrencyfails
2/27/19 17
for Device j
Cmj,tj
“verify”
Device j
noncei
Cmi,ti Device i
endsearly
Slow Fast
noncej
- ordinary concurrencyfails
2/27/19 18
2/27/19 19
Cmj,tj
noncej
Cmi,ti
nonceiδstart
δend
Device iDevice j
Cmi,t’itI < t’i
- concurrent verificationw/ scalable bounds
2/27/19 20
Protocol Atomicityconcurrent transaction
order and durationverifiable
control flow
Code Optimality inAdversary Execution
scalable bounds
unpredictableresult
Legend: dependency
codecomposition
Time-measurement security- caches? TLB?- clock jitter?- multi-processor
interference?- remote proxy?
verifiably avoided
2/27/19 21
Protocol Atomicityconcurrent transaction
order and durationverifiable
control flow
Code Optimality inAdversary Execution
scalable bounds
unpredictableresult
codecomposition
Time-measurement security
Software-basedAttestation
has different goals
- caches? TLB?- clock jitter?- multi-processor
interference?- remote proxy?
verifiably avoided
2/27/19 22
Protocol Atomicityconcurrent transaction
order and durationverifiable
control flow
Signatures/MACsbased on secrets in HW
scalable bounds
unpredictableresult
codecomposition
CryptographicAttestationhas different
goals
2/27/19 23
III. How to do it II. Why is it hard? I. What is it?
2/27/19 24
Solution Overview
Randomized Polynomials
- k-independent uniform coefficients, independent of input x
- k-independent (almost) universal hash function familyand
- (m, t)-optimal in the concrete Word Random Access Machine (cWRAM)and
- optimal bounds m and t are scalable; e.g., no mandatory m�t tradeoffs
new
new kind
new
new
2/27/19 25
- Constants: w-bit word, up to 2 operands/instruction instructions execute in unit time
MM
Overview of the cWRAM ISA++
- variable shiftr/l(Ri, Rj),variable rotater/l(Ri, Rj),...- multiplication(1registeroutput)...-mod (aka., division-with-remainder) . . .
- ISA: all (un)signed integer instructions- All Loads, Stores, Register transfers- All Unconditional & Conditional Branches, all branch types
- all predicates with 1 or 2 operands- Halt - All Computation Instructions:
- addition, subtraction, logic, shiftr/l(Ri, α), rotater/l(Ri, α), . . .
- Memory: M words- Processor registers R: GPRs, PC, PSW, Special Processor + Flag & I/O Registers- Addressing: immediate, relative, direct, indirect- Architecture features: caches, virtual memory, TLBs, pipelining, multi-core processors
- Constants: w-bit word, up to 2 operands/instruction instructions execute in unit time
2/27/19 26
si = Σ rj(i+1)j (mod p)j = 0
k-1
{ r0…rk-1,x } Zp
random bits
$
0
i = dΣ + (si vi)�xi (mod p),
nonce
d = |v|-1Hr0…rk-1,x(v) =
randomized polynomial family
m-t optimal bounds on cWRAM: m = k + 22, t = (6k - 4)6d
Cnonce(v) = Hr0…rk-1,x(v) = Hd,k,x(v)
(m’,t’) “<“ (m, t) => Pr [nonce, f,y : f(y) = Hd,k,x(v) | (m’,t’) ] ≤ 3 p
ΕΕ
k-independent almost universal hash function family
Scalable bounds: k => m , t and d => t
2/27/19 27
FoundationTheorem 1
Let w > 3, and p be a prime, 2 < p < 2w-1. Horner’s rule for one-time honest evaluation of Pd (�) in cWRAM
Pd(�) = Σ ai�xi (mod p) = (. . .(ad�x + ad-1)�x+ . . . +a1)�x + a0 (mod p)
is uniquely (m, t)-optimal if the cWRAM execution space & time are simultaneously minimized; i.e., m = d+11, t = 6d.
i =d
0
Answer to A. M. Ostrowski’s 1954 question:
“Is Horner’s rule optimal for polynomial evaluation?”
with non-asymptotic bounds in a realistic model of computation (cWRAM)
2/27/19 28
IV. Q & A
$
CPU
special processorregisters
Input
Initialize GPR (t0)
constantschosen to fill
unused memory
Horner( )
OutputHd,k,x(v)
LocalVerifier
v
Device
Disable: asynch. events,caches/TLB,. . . Set: other processor regs.
Initialize
r0 r1 . . . rk-1x d. . .
zk. . . si
k+8GPR
nonce= r0 …rk-1,x
random bits
Hd,kx(v)t0+(6k-4)6d
OK => malware-freelog p < w
à 2nd Pass w/ ordinary UHF 2/27/19
k+8GPR
DeviceSpecs
2/27/19 30
Cnonce1ßCm,t
segment M1
LocalVerifier
random bits
CPUregisters R
. . . . . .
noncennonce1
time1Cnonce1(v1)
verifier’s random choice of segment i
Cnoncei ßCm,t
segment Mi
CnoncenßCm,t
segment Mn
timeiCnoncei(vi)
timenCnoncen(vn)
noncei
Memory M
<< round trip
to remote adversary
2/27/19 31
CPU iregisters R
CPU 1registers R1
Cnonce1ßCm,t
segment M1
Local Verifier
random bits
CPUjregisters Rj
. . . . . .
noncennonce1
time1Cnonce1(v1)
Cnoncej ßCm,t
segment Mj
CnoncenßCm,t
segment Mn
noncej
timejCnoncej(vj)
timenCnoncen(vn)
Memory M
<< round trip
to remote adversary
2/27/19 32
Implementation Notes (Appendix C of CMU-CyLab TR 18-003)
Optimal Code: , loop control – simple on most real processorsHorner-rule step? (recall: p is largest prime in w bits)
+ (si vi)
m, T M, T M, tdifferent encodings => different results => SINGLE CHOICE!
multiplymod paddmod p
multiplydiv pmultiply p &subtractadddivpmultiplyp&subtract
multiply-add mod 2w
reduction mod p @ end
Þ 8 + 3 (reduction) instructions