A Low Overhead Hardware Technique for Software Integrity and Confidentiality
Austin Rogers§, Milena Milenković‡, Aleksandar Milenković
§ Dynetics Inc., Huntsville, AL ‡ WebSphere Process Server Performance, IBM
The LaCASA Laboratory
Electrical and Computer Engineering Department
The University of Alabama in Huntsville
http://www.ece.uah.edu/~lacasa
Outline
  Motivation
  Computer Security – Threat Models
  Related Work
  Architectures for Run-Time Verification of Software Integrity and Confidentiality
  Experimental Evaluation
  Conclusion
Motivation
  Evolution of computer security
  Economic, technological, and social trends
    Proliferation of embedded computing systems
    Ubiquitous accessibility and connectivity
    Diversification of architectures
    Tightening time-to-market constraints
  Growing number of computer security exploits
    Software vulnerabilities
    Piracy
    Reverse engineering
Computer Security
  Integrity
    Prevent execution of unauthorized code and use of unauthorized data
  Confidentiality
    Prevent unauthorized copying
  Availability
    Ensure system is available to legitimate users
    Integrity and confidentiality influence availability
Software Attacks
  Ability to run programs at lower permission levels or access system over the network
  Inject malicious code
  Overwrite a return address
  Examples
    Buffer Overflow: exceed buffer size, overwrite return address
    Format String: vulnerabilities in printf-family functions
    Integer Error: integer arithmetic errors leading to undersized buffers
    Dangling Pointer: vulnerability when free is called twice
    Arc-Injection: cause jump to library function
[Figure: stack frame before and after a buffer overflow. The frame holds the function arguments (Arg #1 ... Arg #n), the return address, the previous frame pointer (FP), and the local variables (Local var #1, Local var #2, Buf[0] ... Buf[n-1]). In the second panel the frame contains attack code.]
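The return-address overwrite described on this slide can be illustrated with a toy model. This is only a sketch: the frame layout (an 8-byte buffer followed by the saved frame pointer and the return address), the word size, and all addresses are assumptions chosen for illustration, and the helper names are hypothetical.

```python
# Toy stack frame, lowest address first: an 8-byte buffer, then the saved
# frame pointer, then the return address. Sizes and values are assumptions.
BUF_SIZE = 8
WORD = 4

def make_frame(return_address: int) -> bytearray:
    frame = bytearray(BUF_SIZE + 2 * WORD)
    frame[BUF_SIZE:BUF_SIZE + WORD] = (0x7FFF0010).to_bytes(WORD, "little")
    frame[BUF_SIZE + WORD:] = return_address.to_bytes(WORD, "little")
    return frame

def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy input into the buffer without checking BUF_SIZE (like strcpy)."""
    for offset, byte in enumerate(data):
        frame[offset] = byte

def return_address(frame: bytearray) -> int:
    return int.from_bytes(frame[BUF_SIZE + WORD:], "little")

frame = make_frame(0x00401000)
# Fill the buffer and the saved FP, then supply a new return address.
payload = b"A" * (BUF_SIZE + WORD) + (0xDEADBEEF).to_bytes(WORD, "little")
unchecked_copy(frame, payload)
print(hex(return_address(frame)))  # prints 0xdeadbeef
```

An input that fits in the buffer leaves the saved return address untouched; only the oversized input reaches it.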
Physical Attacks
  Direct physical tampering
  Attacker has access to physical hardware
  Attacker can modify and override bus transactions
  Useful for reverse engineering
  Examples
    Spoofing: substitute with malicious block
    Splicing: substitute with different valid block
    Replay: substitute with stale block
[Figure: CPU and main memory connected by a bus. The panels show bus reads (BusRd) in which the requested block is replaced by a malicious block MJ (spoofing), by a different valid block IBI (splicing), or by a stale block DBJ* (replay).]
Side-channel Attacks
  Learn secrets by indirect analysis
  Ability to run programs with lower permissions or direct physical access
  Two phases: collect information about system, then deduce secrets from that information
  Examples
    Timing Analysis: different operations take different amounts of time
    Differential Power Analysis: processor consumes different amounts of power for different instructions
    Fault Exploitation: compare results produced with and without a hardware fault
    Architectural Exploitation: take advantage of known architectural features
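As an illustration of the two phases in the timing-analysis case, the sketch below uses a comparison routine that stops at the first mismatching byte. The number of bytes examined stands in for elapsed time; the routine, the oracle interface, and the secret are hypothetical and not taken from the slides.

```python
def early_exit_compare(secret: bytes, guess: bytes):
    """Return (equal, steps); steps counts bytes examined and stands in for time."""
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return len(secret) == len(guess), steps

def recover_secret(oracle, length: int) -> bytes:
    """Recover the secret byte by byte from the comparison result and step counts."""
    known = b""
    for position in range(length):
        for candidate in range(256):
            guess = known + bytes([candidate]) + bytes(length - position - 1)
            equal, steps = oracle(guess)
            # A correct byte lets the comparison run past this position.
            if equal or steps > position + 1:
                known += bytes([candidate])
                break
    return known

secret = b"key42"
recovered = recover_secret(lambda g: early_exit_compare(secret, g), len(secret))
print(recovered == secret)
```

The attacker needs at most 256 guesses per byte instead of searching the whole key space, which is why the comparison time must not depend on the secret.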
Research in Academia
  Individual Attack Solutions
    Secure stack, pointer encryption, etc.
  Execute Only Memory (XOM)
    Seminal work, several extensions
  Sign & Verify
    Embedded signatures in code, verify at runtime
  AEGIS Secure Processor
    Implemented on FPGA
    Uses physical unclonable functions
Industrial Solutions
  Flag portions of memory as not usable for instructions
    Intel Execute Disable Bit
    AMD No Execute Bit
  Augment existing processor designs
    IBM SecureBlue
    ARM TrustZone
  Maxim DS5250 Secure Microprocessor
    Co-processor for handling sensitive operations
Architectures for Runtime Verification
  Goal: develop architectural extensions that are
    Universal
    Cost-effective
    Power efficient
    Performance effective
    Applicable to legacy software
Architectures for Runtime Verification
  3-step sign-and-verify mechanism
    Secure installation
      Secret keys and instruction block signatures are generated and stored together with the program binary
    Secure loading
      Extract secret program keys
    Secure execution
      Signatures are calculated from fetched instructions and compared to stored signatures
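The three steps can be sketched in software as follows. This is a minimal model under stated assumptions, not the hardware design: HMAC-SHA-256 truncated to 128 bits stands in for the AES-based signature, the 32-byte block size is assumed, key protection in the program header is omitted, and all function names are hypothetical.

```python
import hashlib
import hmac
import os

BLOCK_BYTES = 32  # assumed instruction block (cache line) size

def sign_block(key: bytes, address: int, block: bytes) -> bytes:
    """Signature over the block and its address (HMAC stands in for the MAC)."""
    message = address.to_bytes(4, "big") + block
    return hmac.new(key, message, hashlib.sha256).digest()[:16]

def secure_install(code: bytes) -> dict:
    """Step 1: generate a program key and sign every instruction block."""
    key = os.urandom(16)
    blocks = [code[i:i + BLOCK_BYTES] for i in range(0, len(code), BLOCK_BYTES)]
    signatures = [sign_block(key, i * BLOCK_BYTES, b) for i, b in enumerate(blocks)]
    return {"header": {"key": key}, "blocks": blocks, "signatures": signatures}

def secure_load(program: dict) -> bytes:
    """Step 2: extract the program key (key encryption is omitted here)."""
    return program["header"]["key"]

def secure_fetch(program: dict, key: bytes, index: int) -> bytes:
    """Step 3: recompute the signature of a fetched block and compare."""
    block = program["blocks"][index]
    computed = sign_block(key, index * BLOCK_BYTES, block)
    if not hmac.compare_digest(computed, program["signatures"][index]):
        raise RuntimeError(f"signature mismatch in block {index}")
    return block

program = secure_install(bytes(range(64)))
key = secure_load(program)
print(secure_fetch(program, key, 0) == bytes(range(32)))
```

Because the block address is part of the signed message, moving a valid block to a different address (splicing) fails verification just as a modified block (spoofing) does.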
Mechanism for Software Integrity and Confidentiality
[Figure: flow from original code to signed code to trusted code. Secure installation: generate keys (Key1, Key2, Key3), sign each I-block, encrypt each I-block as E_Key3(I-Block), and encrypt the keys as E_CPU.Key(Key1), E_CPU.Key(Key2), E_CPU.Key(Key3) in the program header. Program loading: decrypt the keys in secure mode. Secure execution: instruction fetch and signature fetch, decrypt the I-block, re-sign the I-block, and compare (=?) the result with the fetched signature (signature match).]
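Basic Implementation: Wait 'till Verify (WtV), CBC-MAC
$S = \mathrm{AES}_{KEY2}\big(I_{4:7} \oplus \mathrm{AES}_{KEY2}\big(I_{0:3} \oplus \mathrm{AES}_{KEY1}(SP(A(SB_0)))\big)\big)$
[Figure: timing diagram, time in clock cycles (0 to 38). The memory pipeline fetches I0:1, I2:3, I4:5, I6:7, S0:1, S2:3. The crypto pipeline starts with AES_KEY1(SP(A(SB0))) and produces cS, which is compared (=?) with S. The verification latency is marked.]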
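The CBC-MAC chain on this slide can be sketched as below. Several things are assumptions: Python's standard library has no AES, so a keyed 128-bit BLAKE2b digest stands in for AES_KEY; SP is modeled as zero-padding a 32-bit address to 128 bits, which the slides do not define; and a block is taken to be eight 32-bit instructions (I0 to I7). All function names are hypothetical.

```python
import hashlib

def prf(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES_key(block): a keyed 128-bit BLAKE2b digest (not AES)."""
    return hashlib.blake2b(block, key=key, digest_size=16).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def sp(address: int) -> bytes:
    """Hypothetical SP(A(SB)): zero-pad a 32-bit address to 128 bits."""
    return address.to_bytes(4, "big").rjust(16, b"\x00")

def cbc_mac_signature(key1: bytes, key2: bytes, address: int, iblock: bytes) -> bytes:
    """S = AES_KEY2(I4:7 xor AES_KEY2(I0:3 xor AES_KEY1(SP(A(SB0)))))."""
    if len(iblock) != 32:
        raise ValueError("expected eight 32-bit instructions")
    chain = prf(key1, sp(address))              # can start before the fetch completes
    chain = prf(key2, xor(iblock[:16], chain))  # needs I0:3
    return prf(key2, xor(iblock[16:], chain))   # needs I4:7 and the previous stage

key1, key2 = b"1" * 16, b"2" * 16
signature = cbc_mac_signature(key1, key2, 0x1000, bytes(range(32)))
print(len(signature))  # prints 16
```

Each stage consumes the output of the previous one, so the cipher invocations cannot overlap; this serial dependency is what the verification latency in the timing diagram reflects.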
Architectural Enhancements
  Reducing performance overhead
    Parallelizable Message Authentication Code (PMAC) [Black, Rogaway 2002]
    Speculative instruction execution: Run before Verification (RbV)
  Reducing memory overhead
    Protect multiple cache blocks with a signature
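Parallel MAC: SIOM
$S(SB_i) = \mathrm{AES}_{KEY2}\big(I_{4i:4i+3} \oplus \mathrm{AES}_{KEY1}(SP(A(SB_i)))\big), \quad i = 0, 1$
$S = S(SB_0) \oplus S(SB_1)$
[Figure: instruction block I0 ... I7 split into sub-blocks SB0 (I0 to I3) and SB1 (I4 to I7). For each sub-block, SP[A(SBi)] is encrypted with AES under KEY1, XORed with the instructions, and encrypted with AES under KEY2 to give S(SBi). S(SB0) and S(SB1) are XORed and compared (=?) with the stored signature S0 ... S3. A(SB0) is a 32-bit address (bits 31 to 0).]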
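A sketch of the parallel signature computation follows, under the same kind of assumptions as any software model of this slide: a keyed BLAKE2b digest stands in for AES, SP is modeled as zero-padding a 32-bit address, and SB1 is assumed to sit 16 bytes after SB0. The function names are hypothetical.

```python
import hashlib

def prf(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES_key(block): a keyed 128-bit BLAKE2b digest (not AES)."""
    return hashlib.blake2b(block, key=key, digest_size=16).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def sp(address: int) -> bytes:
    """Hypothetical SP(A(SB)): zero-pad a 32-bit address to 128 bits."""
    return address.to_bytes(4, "big").rjust(16, b"\x00")

def sub_block_signature(key1: bytes, key2: bytes, address: int, sub_block: bytes) -> bytes:
    """S(SB_i) = AES_KEY2(I_{4i:4i+3} xor AES_KEY1(SP(A(SB_i))))."""
    return prf(key2, xor(sub_block, prf(key1, sp(address))))

def pmac_signature(key1: bytes, key2: bytes, base_address: int, iblock: bytes) -> bytes:
    """S = S(SB_0) xor S(SB_1); the two terms do not depend on each other."""
    s0 = sub_block_signature(key1, key2, base_address, iblock[:16])
    s1 = sub_block_signature(key1, key2, base_address + 16, iblock[16:])
    return xor(s0, s1)

key1, key2 = b"1" * 16, b"2" * 16
signature = pmac_signature(key1, key2, 0x1000, bytes(range(32)))
print(len(signature))  # prints 16
```

Unlike the chained construction, neither sub-block signature waits for the other, so hardware can compute them in a pipelined or parallel fashion. Binding each sub-block to its own address keeps the XOR combination sensitive to sub-block order.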
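Parallel MAC: SICM
$I_{4i:4i+3} = C_{4i:4i+3} \oplus \mathrm{AES}_{KEY3}(SP(A(SB_i))), \quad i = 0, 1$
$S = eS \oplus \mathrm{AES}_{KEY3}(SP(A(S)))$
[Figure: encrypted instruction block C0 ... C7 and encrypted signature eS0 ... eS3. AES under KEY3 of SP[A(SB0)], SP[A(SB1)], and the padded signature address produces the pads that are XORed with the ciphertext to recover the instructions and the signature. The sub-block signatures S(SB0) and S(SB1) are then computed with KEY1 and KEY2 as in SIOM, XORed, and compared (=?) with the decrypted signature.]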
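The decryption equations on this slide amount to XORing each ciphertext sub-block with an address-derived pad. The sketch below models that step only. As assumptions, a keyed BLAKE2b digest stands in for AES_KEY3, SP is modeled as zero-padding a 32-bit address, and SB1 is placed 16 bytes after SB0; the function names are hypothetical.

```python
import hashlib

def prf(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES_key(block): a keyed 128-bit BLAKE2b digest (not AES)."""
    return hashlib.blake2b(block, key=key, digest_size=16).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def sp(address: int) -> bytes:
    """Hypothetical SP(A(.)): zero-pad a 32-bit address to 128 bits."""
    return address.to_bytes(4, "big").rjust(16, b"\x00")

def apply_pad(key3: bytes, address: int, data: bytes) -> bytes:
    """XOR 16 bytes with AES_KEY3(SP(address)); the same call encrypts and decrypts."""
    return xor(data, prf(key3, sp(address)))

def crypt_block(key3: bytes, base_address: int, block: bytes) -> bytes:
    """Apply the pad to both sub-blocks: I_{4i:4i+3} = C_{4i:4i+3} xor pad_i."""
    first = apply_pad(key3, base_address, block[:16])
    second = apply_pad(key3, base_address + 16, block[16:])
    return first + second

key3 = b"3" * 16
plain = bytes(range(32))
cipher = crypt_block(key3, 0x1000, plain)
print(crypt_block(key3, 0x1000, cipher) == plain)  # prints True
```

The pad depends only on the key and the address, not on the fetched data, so it can be computed while the memory access is still in flight. The signature is handled the same way: `apply_pad(key3, signature_address, eS)` corresponds to the second equation.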
Verification Latency
[Figure: two timing diagrams of the memory pipeline and the crypto pipeline, time in clock cycles (0 to 30), one for integrity only and one for integrity and confidentiality. In both, the memory pipeline fetches I0:1, I2:3, I4:5, I6:7, S0:1, S2:3, and cS is compared (=?) with S; the verification latency is marked. For integrity only, the crypto pipeline computes AES_KEY1(SP(A(SB0))) and AES_KEY1(SP(A(SB1))). For integrity and confidentiality, it additionally computes AES_KEY3(SP(A(SB0))), AES_KEY3(SP(A(SB1))), and AES_KEY3(SP(A(S))).]
Run Before Verification
  Speculative execution: continue executing once I-block is fetched, in parallel with verification
  Do not commit instructions before verification
    Instruction Verification Buffer for in-order processors
    Modify reorder buffer in out-of-order processors
[Figure: Instruction Verification Buffer with entries 0 to n-1; each entry holds IType, Destination, Value, Ready Flag, and Verified Flag.]
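The commit rule can be modeled with a small FIFO whose entries carry the fields shown in the figure. This is a behavioral sketch only; the class and method names are hypothetical, and the stall-when-full policy is an assumption rather than something the slide specifies.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class IVBEntry:
    itype: str                   # instruction type
    destination: str             # destination register
    value: Optional[int] = None  # result, once computed
    ready: bool = False          # result is available
    verified: bool = False       # the instruction's block passed verification

class InstructionVerificationBuffer:
    def __init__(self, depth: int):
        self.depth = depth
        self.entries = deque()

    def issue(self, itype: str, destination: str) -> Optional[IVBEntry]:
        """Allocate an entry; return None when the buffer is full (assumed stall)."""
        if len(self.entries) == self.depth:
            return None
        entry = IVBEntry(itype, destination)
        self.entries.append(entry)
        return entry

    def commit(self, registers: dict) -> int:
        """Commit in order from the head, only entries that are ready and verified."""
        committed = 0
        while self.entries and self.entries[0].ready and self.entries[0].verified:
            entry = self.entries.popleft()
            registers[entry.destination] = entry.value
            committed += 1
        return committed

ivb = InstructionVerificationBuffer(depth=4)
registers = {}
entry = ivb.issue("alu", "r1")
entry.value, entry.ready = 7, True   # executed speculatively
print(ivb.commit(registers))         # prints 0: block not yet verified
entry.verified = True                # signature check passed
print(ivb.commit(registers))         # prints 1
```

Execution proceeds without waiting for the signature check, but architectural state changes only after the check succeeds, so a failed check can discard the buffered results.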
Reducing Memory Overhead
  Protect two I-blocks with one signature
    Signature produced by XORing signatures of all sub-blocks
    Need both blocks to calculate signature, other block may or may not be in cache
[Figure: memory layout with Block A (Sub-block 0, Sub-block 1), Block B (Sub-block 2, Sub-block 3), and one Signature.]

Miss on    Condition               Action
Block A    Block B in cache        Fetch block A and Signature
Block A    Block B not in cache    Fetch blocks A and B (stored in IOB) and Signature
Block B    Block A in cache        Fetch block B and Signature
Block B    Block A not in cache    Fetch A, B, and Signature

[Figure: Instruction Opportunity Buffer with entries 0 to m-1; each entry holds Tag, I-block, and Valid Flag.]
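The miss-handling table can be written as a small decision function, together with the memory overhead arithmetic. Two points are assumptions: that block A goes to the IOB when block B misses and A is not cached (the table states this only for B), and the sizes used for the overhead (32-byte blocks of eight 32-bit instructions, 16-byte signatures). The function names are hypothetical.

```python
def fetch_plan(missed: str, other_in_cache: bool) -> dict:
    """Return what to fetch from memory and which block goes to the IOB."""
    if missed not in ("A", "B"):
        raise ValueError("missed must be 'A' or 'B'")
    other = "B" if missed == "A" else "A"
    if other_in_cache:
        # The cached copy of the other block supplies its sub-block signatures.
        return {"fetch": [missed, "Signature"], "to_iob": []}
    # Both blocks are needed to recompute the shared signature.
    return {"fetch": ["A", "B", "Signature"], "to_iob": [other]}

def signature_overhead(blocks_per_signature: int,
                       block_bytes: int = 32,
                       signature_bytes: int = 16) -> float:
    """Signature bytes stored per byte of protected code (sizes are assumed)."""
    return signature_bytes / (block_bytes * blocks_per_signature)

print(fetch_plan("A", other_in_cache=False))
print(signature_overhead(1), signature_overhead(2))  # prints 0.5 0.25
```

With these sizes, sharing one signature between two blocks halves the signature storage, at the cost of fetching the second block when it is not already cached.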
Experimental Environment
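[Figure: evaluation flow. Benchmark source code is compiled by an ARM cross compiler into an executable. The executable, the benchmark inputs, and the architecture parameters feed the baseline simulator, which produces the baseline results. The secure installation emulator converts the executable into a secure executable, which runs with the benchmark inputs and the architecture parameters on the extended simulator to produce the results.]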
Performance metric: sim_cycle
Energy metric: uarch.pdissipation
Normalized to the baseline architecture
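Normalization here means dividing a metric measured on the extended simulator by the same metric from the baseline simulator. A minimal sketch, with hypothetical function names and made-up cycle counts used purely as placeholders:

```python
def normalized(secure_value: float, baseline_value: float) -> float:
    """Ratio of a metric (e.g. sim_cycle) on the secure vs. baseline architecture."""
    if baseline_value <= 0:
        raise ValueError("baseline value must be positive")
    return secure_value / baseline_value

def overhead_percent(normalized_value: float) -> float:
    """Express a normalized value as percent overhead over the baseline."""
    return (normalized_value - 1.0) * 100.0

# Placeholder cycle counts, not measured results.
ratio = normalized(secure_value=150.0, baseline_value=100.0)
print(ratio, overhead_percent(ratio))  # prints 1.5 50.0
```

A normalized value of 1.00 therefore means no measurable overhead relative to the baseline.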
Instruction Block Signature Verification Unit
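[Figure: processor block diagram with L1 I-cache, L1 D-cache, MMU, datapath, FPUs, IF, control, and the IBSVU, connected to the data bus. The IBSVU detail shows the I-cache lines with their signatures (sig), the crypto pipeline (57K gates), the program keys, CSig and FSig feeding an XOR/=? comparator that produces the match signal, the IVB (FIFO), and the IOB (FIFO).]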
Performance Overhead
Normalized execution time by cache size

Configuration    1K      2K      4K      8K
CBC-MAC WtV      1.91    1.73    1.43    1.08
PMAC WtV         1.63    1.50    1.30    1.05
PMAC RbV         1.04    1.03    1.02    1.00

Performance Overhead vs I-Cache Miss Rate
[Figure: scatter plot of normalized execution time (0.5 to 2.5) against I-cache misses per 1,000 instructions (0 to 120) for CBC WtV, PMAC WtV, and PMAC RbV.]
IVB Buffer Depth
[Figure: two charts, rijndael_enc and ispell, plotting normalized execution time against L1 cache size (1 KB, 2 KB, 4 KB, 8 KB) for IVB depths of 2, 4, 8, 16, and 32 entries.]
Energy Overhead
Normalized energy by cache size

Configuration    1K      2K      4K      8K
CBC-MAC WtV      1.91    1.73    1.43    1.08
PMAC WtV         1.63    1.50    1.30    1.05
PMAC RbV         1.04    1.03    1.02    1.00
Conclusions
  Contributions
    Extension of the sign-and-verify mechanism to ensure both software integrity and confidentiality
    Architectural enhancements for low performance and power overheads
      Double key parallelizable MAC
      Instruction Verification Buffer
    Reducing memory overhead
      Protect multiple blocks with a single signature
  Future work
    Ensuring data integrity and confidentiality
    Resilience to side-channel attacks