trusted cloud storage tech talk
TRANSCRIPT
Victor Costan (龍望), Hsin-Jung Yang (楊昕蓉), Srini Devadas, Nickolai Zeldovich
Secure Cloud Storage and Computing Using
Reconfigurable Hardware
Why Security Matters
Cloud Computing: Dreams and Reality
• The Cloud: Ideal Picture
• The Cloud: Reality
Cloud Storage: Attack Vectors
Hypervisor Bugs
State Manipulation
Hardware Attacks
Replay Attacks are Harmful
Spot the Differences
Spot the Differences: Job
Spot the Differences: Name, Relationship Status
Why It Matters
• We rely on fresh data to make decisions
– Google searches
– Facebook profiles
– Twitter, LinkedIn
• Outdated data has big impact on users
– Wrong profile information: confusion, embarrassment
– Old search results: bad business decisions, embarrassment
– Old document versions: costly business decisions, regulatory issues
System Design
Design: Cloud Storage API
• Block Device
– Fixed block size (1MB)
– Write(block number, block)
– Read(block number) → block
• Easy to reason about the security
• File systems operate on top of this abstraction
B1 B2 B3 B4
Disk divided into 1MB blocks
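A minimal sketch of the block-device API above, in Python. The class name `BlockDevice`, the sparse in-memory storage, and the zero-filled default for unwritten blocks are assumptions made for illustration; the slides only specify the fixed 1MB block size and the Write / Read operations.

```python
BLOCK_SIZE = 1 << 20  # 1MB fixed block size, as on the slide


class BlockDevice:
    """Hypothetical in-memory stand-in for the untrusted disk."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self._blocks = {}  # sparse storage: block number -> bytes

    def _check(self, block_number: int) -> None:
        if not 0 <= block_number < self.num_blocks:
            raise IndexError("block number out of range")

    def write(self, block_number: int, block: bytes) -> None:
        """Write(block number, block): store exactly one full block."""
        self._check(block_number)
        if len(block) != BLOCK_SIZE:
            raise ValueError("block must be exactly BLOCK_SIZE bytes")
        self._blocks[block_number] = bytes(block)

    def read(self, block_number: int) -> bytes:
        """Read(block number) -> block; unwritten blocks read as zeros (assumption)."""
        self._check(block_number)
        return self._blocks.get(block_number, bytes(BLOCK_SIZE))
```

A file system would sit on top of this interface and only ever issue whole-block reads and writes, which is what makes the security argument easy to state.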
Design: System Architecture
CPU (Untrusted)
Disk (Untrusted)
RAM (Untrusted)
Network Card (Untrusted)
FPGA / ASIC (Trusted)
Secure NVRAM Chip
System Bus (Untrusted)
Client
Internet (Untrusted)
Design: Trusted Storage on Untrusted Disks
160-bit hash in trusted memory authenticates 1TB disk
• Disk divided into 1MB blocks: B1 B2 B3 B4
• Leaves hash their blocks: h1=h(B1), h2=h(B2), h3=h(B3), h4=h(B4)
• Nodes hash their children: h5=h(h1||h2), h6=h(h3||h4)
• Root hash: h7=h(h5||h6)
• Root hash matches iff all blocks match
• 20 levels
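The four-block tree above can be sketched in a few lines of Python. SHA-1 is used here only because it produces a 160-bit digest like the one on the slide; the slides do not name the hash function, so that choice, the function names, and the power-of-two restriction are assumptions.

```python
import hashlib


def h(data: bytes) -> bytes:
    """160-bit hash (SHA-1 is an assumption; the slide only says 160-bit)."""
    return hashlib.sha1(data).digest()


def root_hash(blocks: list) -> bytes:
    """Hash the leaves, then hash pairs of children until one root remains."""
    if not blocks or len(blocks) & (len(blocks) - 1):
        raise ValueError("this sketch needs a power-of-two number of blocks")
    level = [h(b) for b in blocks]          # h1..h4 = h(B1)..h(B4)
    while len(level) > 1:                   # h5 = h(h1||h2), h6 = h(h3||h4), ...
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]                         # h7, the root kept in trusted memory


# Toy blocks stand in for the 1MB disk blocks B1..B4.
blocks = [b"B1", b"B2", b"B3", b"B4"]
root = root_hash(blocks)
```

Changing any single block changes the recomputed root, which is why a single 160-bit value in trusted memory is enough to detect tampering anywhere on the disk.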
Design: Hash Tree Caching
Node number   Hash                   Verified   Left child   Right child
1             fabe3c05d8ba995af93e   Y          Y            N
2             e6fc9bc13d624ace2394   Y          Y            Y
4             53a81fc2dcc53e4da819   Y          N            N
5             b2ce548dfa2f91d83ec6   Y          N            N
Tree nodes: 1; 2, 3; 4, 5, 6, 7
The FPGA caches hash tree nodes
The untrusted OS is free to choose the caching policy, for maximum performance
Design: Hash Tree Cache
• Server stores entire hash tree in RAM
• FPGA has a cache that stores a subset of nodes
• Server tells FPGA what nodes to store
Node Hash Verified
1 fabe… Y
2 e6fc… Y
4 53a8… Y
5 b2ce… Y
1
2 3
4 5 6 7
Cache management commands
Design: Hash Tree Cache - Load
Before loading node 5:
Node Hash Verified
1 fabe… Y
2 e6fc… Y
4 53a8… N
Cached tree nodes: 1; 2; 4
After loading node 5:
Node Hash Verified
1 fabe… Y
2 e6fc… Y
4 53a8… N
5 b2ce… N
Cached tree nodes: 1; 2; 4, 5
• Server tells the FPGA to load a node into a cache entry
• The cache entry is unverified right after a load
Design: Hash Tree Cache - Verify
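Before verifying the children of node 2:
Node Hash Verified
1 fabe… Y
2 e6fc… Y
4 53a8… N
5 b2ce… N
After verifying the children of node 2:
Node Hash Verified
1 fabe… Y
2 e6fc… Y
4 53a8… Y
5 b2ce… Y
Cached tree nodes: 1; 2; 4, 5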
• Server tells the FPGA to use a node to verify its children
• FPGA checks that parent’s hash matches children hashes
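The Load and Verify commands can be sketched as a small Python class. This is an illustration under stated assumptions, not the FPGA's actual interface: the class and method names are hypothetical, SHA-1 stands in for the unnamed 160-bit hash, and node n is assumed to have children 2n and 2n+1, which matches the numbering in the diagrams.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()  # 160-bit hash (assumed SHA-1)


class HashTreeCache:
    """Hypothetical model of the FPGA's hash tree cache."""

    def __init__(self, root_hash: bytes) -> None:
        # node number -> (hash, verified); the root is trusted from the start
        self.entries = {1: (root_hash, True)}

    def load(self, node: int, node_hash: bytes) -> None:
        """Load: store a hash supplied by the untrusted server, unverified."""
        if node in self.entries:
            raise ValueError("a node may occupy at most one cache entry")
        self.entries[node] = (node_hash, False)

    def verify(self, parent: int) -> None:
        """Verify: use a verified parent to check both of its children."""
        if not self.entries.get(parent, (b"", False))[1]:
            raise ValueError("parent must be cached and verified")
        left, right = 2 * parent, 2 * parent + 1
        if left not in self.entries or right not in self.entries:
            raise ValueError("both children must be loaded first")
        left_hash, right_hash = self.entries[left][0], self.entries[right][0]
        if h(left_hash + right_hash) != self.entries[parent][0]:
            raise ValueError("children do not match parent hash")
        self.entries[left] = (left_hash, True)
        self.entries[right] = (right_hash, True)
```

The server decides which nodes to load and in what order; the cache only refuses operations that would let an unverified or mismatching hash become trusted.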
Design: Hash Tree Cache - Efficiency
• Checking leaf 33 requires 10 node loads for a cold cache on this toy example (38 loads on the real FPGA tree)
• Remember the root is always loaded in the cache
Tree nodes shown: 1; 2, 3; 4, 5; 8, 9; 16, 17; 32, 33
Design: Hash Tree Cache - Efficiency
• Checking leaf 38 requires only 4 node loads, because node 9 is already in the cache and verified
• Server can predict client requests and manage cache for high performance
Tree nodes shown: 1; 2, 3; 4, 5; 8, 9; 16, 17; 18, 19; 32, 33; 38, 39
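The load counts on these two slides can be reproduced with a short Python sketch. It assumes that node n has children 2n and 2n+1, that every node in the `cached` set is already verified, and that verifying a node always requires loading its sibling too; the function name is hypothetical.

```python
def nodes_to_load(leaf: int, cached: set) -> list:
    """Return the nodes that must be loaded to verify `leaf`, and cache them.

    Walks from the leaf toward the root (node 1) and stops at the first
    ancestor that is already cached; each step needs the node and its sibling.
    """
    needed = []
    node = leaf
    while node > 1 and node not in cached:
        left = node - (node % 2)  # even-numbered child of the shared parent
        needed += [n for n in (left, left + 1) if n not in cached]
        node //= 2                # move up to the parent
    cached.update(needed)
    return sorted(needed)


cache = {1}                              # the root is always in the cache
cold_loads = nodes_to_load(33, cache)    # cold cache: leaf 33
warm_loads = nodes_to_load(38, cache)    # node 9 is now cached: leaf 38
```

Here `cold_loads` has 10 entries (2, 3, 4, 5, 8, 9, 16, 17, 32, 33) and `warm_loads` has 4 (18, 19, 38, 39), matching the slides. Under the same assumptions, a leaf 19 levels below the root needs 38 loads on a cold cache.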
Results
Results: System Architecture
CPU (Untrusted)
Disk (Untrusted)
RAM (Untrusted)
Network Card (Untrusted)
FPGA / ASIC (Trusted)
Secure NVRAM Chip
System Bus (Untrusted)
Client
Internet (Untrusted)
Results: Server Prototype
Results: Normal Operation
Results: FPGA Board, Normal Operation
Results: Attack Does Not Impact User
Results: FPGA Board, Under Attack
Results: Performance Block Diagram
• HMAC (Sign) Result
– Limit: Hash Engine Speed
• Update Hash Tree (Writes Only)
– Limits: Hash Engine Speed, Dependencies
• Load & Verify Hash Tree Nodes
– Limits: Hash Engine Speed, Dependencies
• Hash 1MB Data Block
– Limits: Hash Engine Speed, FPGA Data Bus
• Read / Write 1MB Data Block to Disk
– Limit: Disk I/O Speed
Results: Prototype Performance (est.)
1 MB = 1 block
Disk I/O Throughput
7,200 RPM HDD 70 MB/s
10,000 RPM HDD 100 MB/s
15,000 RPM HDD 130 MB/s
SSD 250 MB/s
Results: Prototype Performance (est.)
Operation Throughput
Block Hash 800 MB/s
Pipelined Block Hash 3,200 MB/s
Transport Throughput
PCI Express x16 4,096 MB/s
SATA II 384 MB/s
PCI Express x1 250 MB/s
Ethernet 125 MB/s
1 MB = 1 block
Results: Prototype Performance (est.)
Operation Throughput
Tree Node Hash 1.25 M/s
Pipelined Tree Node Hash 5.0 M/s
Tree Operations 62.5 k/s
Optimized Tree Operations 2.5 M/s
Results: Prototype Performance (est.)
Operation Throughput
Node HMAC 1.25 M/s
Results: Performance Block Diagram
• Steps are performed in parallel (pipelined), because they are in different system components
• However, the slowest step is the bottleneck for the entire system
• Each step can be made faster by adding more hardware (e.g. more disks), assuming cache policies can scale up
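The bottleneck argument can be made concrete with the estimated figures from the prototype performance slides. In this Python sketch the pairing of each table row with a pipeline step, and the simplification that one tree operation or one HMAC serves one 1MB block, are assumptions made for illustration; only the numbers themselves come from the slides.

```python
# Estimated throughput per pipeline step, in 1MB blocks per second.
# Pairing each figure with a step is an assumption, not stated on the slides.
stages = {
    "Read / Write 1MB Data Block to Disk": 100,    # 10,000 RPM HDD, 100 MB/s
    "Hash 1MB Data Block": 800,                    # Block Hash, 800 MB/s
    "Load & Verify Hash Tree Nodes": 62_500,       # Tree Operations, 62.5 k/s
    "HMAC (Sign) Result": 1_250_000,               # Node HMAC, 1.25 M/s
}


def bottleneck(stage_rates: dict) -> tuple:
    """A pipeline runs at the rate of its slowest step."""
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


slowest_step, system_rate = bottleneck(stages)
```

With these figures the disk is the slowest step at 100 blocks per second, so adding disks raises system throughput until the next-slowest step takes over.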
Results: Ping-Pong Workload
[Plot: Block (0 to 10) vs. Time (0 to 20)]
• Typical collaboration scenario
• Real-Life
– Google Docs
– Facebook Messages
– Dropbox
• Straight-up LRU shines here
Results: Photo Gallery Workload
[Plot: Block (0 to 10) vs. Time (0 to 20)]
• Modeled after data on photo applications
• Real-Life
– Facebook’s #1 Feature
– Google Picasa
– Flixter
• Special policy inspired by Facebook Haystack classifies photos, loads cache predictively
Results: Map-Reduce Workload
[Plot: Block (0 to 30) vs. Time (0 to 10)]
• Index-generating Map-Reduce
• Real-Life
– Google PageRank
– Facebook friend graph (EdgeRank)
• Special policy that takes advantage of Map-Reduce access pattern
Results: Cache Hit Rates
[Chart: cache hit rate (0.5 to 1) for the Spec LRU, Haystack, and MR-Aware policies]
• Applications: 2 users collaborating on a file (ping-pong), photo gallery browsing, Map-Reduce job
• Cache policies: Speculative Least-Recently Used (LRU), a policy inspired by Facebook Haystack and optimized for photo caching, and a policy optimized for Map-Reduce access patterns
• Conclusion: no policy works well on all applications, so app server must drive policy
Results: Protocol Overhead
• Client – Server Bandwidth overhead: 0.002%
– Operation: 1 HMAC (20 bytes) per 1MB = 0.002%
– Handshake: extra secret exchange piggybacks on SSL: 5%
• Latency overhead (1 client): 4%
– Without security: 8.2ms / request
– With security: 8.5ms / request
– Latency overhead = the latency of a very fast Internet hop
• No throughput overhead (N-clients)
– With or without security: 100MB/s
– Need 40 HDDs to saturate PCI-E x16, 52 HDDs to saturate FPGA
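The percentages on this slide follow from simple arithmetic, worked here in Python. All inputs are taken from the slides; the 100 MB/s per-disk figure used for the PCI Express estimate is the 10,000 RPM HDD row from the disk throughput table, which is an assumption about which disk the slide has in mind.

```python
HMAC_BYTES = 20            # one HMAC per operation
BLOCK_BYTES = 1 << 20      # 1MB block

# Bandwidth overhead: 20 extra bytes per 1MB transferred.
bandwidth_overhead_pct = 100 * HMAC_BYTES / BLOCK_BYTES

# Latency overhead: 8.2 ms per request without security, 8.5 ms with.
latency_overhead_pct = 100 * (8.5 - 8.2) / 8.2

# Disks needed to fill a PCI Express x16 link (4,096 MB/s) at 100 MB/s each.
hdds_for_pcie_x16 = 4096 / 100

print(f"bandwidth overhead: {bandwidth_overhead_pct:.3f}%")
print(f"latency overhead:   {latency_overhead_pct:.1f}%")
print(f"HDDs for PCI-E x16: {hdds_for_pcie_x16:.2f}")
```

The results round to the 0.002%, 4%, and roughly 40 disks quoted above.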
MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY
Results: Protocol Overhead
• Protocol is simple enough to implement on browser side
– Chrome
– Firefox
– Internet Explorer 10
• Easy integration in existing Web applications
• End-to-end security
Questions?
Thank You!
Other Applications
• FPGA can be used to load user-specified circuits and perform arbitrary computation with security guarantees
• Applications: encrypted image search, financial calculations
• Potential applications in highly regulated industries, e.g. medical record keeping and processing, secure financial services
Secure Computation: Overview
Task
Untrusted computation:
VM image
Trusted computation:
Circuit spec
Cloud Machine
VM image CPU cores
Circuit spec FPGA
LUTs
• Most code is untrusted, executes in a VM
• Trusted code is broken up into kernels which become circuits deployed onto an FPGA
• If efficiency is not an issue, deploy a processor on the FPGA, execute software securely
6/9/2011
Secure Computation: Challenge
• Multi-tenancy is the key to the cloud’s cost effectiveness
• FPGA can host different applications running in parallel
• Challenge: isolation between applications, just like a hypervisor
FPGA controller
Client 1 Application
Client 2 Application
Client 3 Application
VM Hypervisor
Client 1 VM
Client 2 VM
Client 3 VM
PCI Express
Design: FPGA Boot Sequence
Messages exchanged:
– PKcard + Manufacturer Certificate
– random nonce
– PUFsyndrome + SignPKcard(PUFsyndrome)
– Root Hash + SignPKcard(nonce || Root Hash)
– EncSKfpga(SKcard) + MACSKfpga(nonce || SKcard)
Checks performed:
– Check certificate against e-fuses
– Compute SKfpga from PUFsyndrome
– Verify signature
– Verify MAC
– Check PKcard against certificate
Design: Client Trust Model
• Each FPGA – NVRAM pair has an Endorsement Key (EK)
• Manufacturer certifies the public EK
• Client uses the public EK to encrypt an HMAC key, which becomes its shared secret with the trusted hardware
Key flow:
– Manufacturer signs PubEK, producing the Endorsement Certificate
– Client verifies the Endorsement Certificate
– Client generates an HMAC key
– Client encrypts the HMAC key with PubEK
– Trusted hardware decrypts the encrypted HMAC key with PrivEK
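Once the HMAC key is shared, the client can use it to check that a result really came from the trusted hardware. The Python sketch below covers only that step, using the standard library. The message layout (nonce, block number, block hash), the function names, and the use of HMAC-SHA1 (chosen because its 20-byte output matches the HMAC size quoted in the results) are assumptions; the public-key encryption of the HMAC key with PubEK is not shown.

```python
import hashlib
import hmac
import secrets


def sign_result(key: bytes, nonce: bytes, block_number: int, block: bytes) -> bytes:
    """Trusted-hardware side: HMAC over an assumed (nonce, number, hash) layout."""
    message = nonce + block_number.to_bytes(8, "big") + hashlib.sha1(block).digest()
    return hmac.new(key, message, hashlib.sha1).digest()


def verify_result(key: bytes, nonce: bytes, block_number: int,
                  block: bytes, tag: bytes) -> bool:
    """Client side: recompute the HMAC and compare in constant time."""
    expected = sign_result(key, nonce, block_number, block)
    return hmac.compare_digest(expected, tag)


hmac_key = secrets.token_bytes(20)   # generated by the client
nonce = secrets.token_bytes(16)      # fresh per request
block = b"contents of block 7"
tag = sign_result(hmac_key, nonce, 7, block)
ok = verify_result(hmac_key, nonce, 7, block, tag)
```

Because the nonce is fresh for every request, a tag recorded from an earlier response does not verify against a new request, which is what blocks the replay attacks described earlier in the talk.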
Design: Hash Tree Security
1. It is computationally infeasible to come up with a block B1’ such that B1 ≠ B1’ but h(B1) = h(B1’)
2. It is computationally infeasible to come up with a node hash h1’ such that h1 ≠ h1’ but h(h1||h2) = h(h1’||h2)
Therefore, the root hash authenticates the entire contents of the tree.
Design: FPGA Boot Sequence Security
• Server OS transfers messages between FPGA and Trusted Memory over an untrusted channel
• FPGA authenticates Trusted Memory using Manufacturer Certificate, whose public key is burned into FPGA’s e-fuses
• Trusted Memory authenticates FPGA using its Physically Unclonable Function (PUF)
• At manufacturing time, FPGA is paired with memory chip
• FPGA can be paired with new memory chip if necessary
Design: Hash Tree Cache Security
• Server OS responsible for loading and verifying tree nodes
• Parent node hash verifies children nodes
• Reading a block requires the block’s leaf to be verified
• Writing a block requires the path from the block’s leaf to the root to be loaded and verified
• A node can be loaded in at most one cache line, to prevent replay attacks using stale node hashes