SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market
Prateek Sharma, Stephen Lee, Tian Guo, David Irwin, Prashant Shenoy
EuroSys 2015
Motivation System Design Evaluation Related Work Conclusion
Infrastructure Clouds
Cloud computing is popular and offers many benefits:
- Ease of deployment, scalability, pay-as-you-go
- Infrastructure, Platform, Software as a Service
- Infrastructure clouds offer computing and storage resources
Cost vs. Availability Tradeoffs
Cloud servers have different cost vs. availability tradeoffs:
On-demand servers:
- Fixed price per unit time
- Non-revocable
Spot servers:
- Variable prices based on market conditions
- Revocable ⇒ lower availability
- Surplus capacity sold at lower price
[Figure: bar chart of price ($/hr) for m3.medium and m3.large servers, comparing on-demand and average spot prices]
Exploiting Spot Servers
Spot servers ideal for batch jobs
- Batch jobs are disruption tolerant
- Checkpoint-restart to tolerate spot revocations
Spot servers for interactive applications?
- Revocable ⇒ downtime
- Mitigate impact of revocation?
- Potentially lower costs
Running Interactive Applications on Spot Servers
[Figure: an application running on a spot server is migrated to an on-demand server]
Naive solution:
- Spot server revoked ⇒ migrate to on-demand
- Revocation ⇒ risk of losing application state
- Users cannot manage complexity & risk of migration
Problem Statement: Design a system to transparently use a mix of spot & on-demand servers to run interactive applications
Talk Outline
1. Motivation
2. System Design
3. Evaluation
4. Related Work
5. Conclusion
Derivative Cloud
- Cloud middleware derived from infrastructure clouds
- Buy servers from IaaS and resell to users
- Use servers of different types ⇒ different SLA from IaaS
- Example: spot & on-demand pool to lower costs and increase availability
[Figure: customers, the derivative cloud (a spot pool and an on-demand pool of VMs), and the native IaaS cloud, connected by arrows labeled Request Servers, Lease Servers, and Resell Servers]
SpotCheck
- SpotCheck: Derivative cloud using spot & on-demand servers
- Provide low-cost, non-revocable servers to run unmodified applications
- Key idea: Run on spot. When revoked, migrate to on-demand
- Requirement: Transparently migrate VMs without losing state
[Figure: SpotCheck leases servers from the native IaaS cloud into a spot pool and an on-demand pool of VMs, with a Migrate arrow between the pools]
VM Migration
Key idea: Run on spot. When revoked, migrate to on-demand
Naive approach: Virtual Machine Live Migration
- Migration may not complete
- Small termination warning (~2 minutes on EC2)
- Incomplete migration ⇒ loss of state
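To make the "migration may not complete" point concrete, here is a minimal sketch of a simplified pre-copy live migration model. It is not taken from the talk: the function name, the stop threshold, and the example memory sizes, bandwidth, and dirty rate are all illustrative assumptions. Each round resends the memory dirtied during the previous round, and the total time is compared against the ~2 minute warning.

```python
def precopy_migration_time(mem_mb, bandwidth_mb_s, dirty_mb_s,
                           stop_threshold_mb=50.0, max_rounds=30):
    """Estimate total pre-copy migration time in seconds (simplified model).

    All parameters are hypothetical inputs for illustration only.
    """
    to_send = mem_mb
    total = 0.0
    for _ in range(max_rounds):
        round_time = to_send / bandwidth_mb_s
        total += round_time
        # Memory dirtied while this round was in flight must be resent.
        to_send = dirty_mb_s * round_time
        if to_send <= stop_threshold_mb:
            break
    # Final stop-and-copy round for whatever is still dirty.
    total += to_send / bandwidth_mb_s
    return total


WARNING_S = 120  # ~2 minute termination warning mentioned on the slide

small_vm = precopy_migration_time(mem_mb=4096, bandwidth_mb_s=100, dirty_mb_s=10)
large_vm = precopy_migration_time(mem_mb=16384, bandwidth_mb_s=100, dirty_mb_s=10)
print(small_vm <= WARNING_S, large_vm <= WARNING_S)
```

Under these assumed numbers the smaller VM fits inside the warning window and the larger one does not, which is why migration time that depends on memory size is a problem.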
Bounded-time VM Migration
- Guaranteed bound on VM migration time
- Independent of memory size, application behaviour
[Figure: the VM on the spot server continuously checkpoints memory to a backup server; the VM on the on-demand server lazily restores memory from the backup server]
- Residual dirty pages sent in bounded time
- Lazy restore: quickly restore VM
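One way to picture the bound is a toy model in which the amount of un-checkpointed (dirty) memory is never allowed to exceed what the link to the backup server can carry within the time bound. The sketch below is only an illustration under that assumption; the class name, the synchronous flush, and the numbers are hypothetical and are not SpotCheck's implementation.

```python
class BoundedCheckpointer:
    """Toy model: keep dirty memory small enough to flush within `bound_s`."""

    def __init__(self, bound_s, bandwidth_bytes_per_s, page_size=4096):
        self.bandwidth = bandwidth_bytes_per_s
        self.page_size = page_size
        # Largest number of dirty pages that can be sent within the bound.
        self.max_dirty_pages = int(bound_s * bandwidth_bytes_per_s) // page_size
        self.dirty = set()    # pages written since the last checkpoint
        self.backup = set()   # pages already stored on the backup server

    def write(self, page_no):
        self.dirty.add(page_no)
        # Flush as soon as the residual would exceed the bound.
        # (A real system would checkpoint in the background instead.)
        if len(self.dirty) > self.max_dirty_pages:
            self.checkpoint()

    def checkpoint(self):
        self.backup |= self.dirty
        self.dirty.clear()

    def residual_flush_time(self):
        """Seconds needed to send the remaining dirty pages on revocation."""
        return len(self.dirty) * self.page_size / self.bandwidth


cp = BoundedCheckpointer(bound_s=1.0, bandwidth_bytes_per_s=4096 * 100)
for page in range(250):
    cp.write(page)
print(cp.residual_flush_time() <= 1.0)
```

The residual flush time depends only on the dirty-page limit and the bandwidth, not on the total memory size of the VM.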
Nested Virtualization
Problem:
- Bounded-time VM migration needs hypervisor modification
- IaaS clouds don't allow hypervisor modifications
Solution: Nested Virtualization
- User VMs run on a nested hypervisor (XenBlanket)
- Bounded-time migration, VM lazy restore implemented inside nested hypervisor
[Figure: layered diagram with the labels IaaS hypervisor, cloud server, nested hypervisor, and user VM]
Spreading Revocation Risk
[Figure: two spot pools and an on-demand pool, each shown hosting VM 1, VM 2, and VM 3]
Problem: Revocation storms
- Spot pool revoked ⇒ migrate all VMs to on-demand
Solution: Multiple independent pools
- Revocations from one pool don't affect other pools
Pool Management Policies
- Distribute VMs among multiple pools
- Example: availability zones, server sizes (small, large)
- Spot prices in different pools uncorrelated
- Pool distribution policies:
  1. Equally
  2. Weighted by pool cost
  3. Weighted by pool availability
- Centralized controller dynamically selects pool for new VMs.
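The three distribution policies can be sketched as a weighted random choice made by the controller. This is a hedged illustration, not the SpotCheck controller: the pool names, prices, and availability figures are invented, and "weighted by pool cost" is assumed here to mean that cheaper pools receive proportionally more VMs (inverse-price weights).

```python
import random


def pool_weights(pools, policy):
    """Return normalized selection weights per pool name."""
    if policy == "equal":
        raw = {name: 1.0 for name in pools}
    elif policy == "cost":
        # Assumption: cheaper pools get more weight (inverse price).
        raw = {name: 1.0 / p["price"] for name, p in pools.items()}
    elif policy == "availability":
        raw = {name: p["availability"] for name, p in pools.items()}
    else:
        raise ValueError(f"unknown policy: {policy}")
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def pick_pool(pools, policy, rng=random):
    """Select the pool for a new VM according to the chosen policy."""
    weights = pool_weights(pools, policy)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical pools; prices ($/hr) and availability are made up.
pools = {
    "zone-a/m3.medium": {"price": 0.01, "availability": 0.90},
    "zone-b/m3.medium": {"price": 0.02, "availability": 0.99},
}
print(pool_weights(pools, "cost"))
print(pick_pool(pools, "availability"))
```

Keeping the weight computation separate from the random draw makes it easy to swap in a different policy without touching the selection logic.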
Evaluation
SpotCheck prototype implemented on Amazon EC2
- Application performance
- Cost of SpotCheck VMs
- Availability
- Benchmarks: SpecJBB, TPC-W
- Experiments on EC2 m3.medium instances
- Spot prices from April to October 2014
Impact on Application Performance
Performance degradation due to continuous checkpointing:
  SpecJBB  0.0015%
  TPC-W    16.7%
[Figure: SpecJBB throughput (bops) vs. number of VMs per backup server (1 to 50)]
- Backup server can support up to 40 VMs
- Each VM shares 1/40th the cost of a backup server
Cost of Running VMs on SpotCheck
[Figure: average cost per hour ($) for the 1-Pool, 2-Pools, 4-Pools Equal Distributed, 4-Pools Cost, and 4-Pools Stability configurations, comparing live migration with bounded-time migration; annotated values: $0.014 and $0.0086]
- Cost with different pool management policies is ~$0.014/hr
- Cost saving of 80% compared to on-demand
Availability
[Figure: unavailability (%) for the 1-Pool, 2-Pools, 4-Pools Equal Distributed, 4-Pools Cost, and 4-Pools Stability configurations, comparing SpotCheck with full restore and SpotCheck with lazy restore; annotated values: 0.002%, 0.019%, 0.16%, 0.02%]
- Availability of 99.998% relative to on-demand servers
- Downtime (~20 seconds) during migrations
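As a rough worked example of how the unavailability figure relates to downtime, the sketch below converts a percentage into seconds over a period and into a count of ~20 second migrations. The 30-day period and the helper names are assumptions chosen for illustration; the talk does not state the measurement window in this form.

```python
def availability_percent(unavailability_percent):
    """Availability is the complement of unavailability."""
    return 100.0 - unavailability_percent


def downtime_seconds(unavailability_percent, period_hours):
    """Total downtime implied by an unavailability percentage over a period."""
    return unavailability_percent / 100.0 * period_hours * 3600.0


def migrations_within_budget(unavailability_percent, period_hours,
                             downtime_per_migration_s):
    """How many migrations fit in the downtime budget for the period."""
    budget = downtime_seconds(unavailability_percent, period_hours)
    return budget / downtime_per_migration_s


# Slide values: 0.002% unavailability, ~20 s downtime per migration.
# The 30-day (720 h) period is a hypothetical choice.
budget_s = downtime_seconds(0.002, period_hours=720)
print(availability_percent(0.002))
print(budget_s)                                   # about 51.84 seconds
print(migrations_within_budget(0.002, 720, 20))   # a little over 2.5
```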
Related Work
Spot instances: [Javadi et al., 2011, Ben-Yehuda et al., 2011]
Derivative Clouds: PiCloud, Heroku
Nested Virtualization: XenBlanket [Williams et al., 2012], Turtles [Ben-Yehuda et al., 2010]
Bounded-time Migration: Yank [Singh et al., 2013], Remus [Cully et al., 2008]
Lazy VM restoration: Post-Copy migration [Hines et al., 2009], SnowFlock [Lagar-Cavilla et al., 2009]
Conclusion
- Different cost vs. availability tradeoffs in infrastructure clouds
- Spot servers: Cheap but volatile
- Derivative Cloud: abstraction layer between user and cloud
- SpotCheck: Derivative cloud with spot & on-demand servers
- Prototyped on Amazon EC2
- Cost savings of 80%, 5 9's availability
Thank You
Spot price spikes
[Figure: spot price and on-demand price ($/hr) over time]
Spot Prices
[Figure: availability CDF vs. spot-price to on-demand-price ratio for m3.medium, m3.large, m3.xlarge, and m3.2xlarge]
Time line
[Figure: spot price and bid price ($/hr) over time]
Price Correlation
[Figure: heat map of spot price correlation between pairs of zones (zone IDs 1 to 19 on both axes), with a color scale for the correlation value]
What if everybody starts using this?
- Public cloud server market is large
- More incentive to expand spot markets
- Bidding is truth revealing
- Benefits both users and cloud providers
Cost of running VMs on SpotCheck
Factors which affect cost:
  p     Probability of revocation
  E(S)  Expected spot price based on price traces
  D     On-demand price
  B     Backup server cost = 1 large server
  N     Number of VMs sharing a backup server = 40
Expected Cost of running a VM on SpotCheck:
$$E(\mathrm{Cost}) = (1 - p) \cdot E(S) + p \cdot D + \frac{B}{N}$$
Expected cost = 0.2 × on-demand price = $0.014/hour
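The expected-cost formula can be evaluated directly. In the sketch below, only N = 40 comes from the slide, and D = $0.07/hr follows from the slide's own figures ($0.014 is 0.2 × on-demand). The revocation probability, expected spot price, and backup server price are hypothetical placeholders, so the result is illustrative and is not meant to reproduce the reported number.

```python
def expected_cost(p, spot_price, on_demand_price, backup_price, vms_per_backup):
    """E(Cost) = (1 - p) * E(S) + p * D + B / N, all prices in $/hr."""
    return ((1 - p) * spot_price
            + p * on_demand_price
            + backup_price / vms_per_backup)


cost = expected_cost(
    p=0.05,                # hypothetical revocation probability
    spot_price=0.008,      # hypothetical expected spot price E(S)
    on_demand_price=0.07,  # D, implied by $0.014 = 0.2 x on-demand
    backup_price=0.14,     # hypothetical price of one large backup server
    vms_per_backup=40,     # N from the slide
)
print(f"${cost:.4f}/hr")
```

The backup term B/N shrinks as more VMs share one backup server, which is why the 40-VM limit from the evaluation matters for cost.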
Availability
          Max. num. of concurrent revocations
          N/4            N/2             3N/4            N
1-Pool    0              0               0               1.74 × 10^-4
2-Pool    0              3.75 × 10^-3    0               2.25 × 10^-5
4-Pool    7.4 × 10^-3    7.71 × 10^-5    1.92 × 10^-5    0
Probability of the maximum number of concurrent revocations for different pools. N is the number of VMs.
Nested Virtualization overhead
- Nested virtualization overhead is dependent on application characteristics
- Can use containers.
- Migration mechanisms in containers need to be implemented
Storage & Networking
- Direct access to EBS and S3
- Bridged or NAT networking inside nested hypervisor
- VPC to isolate user VMs
Restore overhead
[Figure: TPC-W response time (ms) vs. number of VMs being concurrently lazily restored (0, 1, 5, 10)]
Backup Slides
1. Spot pricing, CDF, availability.
2. What if everyone starts using this? Market is large. More incentive for IaaS.
3. Storage and networking: EBS/S3. Bridged or NAT network interfaces.
4. Inter-cloud operation: exploring practical challenges with inter-cloud networking.
5. Nested overhead: indeed significant for certain workloads. 1) Implementations will mature. 2) Only used because IaaS doesn't expose migration functionality. 3) Looking at using OS-level virtualization and containers.
References
(2014). Heroku. http://www.heroku.com.
(2014). PiCloud. http://www.multyvac.com.
Ben-Yehuda, M., Day, M., Dubitzky, Z., Factor, M., Har’El, N., Gordon, A., Liguori, A., Wasserman, O., and Yassour, B. (2010). The Turtles Project: Design and Implementation of Nested Virtualization. In OSDI.
Ben-Yehuda, O., Ben-Yehuda, M., Schuster, A., and Tsafrir, D. (2011). Deconstructing Amazon EC2 Spot Instance Pricing. In CloudCom.
Cully, B., Lefebvre, G., Meyer, D., Feeley, M., Hutchinson, N., and Warfield, A. (2008). Remus: High Availability via Asynchronous Virtual Machine Replication. In NSDI.
Hines, M. R., Deshpande, U., and Gopalan, K. (2009). Post-copy Live Migration of Virtual Machines. SIGOPS Operating Systems Review, 43(3).
Javadi, B., Thulasiram, R., and Buyya, R. (2011). Statistical Modeling of Spot Instance Prices in Public Cloud Environments. In UCC.
Lagar-Cavilla, H. A., Whitney, J. A., Scannell, A. M., Patchin, P., Rumble, S. M., De Lara, E., Brudno, M., and Satyanarayanan, M. (2009). SnowFlock: Rapid Virtual Machine Cloning for Cloud Computing. In EuroSys.
Singh, R., Irwin, D., Shenoy, P., and Ramakrishnan, K. (2013). Yank: Enabling Green Data Centers to Pull the Plug. In NSDI.
Williams, D., Jamjoom, H., and Weatherspoon, H. (2012). The Xen-Blanket: Virtualize Once, Run Everywhere. In EuroSys.