Scaling Guest OS Critical Sections with eCS
Sanidhya Kashyap, Changwoo Min, Taesoo Kim
The physical and virtual CPU abstraction
● Mismatch between CPU abstractions
○ Hardware abstraction: physical machine (host) with pCPU 1 to pCPU 4
○ Software abstraction: hypervisor presenting vCPU 1 to vCPU 4 to a virtual machine running apps
● VM consolidation
○ Contention on pCPUs: VM1 to VM4, each with multiple vCPUs and apps, share pCPU 1 to pCPU 4
● A vCPU can be preempted without notification
● Double scheduling issue
Double scheduling: Lock holder preemption (LHP)
● A vCPU holding a lock is preempted
● Preemption hinders forward progress of the VM
● Can slow applications down by 20% to 130%
[Figure: tasks A, B, and C running in a VM access a shared file; a timeline shows vCPU 1, vCPU 2, and vCPU 3 being scheduled and preempted.]
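A toy calculation can make the cost of lock holder preemption concrete. The sketch below is not from the talk: the function name and all millisecond values are hypothetical, and it assumes the holder is preempted at most once, for exactly one time slice of another VM.

```python
def lock_hold_wall_time(cs_work_ms, slice_left_ms, other_vm_slice_ms):
    """Wall-clock time (ms) during which the lock stays held.

    If the critical section does not fit in what is left of the holder's
    time slice, the vCPU is descheduled mid-section and the lock stays
    held while another VM's vCPU runs on the same pCPU.
    """
    if cs_work_ms <= slice_left_ms:
        return cs_work_ms
    return cs_work_ms + other_vm_slice_ms


# Hypothetical numbers: 0.5 ms of work inside the critical section.
print(lock_hold_wall_time(0.5, 2.0, 10.0))  # fits in the slice -> 0.5
print(lock_hold_wall_time(0.5, 0.2, 10.0))  # holder preempted -> 10.5
```

Every waiter on that lock stalls for the longer interval, which is how a short critical section can hold back the whole VM.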
Efforts to mitigate preemption issues
Research efforts
● Focused only on non-blocking locks
○ Acquire iff there is sufficient schedule time
● Hotplug vCPUs on the fly
○ May not scale to VMs with many vCPUs
● VM co-scheduling
○ Does not always alleviate the issue
Current practice
● Mostly address other preemption problems
○ Blocking locks
○ Unfair non-blocking locks
● Hardware features to mitigate preemptions
Prior approaches are mostly specialized
Double scheduling still looms!
● LHP for blocking locks
○ mutex, rwsem
● Readers preemption (RP) in read-write locks
○ A reader is preempted while holding the lock
● Interrupt context preemption (ICP)
○ Preemption of a vCPU processing an interrupt
● Blocked-waiter wakeup (BWW)
○ Waking up a blocked thread on an idle vCPU is at least 10 times costlier
Semantic gap between virtual and physical CPU
Our approach to address the semantic gap
Insight: A vCPU may be running a critical task!
Approach: Avoid preempting a vCPU with a critical task
Design: Identify and mark/unmark a critical task
Identifying each critical section with eCS
● Synchronization primitives protect critical sections → ensure OS progress
● Mark and unmark critical sections before and after the critical section
● Conservative, but effective approach to address each preemption problem
○ 60 LoC annotates 85K lock invocations in 13M LoC in Linux
[Figure: tasks A, B, and C running in a VM access a file; the timeline for vCPU 1, vCPU 2, and vCPU 3 distinguishes scheduled, preempted, and enlightened vCPUs.]
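The idea that a few lines in the lock primitives can annotate every lock invocation can be sketched as a wrapper around acquire and release. This is an illustrative sketch only, not the authors' kernel code: `VcpuState`, `critical_section`, and the single `ecs_count` field are hypothetical names, and placing the mark right after the acquire (and the unmark right after the release) is an assumption.

```python
import threading
from contextlib import contextmanager


class VcpuState:
    """Hypothetical per-vCPU state that the hypervisor could read."""

    def __init__(self):
        self.ecs_count = 0  # number of critical sections currently open


@contextmanager
def critical_section(lock, vcpu):
    """Acquire `lock` and mark the vCPU as running a critical task."""
    lock.acquire()
    vcpu.ecs_count += 1      # mark: a critical section is now open
    try:
        yield
    finally:
        lock.release()
        vcpu.ecs_count -= 1  # unmark: the critical section is closed


# Usage: every caller of the wrapper is annotated without further changes.
file_lock = threading.Lock()
vcpu1 = VcpuState()
with critical_section(file_lock, vcpu1):
    pass  # access the shared file here
```

Because the marking lives inside the primitive, call sites stay untouched, which is consistent with the small annotation footprint reported on the slide.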
Sharing the state for efficient notification
● Each vCPU shares memory with the hypervisor
● vCPU updates information for critical sections
○ Notifies critical task to the hypervisor
● Hypervisor also updates scheduler context before/after scheduling out a vCPU
○ Enables vCPU to make efficient scheduling decisions
[Figure: each vCPU (A, B, C) in the VM holds eCS states that mirror a per-vCPU state kept in the hypervisor. Shared fields: pcpu_overloaded (0/1), vcpu_preempted (0/1), non_preemptable_ecs_count, preemptable_ecs_count.]
Lightweight para-virtualized APIs to update states
Hint API
VM → Hypervisor (updated by each vCPU; read by the hypervisor)
● activate_non_preemptable_ecs(cpu_id)
● deactivate_non_preemptable_ecs(cpu_id)
● activate_preemptable_ecs(cpu_id)
● deactivate_preemptable_ecs(cpu_id)
○ State: non_preemptable_ecs_count, preemptable_ecs_count
Hypervisor → VM (updated by the hypervisor; read by a vCPU)
● is_vcpu_preempted(cpu_id)
● is_pcpu_overloaded(cpu_id)
○ State: vcpu_preempted (0/1), pcpu_overloaded (0/1)
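One way to picture the six calls is as thin accessors over a per-vCPU record in shared memory. The sketch below reuses the function and field names from the slide, but everything else is an assumption: the `SharedVcpuState` class, the `shared` list, `NUM_VCPUS`, and the choice to implement activate/deactivate as counter increments and decrements.

```python
from dataclasses import dataclass


@dataclass
class SharedVcpuState:
    # Written by the vCPU, read by the hypervisor.
    non_preemptable_ecs_count: int = 0
    preemptable_ecs_count: int = 0
    # Written by the hypervisor, read by the vCPU.
    vcpu_preempted: int = 0
    pcpu_overloaded: int = 0


NUM_VCPUS = 4  # hypothetical VM size
shared = [SharedVcpuState() for _ in range(NUM_VCPUS)]


# VM -> hypervisor hints (assumed to be plain counter updates).
def activate_non_preemptable_ecs(cpu_id):
    shared[cpu_id].non_preemptable_ecs_count += 1


def deactivate_non_preemptable_ecs(cpu_id):
    shared[cpu_id].non_preemptable_ecs_count -= 1


def activate_preemptable_ecs(cpu_id):
    shared[cpu_id].preemptable_ecs_count += 1


def deactivate_preemptable_ecs(cpu_id):
    shared[cpu_id].preemptable_ecs_count -= 1


# Hypervisor -> VM queries (read-only from the guest's side).
def is_vcpu_preempted(cpu_id):
    return shared[cpu_id].vcpu_preempted == 1


def is_pcpu_overloaded(cpu_id):
    return shared[cpu_id].pcpu_overloaded == 1
```

Since each side only writes its own fields, updates need no hypercall in this model; the other side simply reads the shared record when it makes a decision.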
Hypervisor checks eCS state before scheduling out a vCPU
➀ vCPU 1 is running
➁ vCPU 1 acquires a lock
➂ vCPU 1 updates its eCS count
➃ Hypervisor checks the eCS states before preempting vCPU 1
➄ Hypervisor lets vCPU 1 run for extra time (extended schedule)
➅ vCPU 1 finishes and updates its eCS count
➆ Hypervisor penalizes vCPU 1 later (penalized schedule)
[Figure: tasks A, B, and C in VM1 access a file; pCPU 1 is time-shared between vCPU 1 of VM1 and vCPU 1 of VM2. One vCPU shows ecs_count (1) and the others ecs_count (0); steps ➀ to ➆ are marked on the VM1 timeline, which distinguishes scheduled, preempted, and enlightened vCPUs.]
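Steps ➃ and ➄ above amount to a small check at the point where the hypervisor would otherwise switch the vCPU out. The following sketch is a guess at that logic, not the actual scheduler hook: the dictionary keys, the function names, and the rule that an extension is granted only once per scheduling-out attempt are all assumptions.

```python
def should_extend(ecs_count, already_extended):
    """Extend only if a critical section is open and no extension was given."""
    return ecs_count > 0 and not already_extended


def schedule_out(vcpu):
    """Decide what to do when the vCPU's time slice expires.

    `vcpu` is a hypothetical dict with keys "ecs_count" and "extended".
    Returns "extend" (let it run a bit longer) or "preempt".
    """
    if should_extend(vcpu["ecs_count"], vcpu["extended"]):
        vcpu["extended"] = True   # remember that extra time was granted
        return "extend"
    vcpu["extended"] = False      # reset for the vCPU's next slice
    return "preempt"


# Usage: vCPU 1 holds a lock when its slice ends.
vcpu1 = {"ecs_count": 1, "extended": False}
first = schedule_out(vcpu1)    # extension granted
second = schedule_out(vcpu1)   # still in the section: preempt anyway
```

Capping the extension keeps a guest that never unmarks its critical section from monopolizing the pCPU.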
The case for system eventual fairness
● Hypervisor accounts extra time and later penalizes the enlightened VM
○ Penalize the schedule of an enlightened VM
○ Extend the schedule of the very next VM
● Hypervisor optimistically extends time for an enlightened CS
○ Decision made just before scheduling out a vCPU
○ Extra time (schedule) to avoid preemption: 1 ms
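The accounting behind eventual fairness can be illustrated with a simple debt counter. This is a deliberately simplified model and not how the hypervisor's scheduler tracks time internally: `VcpuAccount`, `grant_extra`, `next_slice`, and the 6 ms base slice are hypothetical; only the 1 ms extension comes from the slide.

```python
class VcpuAccount:
    """Hypothetical ledger of extra time a vCPU has borrowed."""

    def __init__(self):
        self.debt_ms = 0.0


def grant_extra(acct, extra_ms=1.0):
    """Record an optimistic extension (1 ms per the slide)."""
    acct.debt_ms += extra_ms


def next_slice(acct, base_slice_ms):
    """Return the vCPU's next slice length, shortened by any debt."""
    penalty = min(acct.debt_ms, base_slice_ms)
    acct.debt_ms -= penalty
    return base_slice_ms - penalty


# Usage with a hypothetical 6 ms base slice.
acct = VcpuAccount()
grant_extra(acct)                 # this round ran 6 + 1 = 7 ms
penalized = next_slice(acct, 6.0) # next round runs 5 ms
# Over the two rounds the vCPU ran 7 + 5 = 12 ms, the same as 2 * 6 ms.
```

The time taken from the penalized slice is what the next VM gains, so the totals even out over a few scheduling rounds.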
Even a vCPU can make efficient scheduling decisions
● Share the hypervisor context with each VM
○ Lock waiters can avoid the BWW problem
● Virtualized scheduling-aware spinning
○ A lock waiter keeps spinning until it acquires the lock, as long as the pCPU is not overloaded
[Figure: each vCPU (A, B, C) in the VM reads the pcpu_overloaded (0/1) flag from the per-vCPU state that the hypervisor maintains.]
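Virtualized scheduling-aware spinning can be sketched as a waiter loop that consults the hypervisor-provided flag before deciding to sleep. This is a minimal sketch under stated assumptions: the three callables passed in (`try_acquire`, `is_pcpu_overloaded`, `block_until_woken`) are hypothetical stand-ins for the guest's lock and scheduler hooks.

```python
def wait_for_lock(try_acquire, is_pcpu_overloaded, block_until_woken):
    """Wait for a lock; return how many times the waiter blocked.

    While the pCPU is not overloaded the waiter keeps spinning, which
    avoids the costly wakeup of a blocked thread. If the pCPU is
    overloaded, the waiter blocks so other vCPUs can use the pCPU.
    """
    blocks = 0
    while not try_acquire():
        if is_pcpu_overloaded():
            block_until_woken()
            blocks += 1
        # Otherwise fall through and try again (spin).
    return blocks


# Usage: the lock becomes free on the third attempt, pCPU not overloaded.
attempts = iter([False, False, True])
n_blocks = wait_for_lock(lambda: next(attempts), lambda: False, lambda: None)
```

In the under-committed case the flag stays 0, so waiters never pay the wakeup cost that the BWW problem describes.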
Implementation
● Rely on paravirtualized VM
● Extended scheduler’s preempt_notifier API to check eCS states
○ Rely on scheduler_tick() to avoid vCPU preemption
● Overall implementation is 1000 LoC
○ 60 LoC for annotating almost every lock-based critical section
Evaluation
● Does eCS improve a VM’s performance?
● Does the hypervisor maintain system eventual fairness?
● Setup: 8-socket, 80-core NUMA machine
Impact of eCS in over-committed scenario
● Experiment: run two VMs, each running the same application
● eCS improves application throughput by 1.2X to 2.3X
● eCS avoids 85.8% to 100% of preemptions → an extra schedule tick is sufficient
[Figure: result panels for Apache web server and Psearchy, plus preemptions avoided.]
Impact of eCS in under-committed scenario
● Experiment: run only one VM with an application
● eCS improves application performance by 1.2X to 1.9X
● Virtualized scheduling-aware spinning addresses BWW for blocking locks
[Figure: result panels for Apache web server and Psearchy.]
System eventual fairness
● Experiment: an application reading a file
● Hypervisor’s scheduler (CFS) maintains eventual fairness by penalizing VM2
● Both VMs get equal time even though VM2 (eCS) is granted extra schedules
○ Each VM runs for an equal time (4.95 seconds out of 10 seconds)
Discussion
● Right approach for Linux adoption
○ Leverage steal_time_struct that exposes preempted method
● Annotation
○ Use VM → Hypervisor API to mark functions
● Extending the concept to the userspace
○ Require composable scheduling abstraction to support user space
Conclusion
● Double scheduling leads to several preemption problems
● Six lightweight paravirtualized methods to annotate critical sections
● Leverage hypervisor’s scheduler to mitigate vCPU preemptions
● Allow a vCPU to make efficient scheduling decisions
● A generic approach to mitigate all preemption problems!
Thank you!