TRANSCRIPT
Scheduling in Xen: Past, Present and Future
Dario – [email protected]
Seattle, WA – 18th of August, 2015
Introduction
Welcome
- Hello, my name is Dario
- Working on Xen (from within Citrix) since 2011
- Mostly hypervisor stuff, but also toolstack: NUMA, cpupools, scheduling
Scheduling & Scheduling in Xen
Scheduling in General
Is scheduling really important, e.g., in OSes? Well...
- most of the time, there isn't even any CPU overbooking
- when there's overbooking, not everything is equally important
- I/O is more important!
Scheduling in Virtualization
Is scheduling really important on a virtualized host?
- There is pretty much always CPU overbooking
- All activities (i.e., all VMs) are (potentially) equally important

Scheduler hacker: "Hey, this sounds pretty cool..."
Well, it's not! I/O is still more important!
Wait a Second...
And we, on Xen, are even more special:
- How does Xen handle I/O:
[Architecture diagrams: the Xen hypervisor runs on the hardware (I/O devices, CPU, memory); the Control Domain (NetBSD or Linux) contains the hardware drivers, the toolstack, the device model (qemu) and the netback/blkback backends; paravirtualized (PV) and fully virtualized (HVM) domains (NetBSD, Linux, Windows, FreeBSD...) reach them through netfront/blkfront; the backends can also live in a separate Driver Domain.]
Wait a Second... (cont. II)
So, I/O, it turns out you actually need scheduling, eh?
History of Xen Scheduling (Past)
I Mean... Honestly...
WHO CARES?!?!
No, seriously:
- some info at http://wiki.xen.org/wiki/Xen_Project_Schedulers
- we just killed SEDF in 4.6 (yay!!); check RTDS for something similar (but better!)
- that's it
Xen's Scheduler Features (Present)
Hard and Soft Affinity
- pinning: you can run there and only there!
  # xl vcpu-pin vm1 0 2
- hard affinity: you can't run outside of that spot
  # xl vcpu-pin vm1 all 8-12
- soft affinity: you can run outside of that spot but, preferably, you should run there
  # xl vcpu-pin vm1 all - 10,11

The same can be achieved with cpus= and cpus_soft= in the domain config file (see the sketch below); those options also control where the domain's memory is allocated.
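For instance, a minimal sketch of a domain config combining the two (option names as in the xl.cfg manpage; the domain and the pCPU numbers are just for illustration):

  vcpus     = 4
  cpus      = "8-12"   # hard affinity: these vCPUs may only run on pCPUs 8-12
  cpus_soft = "10,11"  # soft affinity: within that, prefer pCPUs 10 and 11

With a config like this, the scheduler may use any pCPU in 8-12, but will try 10 and 11 first.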
What about NUMA
Automatic placement policy in libxl (since Xen 4.2)
- acts at domain creation time, important for memory allocation!
- easy to tweak (at libxl build time, for now) heuristics:
  - use the smallest possible set of nodes (ideally, just one)
  - use the (set of) node(s) with fewer vCPUs bound to it ([will] consider both hard and soft affinity)
  - use the (set of) node(s) with the most free RAM (mimics the "worst fit" algorithm)

Example: 4 GB VM, 6 GB free on node0, 8 GB free on node1
- Bound vCPUs: 12 to node0, 16 to node1 ⇒ use node0
- Bound vCPUs: 8 to node0, 8 to node1 ⇒ use node1
Coming: node distances, IONUMA, vNUMA
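To check what placement actually did, something along these lines is enough (a sketch; 'vm1' is a made-up name, and output formats vary between Xen versions):

  # xl info -n          # host topology, including which NUMA node each pCPU belongs to
  # xl vcpu-list vm1    # where the vCPUs are running, and their (hard/soft) affinity
  # xl debug-keys u     # ask Xen to dump per-node memory info on the console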
For Dom0
- dom0_max_vcpus: makes sense
- dom0_vcpus_pin: bleah!!
- dom0_nodes: new parameter. Place Dom0's vCPUs and memory on one or more nodes (see the sketch below)
  - strict (default) uses hard affinity
  - relaxed uses soft affinity
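Putting these together, the hypervisor boot line could look roughly like this (a sketch only: dom0_nodes was still being proposed at the time, so check docs/misc/xen-command-line for the exact syntax; the vCPU count and node numbers are made up):

  multiboot /boot/xen.gz dom0_max_vcpus=8 dom0_nodes=0
  # or, to get soft instead of hard affinity for Dom0's vCPUs:
  multiboot /boot/xen.gz dom0_max_vcpus=8 dom0_nodes=0,relaxed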
Workin' On (Future)
(Better) Exploiting Intel PSR
Intel Platform Shared Resource Monitoring (PSR):
- Cache Monitoring Technology (CMT)
- Cache Allocation Technology (CAT)

CMT: gather information useful for mid-level load balancing decisions (e.g., at vCPU wakeup, or across sockets/NUMA nodes):
- how 'cache hungry' is a given pCPU?
- how much free cache is there on a given socket/NUMA node?

CAT: when setting affinity, set CAT accordingly (and vice versa?):
- if you're going to run on pCPUs #x, #y and #z...
- ...you better have some cache space reserved there!
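For reference, the CMT side is already exposed through xl, roughly like this (a sketch; command and resource-type names as per the xl manpage of Xen 4.5/4.6, domain name made up):

  # xl psr-cmt-attach vm1                  # start monitoring vm1 (allocates a monitoring ID)
  # xl psr-cmt-show cache-occupancy vm1    # read back its L3 cache occupancy
  # xl psr-cmt-detach vm1                  # release the monitoring ID

Using this for load balancing, as sketched above, would mean consuming the same counters from inside the hypervisor rather than via xl.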
Interactions With Guests’ (Linux’s) Scheduler
Scheduler paravirtualization?
SHHSH!! There's Linux people in the other room, what if they hear!?!?! :-O

An example: scheduling domains:
- scalability measure (for the Linux scheduler)
- balance the load: often among SMT threads, less often between NUMA nodes!
- based on CPU topology
Interactions With Guests’ (Linux’s) Scheduler (cont.)
In Xen:
- vCPUs wander among pCPUs
- vCPU #0 and #1 of a guest can be SMT siblings or not... at different times!!
- it doesn't make sense, in the guest, to try to optimize scheduling based on something that varies

How to avoid this?
- flat layout for the scheduling domains → just one domain with all the pCPUs
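For reference, a guest kernel built with CONFIG_SCHED_DEBUG exposes the scheduling-domain hierarchy it has built, so the effect of a flat layout is easy to verify (a sketch; paths differ between kernel versions):

  # grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name
  # lscpu    # the CPU topology the guest derived from what Xen exposes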
Credit2
The Credit2 scheduler, authored by George, is still in experimental status.

Take it out from there!!

Why? In general, it's more advanced, with a lot of potential:
- load balancing based on historical load
- runqueue kept in order of credit (rather than Round-Robin as in Credit1)
- configurable runqueue arrangement
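Trying it is already straightforward (a sketch; the weight value and domain name are made up, see the xl manpage for the exact options):

  # boot Xen with: sched=credit2
  # xl sched-credit2                   # show the weights of all domains
  # xl sched-credit2 -d vm1 -w 512     # double vm1's weight (default is 256)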
Credit2: Why?
Complexity:
- in Credit we have:
  - credits and weights
  - 2 priorities
  - oh, actually, it's 3
  - active and non-active state of vCPUs
  - flipping between active/non-active means flipping between burning/non-burning credits, which in turn means wandering around among priorities
- in Credit2 we have:
  - credits burned based on weights
  - that's it
  - no, really, it's that simple! :-)
Credit2: Why? (cont.)
Scalability:
- in Credit:
  - periodic runqueue sorting. Freezes a runqueue
  - periodic accounting. Freezes the whole scheduler!
- in Credit2 we have:
  - a "global" lock only for load balancing (looking at improving it)
Credit2: What’s Missing?
- SMT awareness (done, missing final touches)
- hard and soft affinity support (someone working on it)
- tweaks and optimization in the load balancer (someone looking at it)
- cap and reservation (!!!)

Plan: mark it as !experimental for 4.6
New plan: mark it as !experimental for 4.7 :-P
Credit2: Performance Evaluation
Time for building Xen:
- 16 pCPUs host
- 8 vCPUs guest
- varying # of build jobs → # of vCPUs busy in the guest
  - -j4, -j8, -j (unlimited)
- varying the interference load → # of vCPUs active in Dom0
  - nothing, 4 CPU hog tasks, 12 CPU hog tasks

Anticipation:
- Some tweaks still missing, but really promising!
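Roughly, the workload looks like this (a reconstruction of the setup for illustration, not the exact scripts that were used):

  # inside the guest: time a Xen build at different parallelism levels
  $ time make -j4        # then -j8, then -j
  # in Dom0: generate the interference with N CPU hogs
  # for i in $(seq 4); do yes > /dev/null & done    # 4 hogs (then 12, then none)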
Credit2: Performance Evaluation (cont.)
Credit 2 with HyperThreading support FTW (a.k.a.: we're done here, let's go have beers!! ;-P)
Credit2: Performance Evaluation (cont. II)
Almost good, apart from "-j4". It should be consistent with the "-j4" of "no interf" (a.k.a.: ok, maybe just one beer?)
Credit2: Performance Evaluation (cont. III)
Really good against Credit1, but "-j4" still has issues (a.k.a.: ok, I got it. Let's grab some coffee and get back to work!)
Conclusions
Having our scheduler, and being able to tweak it to our needs, is very important:
- we should continuously try to assess where we stand (performance-wise, scalability-wise, etc.)
- we should always strive to do better!
Q&A
Thanks again,
Questions?
Spare Slides
(Better) Exploiting Intel CMT
Biggest limitations:
- num. of activities that can be monitored is limited
- applies to L3 cache only, for now

What would be desirable:
- per-vCPU granularity ⇒ No! Too few monitoring IDs
- L2 occupancy/bandwidth stats, for helping intra-socket scheduling decisions ⇒ No! Only L3

What I'm thinking of:
- use one monitoring ID per pCPU. This gives:
  - how 'cache hungry' a pCPU is being
  - how much free cache there is on each socket/NUMA node
- sample periodically and use for mid-level load balancing decisions
- ... ideas welcome!!
(Better) Exploiting Intel CAT
Basic idea:
- if you're going to run on pCPUs #x, #y and #z...
- ...you better have some cache space reserved there!

Implementation:
- when setting affinity, set CAT accordingly

Open "issues":
- always?
- for hard affinity, or soft affinity, or both?
- with what mask (i.e., how to split the cache)?
- what about the vice-versa?
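For reference, the mask question is exactly what the current xl interface takes as input (a sketch; command names as introduced in Xen 4.6, domain name and bitmask made up):

  # xl psr-cat-cbm-set vm1 0x3f    # reserve the cache ways in this capacity bitmask for vm1
  # xl psr-cat-show                # show the per-domain CBMs on each socket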
Driver Domain Aware Scheduling
Suppose:
- vCPU x is top priority (higher credits, whatever)
- vCPU x issues an I/O operation. It has some remaining timeslice (or credit, or whatever) available, but it blocks waiting for results
- some other domains' vCPUs y, w and z have higher priority than the I/O vCPUs (Dom0 or driver domain)

Schedule: v_x, v_y, v_w, v_z, v_drvdom → only now v_x can resume
Driver Domain Aware Scheduling (cont.)
What if v_x could donate its timeslice to the entity that is blocking it?

Schedule: v_x, v_drvdom, v_x, v_w, v_z → v_x unblocks right away (this assumes servicing the I/O is quick, and does not even exhaust v_x's timeslice)

- avoids priority inversion (no, we're not the Mars Pathfinder, but still...)
- makes v_x sort of "pay" for the CPU load it generates with its I/O requests (fairness++)