LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
DESCRIPTION
Resource: LCA14
Name: LCA14-306: CPUidle & CPUfreq integration with scheduler
Date: 05-03-2014
Speaker: Daniel Lezcano, Mike Turquette
Video: https://www.youtube.com/watch?v=Ug4uQEYwl5s
TRANSCRIPT
Wed 5 March, 11:15am, Daniel Lezcano, Mike Turquette
LCA14-306: CPUidle & CPUfreq integration with scheduler
Introduction
● Power aware discussion
● Patchset « Small task packing »
  − Some information shared between cpuidle and the scheduler
  − https://lwn.net/Articles/520857/
● « Line in the sand » by Ingo Molnar
  − Integrate cpuidle and cpufreq with the scheduler first
  − http://lwn.net/Articles/552885/
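CPUidle + scheduler: current design
[Diagram: the scheduler switches to the idle task (switch_to); the idle task calls cpuidle_idle_call, which asks the CPUidle governor to pick a state (cpuidle_select) and then enters it through the CPUidle backend driver (cpuidle_enter).]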
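The call chain in the diagram can be modelled in a few lines of C. This is a hypothetical, heavily simplified sketch: `struct idle_state`, `struct backend_driver`, `residency_select()`, `driver_enter()` and `idle_call()` are illustrative names, not the kernel's definitions, and the example governor simply picks the deepest state whose target residency fits the predicted sleep.

```c
#include <assert.h>

/* Hypothetical, simplified model of the idle path; not kernel code. */
struct idle_state {
    const char *name;
    unsigned int target_residency_us; /* minimum sleep for the state to pay off */
};

/* Stand-in for the CPUidle backend driver: owns the state table. */
struct backend_driver {
    const struct idle_state *states; /* ordered from shallowest to deepest */
    int state_count;
    int last_entered;                /* bookkeeping for this example only */
};

/* Stand-in for a governor's select callback. */
typedef int (*governor_select_fn)(const struct backend_driver *drv,
                                  unsigned int predicted_us);

/* Example governor: deepest state whose target residency fits the prediction. */
static int residency_select(const struct backend_driver *drv,
                            unsigned int predicted_us)
{
    int index = 0;

    for (int i = 1; i < drv->state_count; i++)
        if (drv->states[i].target_residency_us <= predicted_us)
            index = i;
    return index;
}

/* Plays the role of the cpuidle_enter step: the driver enters the state. */
static int driver_enter(struct backend_driver *drv, int index)
{
    assert(index >= 0 && index < drv->state_count);
    drv->last_entered = index; /* real hardware would sleep here */
    return index;
}

/* Plays the role of cpuidle_idle_call, run by the idle task after switch_to. */
static int idle_call(governor_select_fn select, struct backend_driver *drv,
                     unsigned int predicted_us)
{
    int index = select(drv, predicted_us); /* cpuidle_select step */

    return driver_enter(drv, index);       /* cpuidle_enter step */
}
```

The split of responsibilities is the point: the governor decides, the driver executes, and the scheduler only ever sees the idle task.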
Idle time measurement
● From the scheduler:
  − The duration for which the idle task is running
  − Includes the interrupt processing time
● From CPUidle:
  − The duration between interrupts
● CPUidle code runs with local interrupts disabled
● T(idle task) = Σ T(CPUidle) + Σ T(irqs)
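A small worked example of the relation above, with made-up timestamps: during one run of the idle task the CPU sleeps, is woken by an interrupt, sleeps again and handles a second interrupt. The `struct interval` type and the helper names are hypothetical, and the intervals are assumed to be contiguous.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous slice of a single idle-task run, in microseconds. */
struct interval {
    unsigned long start_us;
    unsigned long end_us;
    int is_irq; /* 1 = interrupt handling, 0 = sleeping in an idle state */
};

/* Sum the slices of one kind: sleeping (CPUidle's view) or interrupts. */
static unsigned long sum_kind(const struct interval *iv, size_t n, int is_irq)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < n; i++)
        if (iv[i].is_irq == is_irq)
            sum += iv[i].end_us - iv[i].start_us;
    return sum;
}

/* The scheduler's view: the whole time the idle task was running. */
static unsigned long idle_task_time(const struct interval *iv, size_t n)
{
    assert(n > 0);
    return iv[n - 1].end_us - iv[0].start_us;
}
```

With the slices {0-400 sleep, 400-450 irq, 450-900 sleep, 900-960 irq}, CPUidle accounts 850 µs of sleep while the scheduler accounts 960 µs of idle-task time; the 110 µs difference is interrupt processing.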
Idle time measurement unification
● What is the impact of returning to the scheduler each time an interrupt occurs?
  − The scheduler will choose the idle task again if there is nothing to do
  − The idle mainloop code is simplified
  − Idle time is measured nearly the same way by the scheduler and CPUidle
  − Probably a negative impact on performance, which would need to be fixed
Load balance
● Deciding whether to pull a task when the CPU is about to go idle
  − Uses avg_idle
  − Does not use how long the CPU will actually sleep
  − The idle state should be selected beforehand
  − CPUidle should report the state the CPU is about to enter
● Balancing a task to the idlest CPU
  − Does not use the CPU's exit latency
  − CPUidle should report the state the CPU is currently in
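One way to read these bullets, as a hypothetical sketch: if CPUidle exported the idle state each CPU is in (reduced here to its exit latency) and the state a CPU is about to enter (reduced here to a predicted sleep length), the scheduler could skip idle balancing when the expected sleep is too short to amortise a migration, and, among idle CPUs, wake the one that exits fastest. `struct cpu_idle_info`, `should_idle_balance()` and `fastest_idle_cpu()` are assumed names; the first check only mirrors the spirit of comparing avg_idle against a migration cost.

```c
#include <assert.h>

/* Hypothetical per-CPU snapshot that CPUidle could expose to the scheduler. */
struct cpu_idle_info {
    int idle;                     /* 1 if the CPU is currently idle */
    unsigned int exit_latency_us; /* exit latency of the state it is in */
};

/* Pull work only if the CPU is expected to sleep longer than a migration costs. */
static int should_idle_balance(unsigned int predicted_sleep_us,
                               unsigned int migration_cost_us)
{
    return predicted_sleep_us >= migration_cost_us;
}

/* Return the idle CPU with the lowest exit latency, or -1 if none is idle. */
static int fastest_idle_cpu(const struct cpu_idle_info *cpus, int nr_cpus)
{
    int best = -1;

    assert(nr_cpus > 0);
    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        if (!cpus[cpu].idle)
            continue;
        if (best < 0 || cpus[cpu].exit_latency_us < cpus[best].exit_latency_us)
            best = cpu;
    }
    return best;
}
```

With a plain "idlest CPU" rule, a CPU in a deep state (say 1500 µs exit latency) and one in a shallow state (10 µs) look equally idle; with the exit latency available, the shallow one is preferred.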
CPUidle main function
● Reduce the distance between the scheduler and the cpuidle framework
  − Move the idle task to kernel/sched
  − Move the cpuidle idle function into the idle task code
  − Integrate the idle mainloop and cpuidle_idle_call
● This gives CPUidle access to the scheduler's private structure definitions
Menu governor split
● Wake-up events can be classified into three categories:
  1. Predictable → timers
  2. Repetitive → IOs
  3. Random → keystrokes, incoming packets
● Category 2 could be integrated into the scheduler
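A hypothetical sketch of how this classification could feed a sleep prediction: timers give an exact next event, IO tracking (category 2) gives an expected completion, and random events are simply not predicted. `predict_sleep_us()` and `enum wakeup_source` are illustrative names, not existing kernel interfaces.

```c
#include <assert.h>

/* Which category of event is expected to end the sleep first. */
enum wakeup_source {
    WAKEUP_TIMER, /* 1. predictable */
    WAKEUP_IO,    /* 2. repetitive, estimated from IO history */
    /* 3. random events (keystrokes, packets) cannot be predicted */
};

/*
 * Predicted sleep length: the earlier of the next timer and the expected IO
 * completion (when an IO is outstanding). Random events are ignored, so the
 * result is an upper bound that they may cut short.
 */
static unsigned long predict_sleep_us(unsigned long next_timer_us,
                                      int io_pending,
                                      unsigned long expected_io_us,
                                      enum wakeup_source *source)
{
    if (io_pending && expected_io_us < next_timer_us) {
        *source = WAKEUP_IO;
        return expected_io_us;
    }
    *source = WAKEUP_TIMER;
    return next_timer_us;
}
```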
IO latency tracking
● IOs recur within a short enough interval that they can be treated as predictable enough
● Measurement from the scheduler
  − io_schedule
  − io_schedule_timeout
● Track IO latency per task
  − Task migration moves the IO history along with the task, unlike the current governor
  − Latency constraint for the task
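A minimal sketch of per-task IO latency tracking, assuming the scheduler timestamps a task when it blocks in io_schedule or io_schedule_timeout and again when it is woken. The history lives in a hypothetical per-task structure, so it follows the task when it migrates. The 1/8 weight of the running average is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task IO statistics; they migrate with the task. */
struct task_io_stats {
    int64_t avg_io_latency_us; /* running average, 0 means "no history yet" */
    int64_t io_start_us;       /* when the task last blocked on IO */
};

/* Called when the task blocks on IO (e.g. from io_schedule). */
static void io_wait_begin(struct task_io_stats *t, int64_t now_us)
{
    t->io_start_us = now_us;
}

/* Called when the task is woken: fold the new sample into the average. */
static void io_wait_end(struct task_io_stats *t, int64_t now_us)
{
    int64_t sample = now_us - t->io_start_us;

    assert(sample >= 0);
    if (t->avg_io_latency_us == 0)
        t->avg_io_latency_us = sample;
    else
        t->avg_io_latency_us += (sample - t->avg_io_latency_us) / 8;
}
```

After an 800 µs wait followed by a 400 µs wait, the average moves from 800 µs to 750 µs. A value like this could be handed to CPUidle as the expected sleep length when the CPU goes idle while the task waits on IO.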
Combining information
● Move the predictable-event framework into the scheduler
● Information combined from the scheduler and the menu governor will be more accurate
  − Idle balance decisions based on the idle state a CPU is in or about to enter
  − Per-task load tracking to account for idle state exit latency
  − CPU compute capacity and topology
  − DVFS strategies for boosting on idle state exit
Scheduler + CPUidle
● The scheduler should have all the information needed to tell CPUidle:
  − How long the CPU will sleep
  − What the latency constraint is
● CPUidle should use the information provided by the scheduler:
  − Select an idle state
  − Use the backend driver idle callback
  − No more heuristics
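With the scheduler supplying both values, state selection reduces to a table walk. The sketch below assumes the state table is ordered from shallowest to deepest and picks the deepest state whose target residency fits the expected sleep and whose exit latency respects the constraint. `select_idle_state()` is a hypothetical name, although the menu governor walks its state table in a broadly similar way.

```c
#include <assert.h>

struct idle_state {
    const char *name;
    unsigned int exit_latency_us;     /* time needed to leave the state */
    unsigned int target_residency_us; /* minimum stay for the state to pay off */
};

/* Deepest state allowed by the scheduler's sleep estimate and latency constraint. */
static int select_idle_state(const struct idle_state *states, int count,
                             unsigned int expected_sleep_us,
                             unsigned int latency_req_us)
{
    int selected = 0; /* state 0 (shallowest) is always allowed */

    assert(count > 0);
    for (int i = 1; i < count; i++) {
        if (states[i].target_residency_us > expected_sleep_us)
            break; /* deeper states need even longer sleeps */
        if (states[i].exit_latency_us > latency_req_us)
            break; /* deeper states wake up even more slowly */
        selected = i;
    }
    return selected;
}
```

No prediction happens here: both inputs come from the scheduler, which is what "no more heuristics" asks for.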
Status
● A lot of cleanups around the idle mainloop
● CPUidle main function moved inside the idle mainloop
  − Code distance reduced; structures shared between the scheduler and CPUidle
  − Communication between the subsystems made easier
Work in progress
● First iteration of IO latency tracking implemented
  − Validation in progress
● Simple governor for CPUidle
  − Select a state
● Idle time unification experimentation
CPUfreq + scheduler
The title is misleading … CPUfreq may completely disappear in the future.
The goal is to initiate CPU dynamic voltage & frequency scaling (DVFS) from the Linux scheduler.
Nobody knows yet what this will look like, so please ask questions and make suggestions.
CPUfreq today
• Polling workqueue
  • E.g. ondemand
• Based on idle time / busyness
• No relation to decisions taken by the scheduler
• Task may be run at any time
• No relation to the idle task
  • In fact, the task will not wake up during idle
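For contrast, a simplified sketch of what a polling governor such as ondemand does on every sampling period: measure busyness over the window, then pick a frequency. The threshold value and the proportional scaling are a simplified illustration of the ondemand approach, not its exact code, and `load_percent()` and `next_freq_khz()` are hypothetical names.

```c
#include <assert.h>

#define UP_THRESHOLD 80 /* percent; illustrative value */

/* Busyness over one sampling window: the part of the window not spent idle. */
static unsigned int load_percent(unsigned long busy_us, unsigned long window_us)
{
    assert(window_us > 0 && busy_us <= window_us);
    return (unsigned int)((unsigned long long)busy_us * 100 / window_us);
}

/* Decision taken on each tick of the polling loop. */
static unsigned int next_freq_khz(unsigned int load, unsigned int min_khz,
                                  unsigned int max_khz)
{
    unsigned int freq;

    if (load > UP_THRESHOLD)
        return max_khz; /* busy: jump straight to the maximum */
    freq = (unsigned int)((unsigned long long)load * max_khz / 100);
    return freq < min_khz ? min_khz : freq;
}
```

Nothing in this loop knows about enqueues, migrations or idle entry; it only sees aggregated busy time after the fact, which is the gap the event-driven approach aims to close.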
Event driven behavior
• Replace the polling loop with event-driven action
• The scheduler already takes actions that affect available compute capacity
  • Load balance
  • Migrating tasks to and from CPUs of different compute capacity
• DVFS transitions are a natural fit
Lots of work ahead
• Method to initiate CPU DVFS transitions from the scheduler
• Identify call sites to initiate those transitions
  • Enqueue/dequeue task
  • Load balance
  • Idle entry/exit
  • Aggressively schedule deadline tasks
  • Maybe others
• Define interface between the scheduler & the DVFS thingy
  • Currently a power driver in Morten’s RFC
  • Remove CPUfreq governor layer from the power driver completely?
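A hypothetical sketch of the event-driven alternative: each call site listed above reports an event together with the CPU's current utilisation and capacity, and a policy function returns a request for the DVFS side. The event names, the shared capacity scale and the 80%/30% thresholds are all assumptions made for illustration; choosing the real policy is exactly the open work item.

```c
#include <assert.h>

/* Call sites from the list above (hypothetical names). */
enum sched_event {
    EV_ENQUEUE,
    EV_DEQUEUE,
    EV_LOAD_BALANCE,
    EV_IDLE_ENTER,
    EV_IDLE_EXIT,
};

enum dvfs_request { DVFS_NONE, DVFS_FASTER, DVFS_SLOWER };

/*
 * Evaluate whether the CPU should change speed at a scheduler event.
 * util and capacity use the same arbitrary scale (e.g. 0..1024).
 */
static enum dvfs_request evaluate_dvfs(enum sched_event ev, unsigned int util,
                                       unsigned int capacity)
{
    if (ev == EV_IDLE_ENTER)
        return DVFS_NONE;   /* leave this case to the idle path in this sketch */
    if (util * 100 > capacity * 80)
        return DVFS_FASTER; /* above 80% of current capacity */
    if (util * 100 < capacity * 30)
        return DVFS_SLOWER; /* below 30% of current capacity */
    return DVFS_NONE;
}
```

Requests like DVFS_FASTER and DVFS_SLOWER could map onto the power driver's go_faster/go_slower callbacks.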
Lots of work ahead, part 2
• Experiment with policy
  • When and where to evaluate if frequency should be changed
  • What metrics are important to the algorithm?
  • DVFS versus race-to-idle
• Integrate with power model
• Benchmark performance & power
  • Performance regressions
  • Does it save power?
• Make it work with non-CPUfreq things like PSCI and ACPI for changing CPU P-state
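The DVFS versus race-to-idle question can be made concrete with a back-of-the-envelope energy model. The sketch below assumes fixed power figures per operating point and a fixed-length window; all numbers in the usage example are invented for illustration and say nothing about real hardware.

```c
#include <assert.h>

/*
 * Energy (µJ) to run `cycles` of work at one operating point inside a window
 * of `window_s` seconds, spending the rest of the window in an idle state.
 */
static double window_energy_uj(double cycles, double freq_hz, double active_mw,
                               double idle_mw, double window_s)
{
    double busy_s = cycles / freq_hz;

    assert(busy_s <= window_s); /* the operating point must finish in time */
    /* mW x s = mJ, so scale by 1000 to get µJ */
    return (active_mw * busy_s + idle_mw * (window_s - busy_s)) * 1000.0;
}
```

For 10^7 cycles in a 20 ms window with a 10 mW idle state, racing at 1 GHz and 500 mW costs about 5100 µJ, while running at 500 MHz and 200 mW costs about 4000 µJ. If the 1 GHz point drew less than roughly 390 mW, race-to-idle would win instead.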
Morten’s power aware scheduling RFC
• https://lkml.org/lkml/2013/10/11/547
• Replaces polling loop in CPUfreq governor with scheduler event-driven action
• CPUfreq machine drivers are re-used initially
• CPUfreq governor becomes a shim layer to the power driver
Nitty gritty details
• DVFS task is itself scheduled on a workqueue
  • Might not be run for some time after the scheduler determines that a DVFS transition should happen
• Kworker threads are filtered out
  • Prevents infinite reentrancy into the scheduler
  • CPU capacity is not changed when enqueuing and dequeuing these tasks
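Why the filter is needed can be shown with a toy model: if queuing the DVFS work enqueues a kworker, and every enqueue may queue DVFS work, the two calls recurse forever. The sketch below is hypothetical (`struct task`, `enqueue_hook()` and `queue_dvfs_work()` are illustrative names); only the `kworker` prefix reflects how kernel worker threads are actually named.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical task descriptor: only the thread name matters here. */
struct task {
    const char *comm;
};

static int dvfs_requests; /* how many times the DVFS work was queued */

static void queue_dvfs_work(void);

/* Hypothetical enqueue hook: evaluate capacity unless the task is a kworker. */
static void enqueue_hook(const struct task *p)
{
    if (strncmp(p->comm, "kworker", 7) == 0)
        return; /* filter: enqueuing the DVFS worker must not trigger DVFS again */
    queue_dvfs_work();
}

/* Queuing the work wakes a kworker, which goes through the enqueue hook. */
static void queue_dvfs_work(void)
{
    static const struct task worker = { "kworker/0:1" };

    dvfs_requests++;
    enqueue_hook(&worker); /* without the filter this would recurse forever */
}
```

The filter also means that enqueuing and dequeuing these threads never changes the CPU capacity, which is the side effect noted on the slide.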
include/linux/sched/power.h
```c
struct power_driver {
	/*
	 * Power driver calls may happen from scheduler context with irq
	 * disabled and rq locks held. This must be taken into account in
	 * the power driver.
	 */

	/* cpu already at max capacity? */
	int (*at_max_capacity) (int cpu);
	/* Increase cpu capacity hint */
	int (*go_faster) (int cpu, int hint);
	/* Decrease cpu capacity hint */
	int (*go_slower) (int cpu, int hint);
	/* Best cpu to wake up */
	int (*best_wake_cpu) (void);
	/* Scheduler call-back without rq lock held and with irq enabled */
	void (*late_callback) (int cpu);
};
```
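To make the callback contract concrete, here is a hypothetical toy implementation of `struct power_driver` (the interface is repeated so the example compiles on its own). The OPP table, the `toy_*` names and the meaning given to `hint` (a number of OPP steps) are all assumptions. Following the comment in the interface, the scheduler-context callbacks only update bookkeeping, and the change is applied in `late_callback`, which runs without the rq lock and with interrupts enabled.

```c
#include <assert.h>

/* Interface from include/linux/sched/power.h, repeated for self-containment. */
struct power_driver {
    int (*at_max_capacity) (int cpu);
    int (*go_faster) (int cpu, int hint);
    int (*go_slower) (int cpu, int hint);
    int (*best_wake_cpu) (void);
    void (*late_callback) (int cpu);
};

#define TOY_NR_CPUS 4
static const unsigned int toy_opp_khz[] = { 200000, 500000, 800000, 1000000 };
#define TOY_NR_OPPS ((int)(sizeof(toy_opp_khz) / sizeof(toy_opp_khz[0])))

static int requested_opp[TOY_NR_CPUS]; /* written from scheduler context */
static int applied_opp[TOY_NR_CPUS];   /* what the "hardware" runs at */

static int clamp_opp(int opp)
{
    return opp < 0 ? 0 : (opp >= TOY_NR_OPPS ? TOY_NR_OPPS - 1 : opp);
}

static int toy_at_max_capacity(int cpu)
{
    return requested_opp[cpu] == TOY_NR_OPPS - 1;
}

/* Irqs off, rq lock held: only record the request, never sleep here. */
static int toy_go_faster(int cpu, int hint)
{
    requested_opp[cpu] = clamp_opp(requested_opp[cpu] + (hint > 0 ? hint : 1));
    return requested_opp[cpu];
}

static int toy_go_slower(int cpu, int hint)
{
    requested_opp[cpu] = clamp_opp(requested_opp[cpu] - (hint > 0 ? hint : 1));
    return requested_opp[cpu];
}

/* Placeholder policy: always suggest CPU 0. */
static int toy_best_wake_cpu(void)
{
    return 0;
}

/* Safe context: a real driver would reprogram clocks and regulators here. */
static void toy_late_callback(int cpu)
{
    applied_opp[cpu] = requested_opp[cpu];
}

static unsigned int toy_cur_khz(int cpu)
{
    return toy_opp_khz[applied_opp[cpu]];
}

static const struct power_driver toy_power_driver = {
    .at_max_capacity = toy_at_max_capacity,
    .go_faster       = toy_go_faster,
    .go_slower       = toy_go_slower,
    .best_wake_cpu   = toy_best_wake_cpu,
    .late_callback   = toy_late_callback,
};
```

Splitting "request" from "apply" is one way to respect the locking note; the per-CPU kthread described on the next slide is another place where the apply step could run.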
Incremental changes on top
• https://github.com/mturquette/linux/commits/sched-cpufreq
• Replaced workqueue method with per-CPU kthread
  • This allows removal of the kworker filter
  • Please commence bikeshedding over the name of this kthread
• Use SCHED_FIFO policy for the task
  • Will be run before the normal work (right?)
• These patches were just validated yesterday
  • Bugs
  • Holes in logic
  • Misunderstandings
  • Voided warranties
What’s next?
• Gather more opinions on the power driver interface
• Is go_faster/go_slower the right way?
  • Spoiler alert: Probably not.
• When else might we want to evaluate CPU frequency?
  • Idle entry/exit as mentioned by Daniel
  • Cluster-level considerations
• Sched domains
  • Not just per-core
  • Four Cortex-A9’s with single CPU clock
• Coordinate with the power model work
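For the shared-clock case (four Cortex-A9 cores behind one CPU clock), per-CPU requests must be reduced to a single frequency for the whole domain. Taking the highest request is one common choice, since it never slows down the busiest core; the sketch and the name `domain_freq_khz()` are illustrative assumptions, not part of the RFC.

```c
#include <assert.h>

/* One clock feeds every CPU in the domain, so the highest request wins. */
static unsigned int domain_freq_khz(const unsigned int *request_khz, int nr_cpus)
{
    unsigned int freq;

    assert(nr_cpus > 0);
    freq = request_khz[0];
    for (int i = 1; i < nr_cpus; i++)
        if (request_khz[i] > freq)
            freq = request_khz[i];
    return freq;
}
```

The cost of this choice is that three lightly loaded cores run at the frequency chosen for the fourth, which is why per-core reasoning alone is not enough at the sched domain level.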
Questions?
More about Linaro Connect: http://connect.linaro.org
More about Linaro: http://www.linaro.org/about/
More about Linaro engineering: http://www.linaro.org/engineering/
Linaro members: www.linaro.org/members