multicore resource management for embedded real-time...
TRANSCRIPT
Multicore Resource Management for Embedded Real-Time Systems
Heechul Yun
University of Kansas
1
High-Performance Multicores for Embedded Real-Time Systems
• Why?
– Intelligence more performance
– Space, weight, power (SWaP), cost
2
Time Predictability Challenge
3
Core1 Core2 GPU FPGA
Memory Controller (MC)
Shared Cache
• Hardware resources are shared among the cores
• Tasks can suffer significant interference delays – unpredictable, non-deterministic non-certifiable, unsafe
DRAM
Example: Real-Time Obstacle Detection and Avoidance
• Co-runners were launched on idle CPU cores • 5X slowdown in detection speed (~10fps 2fps)
– can fail to avoid obstacle • e.g., ) 10m/s aircraft (MAV) can move 1m in 100ms
4
Run-alone w/ Co-runners
Research Mission
• Our research goal is to build predictable, efficient, and safe computing infrastructure for the next generation of intelligent embedded real-time systems, a.k.a., Cyber Physical Systems (CPS).
• Approach – Develop software/hardware mechanisms
– Develop analysis framework
5
Research Results
6
Certifiable Multicore Architecture
Core1
PMC
Shared DRAM
High Performance Real-Time Memory Controller
B/W Regulator
RT CPU Application
NonRT CPU Application
RT GPU Application
B/W mgmt. API System Library
Operating System
NonRT GPU Application
Accelerators (GPU, FPGA)
Core2
PMC
Core3
PMC
Core4
PMC
B/W Regulator
B/W Regulator
B/W Regulator
B/W Regulator
QoS mgmt. API
DRAM bank-aware memory allocator
BWLOCK In submission
MemGuard TC’15, RTAS’13, ECRTS’12
PALLOC RTAS’14
Medusa CPSNA’15
Shared cache, MSHR OSPERT’15, RTAS’16
Memory delay analysis ECRTS’15, RTAS’16
UAV simplex RTCSA’16
PALLOC • DRAM bank-aware
kernel memory allocator
• Can void bank conflict
Improved Isolation
DRAM DIMM
CPC
Memory Controller (MC)
Bank 4
Bank 3
Bank 2
Bank 1
Core1 Core2 Core3 Core4
SMP OS
7
[RTAS14] Heechul Yun, Renato Mancuso, Zheng Wu, Rodolfo Pellizzoni. PALLOC: DRAM Bank-Aware Memory Allo
cator for Performance Isolation on Multicore Real-Time Systems, IEEE Intl. Conference on Real-Time and Embedd
ed Technology and Applications Symposium (RTAS). 2014, pp. 155-166.
DRAM Bank Partitioning
L3
DRAM DIMM
Memory Controller (MC)
Bank 4
Bank 3
Bank 2
Bank 1
Core1 Core2 Core3 Core4
• Private banking
– Allocate pages on certain exclusively assigned banks
Better Performance
Isolation
8
Real-Time Performance
• Setup: HRT Core0, X-server Core1 • Buddy: no bank control (use all Bank 0-15) • Diffbank: Core0 Bank0-7, Core1 Bank8-15
9
Buddy(solo) PALLOC(diffbank) Buddy
MemGuard
10
• Goal: guarantee minimum memory b/w for each core • How: b/w reservation + best effort sharing
Operating System
Core1
Core2
Core3
Core4
PMC PMC PMC PMC
DRAM DIMM
MemGuard
Multicore Processor Memory Controller
BW Regulator
BW Regulator
BW Regulator
BW Regulator
0.6GB/s 0.2GB/s 0.2GB/s 0.2GB/s
Reclaim Manager
[TC16] Heechul Yun, Gang Yao, Rodolfo Pellizzoni, Marco Caccamo, and Lui Sha. Memory Bandwidth Management for Efficient Performance Isolation in Multi-core Platforms, IEEE Transactions on Computers, Vol 65, Issue 2, 2015, pp. 562 – 576 (Editor's Pick of the year 2016)
Impact of Reservation
11
LLC
mis
ses/
ms
Time (ms) Time (ms)
W/o MemGuard MemGuard (1GB/s)
LLC
mis
ses/
ms
Evaluation Results
C0
Shared Memory
C2
Intel Core2
L2 L2
462.Libquantum (foreground)
0.0
0.2
0.4
0.6
0.8
1.0
1.2
run-alone co-run run-alone co-run run-alone co-run
w/o Memguard Memguard(reservation only)
Memguard(reclaim+share)
No
rmal
ized
IPC
Foreground (462.libquantum)
1GB/s 1GB/s
memory hogs (background)
C2
.2GB/s .2GB/s
Reclaiming and Sharing maximize performance
Guaranteed performance
OS Controlled MSHR Partitioning
13
• Cache partitioning != cache performance isolation – Due to contention in Miss Status Holding Registers (MSHRs)
• Architecture and OS collaborative solution – Small changes (add two registers) in hardware
– Small changes in OS scheduler
Block Addr. Valid Issue Target Info.
Block Addr. Valid Issue Target Info.
Block Addr. Valid Issue Target Info.
MSHR 1
MSHR 2
MSHR n
TargetCount ValidCount
Last Level Cache (LLC)
Core1
Core2
Core3
Core4
MSHRs
MSHRs MSHRs MSHRs MSHRs
[RTAS16] Prathap Kumar Valsan (*), Heechul Yun, Farzad Farshchi (*). Taming Non-blocking Caches to Improve Isolati
on in Multicore Real-Time Systems. In IEEE Intl. Conference on Real-Time and Embedded Technology and Applications
Symposium (RTAS), 2016 (Best paper award. *: KU students)
Case Study
• 4 RT tasks (EEMBC) – One RT per core – Reserve 2 MSHRs – P: 20,30,40,60ms – C: ~8ms
• 4 NRT tasks – One NRT per core – Run on slack
• Up to 20% WCET reduction – Compared to cache partition only
14
References
• [RTCSA16] Prasanth Vivekanandan, Gonzalo Garcia, Heechul Yun, Shawn Keshmiri. A Simplex Architecture for Intelligent and Safe Unmanned Aerial Vehicles. IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), IEEE, 2016 (Nominated for Best Student Paper)
• [RTAS16-1] Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi . Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems. In IEEE Intl. Conference on Real-Time and Embedded Technology and Applications Symposium (RTAS), 2016 (Best Paper Award.)
• [RTAS16-2] Rodolfo Pellizzoni, Heechul Yun. Memory Servers for Multicore Systems. IEEE Intl. Conference on Real-Time and Embedded Technology and Applications Symposium (RTAS), IEEE, 2016
• [TC16] Heechul Yun, Gang Yao, Rodolfo Pellizzoni, Marco Caccamo, and Lui Sha. Memory Bandwidth Management for Efficient Performance Isolation in Multi-core Platforms, IEEE Transactions on Computers, Vol 65, Issue 2, 2015, pp. 562 – 576 (Editor's Pick of the year 2016)
• [CPSNA15] Prathap Kumar Valsan (*), Heechul Yun. MEDUSA: A Predictable and High-Performance DRAM Controller for Multicore based Embedded Systems IEEE Intl. Conference on Cyber-Physical Systems, Networks, and Applications (CPSNA) , 2015, pp. 86 – 93
• [ECRTS15-1] Heechul Yun, Rodolfo Pellizzoni, Prathap Kumar Valsan (*). Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems. IEEE Euromicro Conference on Real-Time Systems (ECRTS), 2015.
• [ECRTS15-2] Renato Mancuso, Rodolfo Pellizzoni, Marco Caccamo, Lui Sha and Heechul Yun. WCET(m) Estimation in Multi-Core Systems using Single Core Equivalence. Euromicro Conference on Real-Time Systems (ECRTS), 2015.
• [RTAS14] Heechul Yun, Renato Mancuso, Zheng Wu, Rodolfo Pellizzoni. PALLOC: DRAM Bank-Aware Memory Allocator for Performance Isolation on Multicore Real-Time Systems, IEEE Intl. Conference on Real-Time and Embedded Technology and Applications Symposium (RTAS). 2014, pp. 155-166.
15
On-going Projects
• Multicore Resource Management
– OS, architecture research for time predictability
– Funding Agencies: NSF, ETRI
• UAV Software Platform
– ROS (Robot Operating System) based autopilot and real-time sensor (radar and vision) processing
– Funding Agencies: NASA
16
Prospective Students
• Solid background in operating systems and computer architecture
• Good system programming skills
• Interests and experiences in building Intelligent cyber-physical systems – ROS, python, Linux, OpenCV, CUDA
– PID control, real-time sensor fusion
• Send me your CV and schedule a meeting
18