Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing. Ali El-Haj-Mahmoud, Ahmed S. AL-Zawawi, Aravindh Anantaraman, and Eric Rotenberg. Center for Embedded Systems Research (CESR), Department of Electrical & Computer Eng’g
![Page 1: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/1.jpg)
NC STATE UNIVERSITY
Center for Embedded Systems Research (CESR)
Department of Electrical & Computer Eng’g
North Carolina State University
Ali El-Haj-Mahmoud, Ahmed S. AL-Zawawi, Aravindh Anantaraman, and Eric Rotenberg
Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing
![Page 2: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/2.jpg)
El-Haj-Mahmoud © 2005
CASES 2005
Embedded Processor Trends
Inheriting desktop high-performance features
Examples
• ARM11: 8-stage pipeline, caches, dynamic branch prediction
• Ubicom IP3023: 8 hardware threads
• PowerPC 750: 2-way superscalar, out-of-order (OOO) execution
![Page 3: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/3.jpg)
Real-Time Systems and Analyzability
Schedulability of task-set determined a priori
• Requires worst-case execution times (WCET) → statically analyzable microarchitecture
Dynamic microarchitecture features complicate real-time design
A trade-off between performance and analyzability
[Figure: tasks A and B]
![Page 4: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/4.jpg)
Multiple Simple Processors
+ Analyzable
+ Natural fit with real-time systems
− Rigid resource partitioning
− Higher cost/performance metric
[Figure: tasks A, B, and C assigned to proc 1 and proc 2]
![Page 5: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/5.jpg)
Simultaneous Multithreading (SMT)
+ Flexible resource sharing
+ Better cost/performance metric
− Unanalyzable
[Figure: tasks A, B, and C sharing one SMT processor]
![Page 6: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/6.jpg)
Unanalyzability of SMT
− Violates single-task WCET assumption (tasks analyzed separately)
− Arbitrary periods → arbitrary overlap of tasks
− Dynamic interference
Cannot derive WCETs → cannot perform schedulability analysis
![Page 7: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/7.jpg)
Real-Time Virtual Multiprocessor (RVMP)
Combine MP analyzability and SMT flexibility
Key idea: Interference-free multithreading
• SMT performance
• WCET of each task independent of task-set
RVMP substrate: two parts
• Highly reconfigurable multithreaded superscalar
− Space: multiple arbitrary interference-free partitions
− Time: rapidly reconfigure partitions
• Static schedule orchestrates partitioning
![Page 8: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/8.jpg)
Big Picture
Co-design the processor and real-time scheduling for analyzable high performance
![Page 9: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/9.jpg)
RVMP Architecture
Superscalar “ways” are the natural partitioning granularity
Different-sized virtual processors carved out of a single superscalar
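The carving of one superscalar into different-sized virtual processors can be illustrated by enumerating the possible way allocations. The sketch below is only an illustration, not the authors' mechanism: it assumes a 4-way machine with at most 4 virtual processors (VPs), that every way is assigned to some VP, and that only VP widths matter, not which physical ways they occupy. The name `way_partitions` is hypothetical.

```python
def way_partitions(total_ways, max_vps, largest=None):
    """Yield every split of `total_ways` into at most `max_vps` VP widths.

    Each result is a tuple of widths in non-increasing order, so (3, 1)
    and (1, 3) are treated as the same shape (an assumption of this sketch).
    """
    if largest is None:
        largest = total_ways
    if total_ways == 0:
        yield ()          # all ways assigned: one complete partition
        return
    if max_vps == 0:
        return            # ways left over but no VP slots remain
    for width in range(min(largest, total_ways), 0, -1):
        for rest in way_partitions(total_ways - width, max_vps - 1, width):
            yield (width,) + rest


if __name__ == "__main__":
    for shape in way_partitions(4, 4):
        print("-".join(str(w) for w in shape))
```

Under these assumptions the enumeration yields five shapes: (4,), (3, 1), (2, 2), (2, 1, 1), and (1, 1, 1, 1).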
![Page 10: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/10.jpg)
Processor Architecture
Starting point
• Alpha 21164: 4-way in-order superscalar
• Ubicom IP3023: 8 hardware threads (4 in RVMP → 4 VPs)
Simplifications for analyzability
• In-order issue within VPs
• Software-managed scratchpads
• Static branch prediction
Not limitations of RVMP!
![Page 11: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/11.jpg)
Processor Architecture
[Figure: processor block diagram. Labeled components: PC, Fetch Unit, Interleaved Instruction Scratchpad, Fetch Buffer, Decode, Slotter and Scoreboard (Issue Logic), Int RF, FP RF, Shadow Buffers, Data Scratchpad, HRT, and function units FU0: INT, FU1: INT/MUL/DIV, FU2: INT/AGEN, FU3: INT/AGEN, FU4: FPU]
![Page 12: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/12.jpg)
Instruction Fetch
[Figure: instruction fetch path. Labeled components: Instruction Scratchpad, Fetch Unit, Fetch Buffer, Decode, Issue, Backend, Shadow Buffers, HRT; labels FV and PV]
![Page 13: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/13.jpg)
Real-Time Scheduling
Too complicated
• Must schedule entire hyper-period
• Overwhelming # of possible space/time schedules
• High dedicated-storage cost for schedule
[Figure: tasks A, B, C, and D]
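To see why scheduling an entire hyper-period gets expensive, recall that the hyper-period of a periodic task-set is the least common multiple of the task periods. The sketch below computes it and counts how many jobs each task releases within it. The periods are made-up values for illustration and are not taken from the slides; the function names are hypothetical.

```python
import math
from functools import reduce


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // math.gcd(a, b)


def hyperperiod(periods):
    """Hyper-period of a task-set: the LCM of all task periods."""
    return reduce(lcm, periods)


def jobs_per_hyperperiod(periods):
    """Number of jobs each task releases in one hyper-period."""
    h = hyperperiod(periods)
    return [h // p for p in periods]


if __name__ == "__main__":
    periods = [10, 15, 20, 25]  # hypothetical periods for tasks A-D
    print(hyperperiod(periods))           # 300
    print(jobs_per_hyperperiod(periods))  # [30, 20, 15, 12]
```

Even with these four small periods, a full schedule would have to place 77 jobs, each with a choice of partition size and start time.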
![Page 14: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/14.jpg)
[Figure: Task A. One WCET per partition size (WCET1, WCET2, WCET3, WCET4 for 1, 2, 3, and 4 ways) shown within the task period; vertical axis: # of ways (1 to 4); label: Round]
![Page 15: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/15.jpg)
[Figure: Task B. Execution for 1, 2, 3, and 4 ways shown within the task period; vertical axis: # of ways (1 to 4)]
![Page 16: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/16.jpg)
[Figure: Task C. Execution for 1, 2, 3, and 4 ways shown within the task period; vertical axis: # of ways (1 to 4)]
![Page 17: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/17.jpg)
[Figure: Task D. Execution for 1, 2, 3, and 4 ways shown within the task period; vertical axis: # of ways (1 to 4)]
![Page 18: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/18.jpg)
Cyclic schedule
[Figure: construction of the cyclic schedule; labels: FAILED, SUCCEEDED]
![Page 19: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/19.jpg)
Interaction: Scheduling and Architecture
[Figure: schedule driving the hardware. Labels: HRT, LTC, FU0 to FU4 (shown twice), 100 cycles split into 60 and 40, FV, PV, CVs, EOT, INVALID]
![Page 20: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/20.jpg)
Experiments
Tasks from C-lab and MiBench benchmarks
100 task-sets
• 4 tasks per task-set (also 8 tasks in paper)
• Grouped according to scalar utilization (U_scalar)
Two experiments
• Worst-case schedulability analysis
• Run-time experiments
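The grouping by scalar utilization can be sketched as follows. The slides do not define U_scalar, so this sketch assumes it is the sum, over the tasks in a set, of the task's WCET on a scalar processor divided by its period. The bin edges follow the bins used in the result charts (0 < U_scalar <= 1 up to 3 < U_scalar <= 4). Task values and function names are hypothetical.

```python
import math
from fractions import Fraction


def scalar_utilization(tasks):
    """Sum of scalar WCET / period over (wcet, period) integer pairs.

    Assumed definition of U_scalar; exact arithmetic avoids rounding
    errors right at a bin edge.
    """
    return sum(Fraction(wcet, period) for wcet, period in tasks)


def utilization_bin(u):
    """Return (low, high), meaning low < u <= high, with integer edges."""
    if u <= 0:
        raise ValueError("utilization must be positive")
    high = math.ceil(u)
    return (high - 1, high)


if __name__ == "__main__":
    # Hypothetical 4-task set: (scalar WCET, period) in the same time unit.
    task_set = [(3, 10), (9, 20), (12, 25), (7, 50)]
    u = scalar_utilization(task_set)
    print(float(u), utilization_bin(u))
```

A task-set whose utilization lands exactly on an integer, such as 2, falls in the lower bin (1 < U_scalar <= 2), matching the closed upper edge in the chart labels.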
![Page 21: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/21.jpg)
Worst-Case Schedulability Tests
[Figure: stacked bar chart of Success vs. Failure. Vertical axis: Success rate (%), 0% to 100%. Bars: Scalar, RVMP, 4x1, 2x2, 1x4, repeated for each task-set bin: 0 < U_scalar <= 1, 1 < U_scalar <= 2, 2 < U_scalar <= 3, 3 < U_scalar <= 4]
![Page 22: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/22.jpg)
RVMP Configurations
[Figure: bar chart. Vertical axis: # of task-sets (0 to 7). Horizontal axis: RVMP partition configurations (e.g., 1-1-1-1, 2-2, 1-3/2-2, 4/4/4/4), repeated for each task-set bin: 1 < U_scalar <= 2, 2 < U_scalar <= 3, 3 < U_scalar <= 4]
![Page 23: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/23.jpg)
Run-Time Experiments
[Figure: stacked bar chart of Success vs. Failure. Vertical axis: Success rate (%), up to 100%. Bars: RVMP, 4x1, 2x2, 1x4, SMT-EDF, SMT-ICNT, repeated for each task-set bin: 0 < U_scalar <= 1, 1 < U_scalar <= 2, 2 < U_scalar <= 3, 3 < U_scalar <= 4]
![Page 24: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/24.jpg)
Unsafe Behavior of SMT
[Figure: pie charts for the 1 < U_scalar <= 2 bin. Success 60%, Failure 40%. Categories: Only SMT-EDF success, Only SMT-ICNT success, Both success, Both failure. Data labels: 16, 12, 6]
![Page 25: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g](https://reader036.vdocuments.mx/reader036/viewer/2022070419/56815b74550346895dc96e54/html5/thumbnails/25.jpg)
Summary
Novel contributions
• Virtualize a single processor
− Space: variable-size interference-free partitions
− Time: rapid reconfiguration
• Simple real-time scheduling approach
Analyzability of MP with flexibility of SMT
Co-design the processor and real-time scheduling for analyzable high performance