SpecTLB: A Mechanism for Speculative Address Translation
2012. 06. 13, Miseon Han
Thomas W. Barr, Alan L. Cox, Scott Rixner
Rice Computer Architecture Group, Rice University
ISCA, June 2011
Motivation
http://compiler.korea.ac.kr
Virtual Memory: Still an increasing challenge
• Virtual memory
  – Performance overhead of 5-14% for ‘typical’ applications [Bhargava08]
  – 89% under virtualization [Bhargava08]
  – Large pages are not always a good solution
Physical memory allocator – Large pages
• What page size to pick?
  – 4KB, 2MB, 1GB on x86
• Can’t always use the largest size
  – Wasted memory
  – Increased I/O traffic
• Dynamic page size selection
Ideas
• SpecTLB (Speculative TLB)
  – A hardware/software system
• Reservation-based physical memory allocator [Talluri94]
  – Allocate small pages by default to maintain fine-grained control
• Predict small page translations in hardware
  – Performance of large pages, control of small pages
Background
x86-64 page table format
• Four-level radix-tree page table
• Example: virtual address 0x5c8315cc2016
  – Bit fields [47:39] [38:30] [29:21] [20:12] [11:0] = {0b9, 00c, 0ae, 0c2, 016}
  – Physical address (frame, offset): {123, 016}
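The field split in the example can be reproduced with a few shifts and masks. The following is a minimal sketch, assuming the standard layout shown on the slide (four 9-bit indices and a 12-bit page offset); the function and constant names are illustrative and not taken from the paper.

```python
# Layout from the slide: four 9-bit table indices plus a 12-bit page offset.
INDEX_BITS = 9
OFFSET_BITS = 12
LEVELS = 4


def split_virtual_address(va: int) -> tuple:
    """Return (L4 index, L3 index, L2 index, L1 index, page offset)."""
    offset = va & ((1 << OFFSET_BITS) - 1)
    indices = []
    # Level 3 corresponds to bits [47:39], level 0 to bits [20:12].
    for level in range(LEVELS - 1, -1, -1):
        shift = OFFSET_BITS + level * INDEX_BITS
        indices.append((va >> shift) & ((1 << INDEX_BITS) - 1))
    return (*indices, offset)


fields = split_virtual_address(0x5C8315CC2016)
print(", ".join(f"{f:03x}" for f in fields))  # 0b9, 00c, 0ae, 0c2, 016
```

Each index selects one entry in the page-table node at its level, so a full walk for a 4KB page touches four nodes before the offset is appended to the physical frame.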
Large pages
• Page table levels describe physical address space at different granularity
  – One entry at each level, from root to leaf, covers 512GB, 1GB, 2MB, 4KB
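The four sizes follow directly from the index widths: every level up the tree multiplies the covered range by 512. A small sketch of that arithmetic, assuming 4KB base pages and 9 index bits per level (helper names are illustrative):

```python
OFFSET_BITS = 12  # 4KB base page
INDEX_BITS = 9    # 512 entries per page-table node


def entry_coverage_bytes(level: int) -> int:
    """Bytes of address space described by one entry at `level` (1 = leaf)."""
    return 1 << (OFFSET_BITS + (level - 1) * INDEX_BITS)


def human(n: int) -> str:
    """Format a power-of-two byte count with the largest fitting unit."""
    for unit, size in (("GB", 1 << 30), ("MB", 1 << 20), ("KB", 1 << 10)):
        if n >= size:
            return f"{n // size}{unit}"
    return f"{n}B"


for level in (4, 3, 2, 1):
    print(f"level {level}: {human(entry_coverage_bytes(level))}")
```

A 2MB large page is therefore a mapping that terminates one level above the leaf, and a 1GB page terminates two levels above it.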
Reservation-based memory allocation
• Reservation-based memory allocation [Talluri94]
  – Always allocate small pages at first, tracked in a book-keeping entry
  – Place these small pages in a large page ‘reservation’
    • If the handler decides that a reservation is needed
  – Promote the reservation to a large page
    • When all small pages in the reservation are allocated
  – Extended and implemented in FreeBSD [Navarro02]
    • Default memory allocator
Reservation-based memory allocation
Handler reserves a 2MB region of physical space.
Reservation-based memory allocation
Reservation is ‘promoted’ into a large page.
Reservation-based memory allocation
Reservations may not be filled.
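The reserve, fill, and promote steps on the preceding slides can be sketched as a toy allocator. This is an illustration only, not the FreeBSD implementation: the class and field names are hypothetical, the starting physical base is arbitrary, and the sketch creates a reservation on every first touch, whereas the real handler decides whether one is warranted.

```python
from dataclasses import dataclass, field

SMALL_PAGE = 4 * 1024
LARGE_PAGE = 2 * 1024 * 1024
PAGES_PER_RESERVATION = LARGE_PAGE // SMALL_PAGE  # 512


@dataclass
class Reservation:
    phys_base: int                               # 2MB-aligned physical base
    allocated: set = field(default_factory=set)  # small-page slots handed out
    promoted: bool = False


class ReservationAllocator:
    """Toy model: one reservation per 2MB-aligned virtual region."""

    def __init__(self, first_phys_base: int = 0x8000000):
        self.next_phys_base = first_phys_base    # arbitrary, for illustration
        self.reservations = {}                   # virtual region base -> Reservation

    def handle_fault(self, va: int) -> int:
        """Back the small page containing `va`; return its physical page address."""
        region = va & ~(LARGE_PAGE - 1)
        res = self.reservations.get(region)
        if res is None:
            # Handler reserves a 2MB region of physical space.
            res = Reservation(self.next_phys_base)
            self.next_phys_base += LARGE_PAGE
            self.reservations[region] = res
        # The small page sits at the same offset inside the reservation.
        slot = (va - region) // SMALL_PAGE
        res.allocated.add(slot)
        if len(res.allocated) == PAGES_PER_RESERVATION:
            res.promoted = True                  # fully populated: promote
        return res.phys_base + slot * SMALL_PAGE


alloc = ReservationAllocator()
print(hex(alloc.handle_fault(0x40002313)))  # 0x8002000
```

A reservation that never receives all 512 small pages simply stays unpromoted, which is the case where the hardware sees many small-page mappings that are nevertheless laid out like a large page.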
Reservation-based memory allocation
SpecTLB
SpecTLB
• TLB-like structure
  – Tracks reservations, not actual mappings
  – Detects reservations
  – Predicts translations
  – Verifies predictions
Detecting reservations
• Translation: virtual address {0b9, 00c, 0ae, 002, 313} maps to physical address {8002, 313}
• Current reservations: virtual {0b9, 00c, 0ae, 000, 000} maps to physical {8000, 000}
Predicting translations
• Lookup: virtual address {0b9, 00c, 0ae, 005, 313} maps to physical address {8005, 313}?
• Current reservations: virtual {0b9, 00c, 0ae, 000, 000} maps to physical {8000, 000}
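The two examples can be traced with a small software model. This is a sketch under stated assumptions: the slides show only the example addresses, so the detection rule used here (a completed translation whose virtual and physical addresses share the same offset within a 2MB region) is an assumption for illustration, and the class and method names are hypothetical.

```python
LARGE_PAGE = 2 * 1024 * 1024


def make_va(l4, l3, l2, l1, offset):
    """Compose a 48-bit virtual address from {L4, L3, L2, L1, offset} fields."""
    return (l4 << 39) | (l3 << 30) | (l2 << 21) | (l1 << 12) | offset


class SpecTLB:
    def __init__(self):
        # Virtual 2MB region base -> physical 2MB region base.
        self.entries = {}

    def observe(self, va, pa):
        """Detect: record a candidate reservation from a completed translation.

        Assumed heuristic: the page could belong to a reservation when the
        virtual and physical addresses share their offset within 2MB.
        """
        if va % LARGE_PAGE == pa % LARGE_PAGE:
            self.entries[va - va % LARGE_PAGE] = pa - pa % LARGE_PAGE

    def predict(self, va):
        """Predict: return a guessed physical address, or None if untracked."""
        phys_base = self.entries.get(va - va % LARGE_PAGE)
        return None if phys_base is None else phys_base + va % LARGE_PAGE


tlb = SpecTLB()
# Slide 16: {0b9, 00c, 0ae, 002, 313} was translated to {8002, 313}.
tlb.observe(make_va(0x0B9, 0x00C, 0x0AE, 0x002, 0x313), 0x8002313)
# Slide 17: a different small page in the same reservation misses the TLB.
guess = tlb.predict(make_va(0x0B9, 0x00C, 0x0AE, 0x005, 0x313))
print(hex(guess))  # 0x8005313
```

The guess is only a prediction: the page at index 005 may not have been allocated yet, or may be mapped elsewhere, which is why the question mark stays on the slide.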
SpecTLB
• Provides predicted translations for pages within tracked reservations
• Predictions may be incorrect
  – Page table must still be walked
    • Page walk can occur in parallel
    • Latency hidden
  – Speculative translation can be used concurrently
    • Microarchitecture cancels speculative work on a misprediction
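The verify step can be summarized as a three-way outcome per TLB miss. The sketch below is a sequential software model of what the hardware does concurrently; `predict` and `walk` are hypothetical callables standing in for the SpecTLB lookup and the authoritative page-table walk.

```python
def translate(va, predict, walk):
    """Return (physical address, outcome) for a TLB miss on `va`."""
    guess = predict(va)  # available immediately; dependent work may start
    actual = walk(va)    # in hardware, the walk overlaps the speculation
    if guess is None:
        return actual, "no prediction"  # ordinary miss, full walk latency
    if guess == actual:
        return actual, "confirmed"      # walk latency overlapped with work
    return actual, "squashed"           # speculative work is cancelled


# Toy page table and predictor, for illustration only.
page_table = {0x1000: 0x8001000, 0x2000: 0x9000000, 0x3000: 0x5000}
predict = lambda va: 0x8000000 + va if va in (0x1000, 0x2000) else None

for va in (0x1000, 0x2000, 0x3000):
    pa, outcome = translate(va, predict, page_table.__getitem__)
    print(hex(va), hex(pa), outcome)
```

The page table remains the only source of truth, so a wrong guess costs cancelled work but never produces a wrong translation.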
Simulation & Result
SpecTLB Results
Full system simulator, unmodified FreeBSD kernel

| Benchmark | TLB miss rate (per 1k DRAM accesses) | Speculative prediction frequency | Prediction accuracy | DRAM accesses overlapped |
|---|---|---|---|---|
| PostgreSQL | 74.43 | 0.762 | 0.989 | 0.448 |
| python | 15.36 | 0.760 | 0.998 | 0.419 |
| SPECjbb | 20.04 | 0.418 | 0.971 | 0.310 |
| bzip2 | 4.00 | 0.293 | 0.998 | 0.235 |
| gcc | 4.25 | 0.852 | 0.988 | 0.664 |
| mcf | 79.43 | 0.992 | 1.000 | 0.956 |
| dc.B | 42.29 | 0.083 | 0.353 | 0.073 |
| ep.C | 12.94 | 0.014 | 0.962 | 0.023 |
TLB Prefetcher Comparison
• SpecTLB and TLB prefetching both hide the latency of TLB misses
  – SpecTLB: exploits large-page reservations; targets the current TLB miss
  – TLB prefetcher: exploits access patterns; targets future TLB misses
• Speculative work
  – SpecTLB: instructions execute in parallel with confirmation of the translation
  – TLB prefetcher: prefetches page table entries
TLB Prefetcher Comparison
• The TLB prefetcher generally hides fewer walks than SpecTLB
  – The prefetcher does well with high access regularity

| Benchmark | TLB miss rate | SpecTLB | TLB Prefetcher |
|---|---|---|---|
| PostgreSQL | 74.43 | 0.989 | 0.106 |
| python | 15.36 | 0.998 | 0.633 |
| SPECjbb | 20.04 | 0.971 | 0.151 |
| bzip2 | 4.00 | 0.998 | 0.978 |
| gcc | 4.25 | 0.988 | 0.330 |
| mcf | 79.43 | 1.000 | 0.051 |
| dc.B | 42.29 | 0.353 | 0.190 |
| ep.C | 12.94 | 0.962 | 0.897 |
Conclusions
• SpecTLB hides the latency of TLB misses
  – Predictions allow the page walk to occur in parallel with speculative work
  – >62% of TLB miss latencies hidden for a majority of benchmarks