Non-Uniform Cache Architecture
Prof. Hsien-Hsin S. Lee
School of Electrical and Computer Engineering
Georgia Tech
Guest lecture for ECE4100/6100 for Prof. Yalamanchili
Non-Uniform Cache Architecture
• Proposed by UT-Austin at ASPLOS 2002
• Facts
– Large shared on-die L2
– Wire delay dominates on-die cache access
• L2 access latency by technology generation
– 1MB @ 180nm (1999): 3 cycles
– 4MB @ 90nm (2004): 11 cycles
– 16MB @ 50nm (2010): 24 cycles
Multi-banked L2 cache
• 2MB @ 130nm, bank = 128KB
• Total access latency = 11 cycles
– Bank access time = 3 cycles
– Interconnect delay = 8 cycles
Multi-banked L2 cache
• 16MB @ 50nm, bank = 64KB
• Total access latency = 47 cycles
– Bank access time = 3 cycles
– Interconnect delay = 44 cycles
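Both multi-banked configurations follow the same arithmetic: the total latency is the bank access time plus the interconnect delay. A minimal Python sketch of that sum, using the cycle counts from the two slides (the function and dictionary names are illustrative, not from the lecture):

```python
def l2_access_latency(bank_access_cycles: int, interconnect_cycles: int) -> int:
    """Total access latency of a uniform multi-banked cache, in cycles."""
    return bank_access_cycles + interconnect_cycles


# (bank access cycles, interconnect cycles) as quoted on the slides
configs = {
    "2MB @ 130nm": (3, 8),
    "16MB @ 50nm": (3, 44),
}

for name, (bank, wire) in configs.items():
    total = l2_access_latency(bank, wire)
    # The wire share shows how strongly the interconnect dominates
    print(f"{name}: {total} cycles, {wire / total:.0%} spent on wires")
```

The bank itself stays at 3 cycles in both cases, so nearly all of the growth from 11 to 47 cycles comes from the wires.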
Static NUCA-1
• Uses a private channel per bank
• Each bank has its own distinct access latency
• Data location is decided statically from the address
• Average access latency = 34.2 cycles
• Wire overhead = 20.9% (an issue)
[Figure: bank organization, with labels Tag Array, Data Bus, Address Bus, Bank, Sub-bank, Predecoder, Sense amplifier, Wordline driver and decoder]
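The static mapping on this slide can be sketched in a few lines of Python. Everything numeric here is an assumption for illustration: the 64-byte line size, the 32-bank count, and the per-bank latency table are hypothetical and are not meant to reproduce the 34.2-cycle figure.

```python
LINE_BYTES = 64   # assumed cache-line size (not given on the slide)
NUM_BANKS = 32    # assumed bank count (not given on the slide)

# Hypothetical latency table: 3-cycle bank access plus a wire delay
# that grows by one cycle per bank of distance from the controller.
BANK_LATENCY = [3 + b for b in range(NUM_BANKS)]


def bank_for_address(addr: int) -> int:
    """Static mapping: low-order bits of the line address pick the bank."""
    return (addr // LINE_BYTES) % NUM_BANKS


def access_latency(addr: int) -> int:
    """Latency is fixed by the bank the address statically maps to."""
    return BANK_LATENCY[bank_for_address(addr)]


def average_latency(addresses) -> float:
    """Mean latency over a sequence of accessed addresses."""
    addresses = list(addresses)
    return sum(access_latency(a) for a in addresses) / len(addresses)


# One access to every bank: the average sits midway between near and far banks
uniform = [b * LINE_BYTES for b in range(NUM_BANKS)]
print(average_latency(uniform))
```

Because the mapping never changes, a frequently used line that happens to map to a distant bank pays the long latency on every access.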
Static NUCA-2
• Uses a 2D switched network to alleviate wire area overhead
• Average access latency = 24.2 cycles
• Wire overhead = 5.9%
[Figure: bank organization, with labels Bank, Data bus, Switch, Tag Array, Wordline driver and decoder, Predecoder]
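With a switched network, a bank's latency can be modeled as its access time plus a per-hop cost along the path from the cache controller. The sketch below is a rough model under stated assumptions: a rectangular grid of banks, a controller attached at row 0 and column `entry_col`, Manhattan routing, and hypothetical costs of 3 cycles per bank access and 1 cycle per hop. None of these parameters come from the slide.

```python
def snuca2_latency(row: int, col: int, entry_col: int = 0,
                   bank_cycles: int = 3, hop_cycles: int = 1) -> int:
    """Latency to bank (row, col): bank access plus switch hops (assumed model)."""
    hops = row + abs(col - entry_col)  # Manhattan distance from the entry switch
    return bank_cycles + hop_cycles * hops


def average_latency(rows: int, cols: int, **kwargs) -> float:
    """Mean latency over every bank in a rows x cols grid."""
    total = sum(snuca2_latency(r, c, **kwargs)
                for r in range(rows) for c in range(cols))
    return total / (rows * cols)


# Attaching the controller near the middle column shortens the average path
print(average_latency(4, 8, entry_col=0))
print(average_latency(4, 8, entry_col=4))
```

In this model the shared links replace the private per-bank channels, which is where the wire area saving comes from; the latency of each bank is still fixed by its position.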
Dynamic NUCA
• Data can migrate dynamically
• Frequently used cache lines move closer to the CPU
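One way to picture migration is a policy that moves a line one bank closer each time it hits. The Python sketch below assumes exactly that, plus two simplifications that are not stated on the slide: each bank holds a single line of the set, and a miss fills the farthest bank and evicts whatever was there. The `BankSet` class is hypothetical.

```python
class BankSet:
    """One bank set; index 0 is the bank closest to the CPU (assumed layout)."""

    def __init__(self, num_banks: int = 4):
        self.banks = [None] * num_banks  # one cached tag per bank, or None

    def access(self, tag):
        """Return the bank index that served a hit, or None on a miss."""
        if tag in self.banks:
            i = self.banks.index(tag)
            if i > 0:
                # Hit: swap with the next-closer bank so hot lines drift inward
                self.banks[i - 1], self.banks[i] = self.banks[i], self.banks[i - 1]
            return i
        # Miss: place the new line in the farthest bank, evicting its occupant
        self.banks[-1] = tag
        return None


s = BankSet()
s.access("A")                            # miss, A enters the farthest bank
hits = [s.access("A") for _ in range(4)]
print(hits)                              # bank index serving each successive hit
print(s.banks)
```

Each repeated hit is served from a closer bank than the previous one, so lines that are reused often end up with the shortest latency.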
Dynamic NUCA
• Simple Mapping
• All 4 ways of each bank set need to be searched
• Farther bank sets have longer access latency
[Figure: 8 bank sets, ways 0 to 3, with labels "one set" and "bank"]
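The simple mapping can be sketched as follows. The 8 bank sets and 4 ways come from the slide; the rest is assumed for illustration: a 64-byte line, one way per bank, the address's low-order line bits selecting the bank set, and a hypothetical latency model of 3 cycles per bank plus 1 cycle per hop from a controller attached next to bank set `entry_set`.

```python
LINE_BYTES = 64     # assumed cache-line size
NUM_BANK_SETS = 8   # from the slide
WAYS = 4            # from the slide


def bank_set_for(addr: int) -> int:
    """Pick the bank set from the low-order bits of the line address."""
    return (addr // LINE_BYTES) % NUM_BANK_SETS


def banks_to_search(addr: int):
    """All (way, bank_set) pairs that may hold the line, closest way first."""
    bank_set = bank_set_for(addr)
    return [(way, bank_set) for way in range(WAYS)]


def bank_latency(way: int, bank_set: int, entry_set: int = 0,
                 bank_cycles: int = 3, hop_cycles: int = 1) -> int:
    """Hypothetical latency: bank access plus hops from the controller."""
    return bank_cycles + hop_cycles * (way + abs(bank_set - entry_set))


def mean_set_latency(bank_set: int, **kwargs) -> float:
    """Average latency over the 4 ways of one bank set."""
    return sum(bank_latency(w, bank_set, **kwargs) for w in range(WAYS)) / WAYS


print(banks_to_search(64 * 9))
print([mean_set_latency(s) for s in range(NUM_BANK_SETS)])
```

Under this model the per-set averages rise steadily with distance from the controller, which is the imbalance the slide points out for farther bank sets.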
Dynamic NUCA
• Fair Mapping
• Average access time is equal across all bank sets
[Figure: 8 bank sets, ways 0 to 3, with labels "one set" and "bank"]
Dynamic NUCA
• Shared Mapping
• The closest banks are shared among the farther bank sets
[Figure: 8 bank sets, ways 0 to 3, with label "bank"]