![Page 1: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/1.jpg)
Concurrent Autonomous Self-Test for Uncore Components in SoCs
Yanjing Li, Stanford University
Onur Mutlu, Carnegie Mellon University
Donald S. Gardner, Intel Corporation
Subhasish Mitra, Stanford University
![Page 2: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/2.jpg)
Overcoming CMOS Reliability Challenges
[Figure: failure rate vs. time over the product lifetime]
Early-life failures: burn-in difficult
Circuit aging: guardbands expensive
On-line self-test and diagnostics
Soft errors: Built-In Soft Error Resilience (BISER)
![Page 3: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/3.jpg)
Uncore Components Significant in SoCs
[Figures: chip photos with uncore components highlighted]
Cisco Network Processing Engine
NVIDIA Tegra
IBM Power 7
Image credits: © techvishal.wordpress.com, © news.cnet.com, © ciscosistemas.org
Uncore examples
Controllers for cache & DRAM
Crossbar
I/O interfaces
![Page 4: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/4.jpg)
Robust Uncore Essential
OpenSPARC T2 SoC: 8 cores, 64 threads (© opensparc.net)
Memories 76%: ECC, Memory BIST & repair for memories
Processor cores 12%: CASP for processor cores [Li DATE 08, ICCAD 09]
Uncore 12%: New on-line self-test for uncore
![Page 5: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/5.jpg)
Challenge 1: High Test Coverage
| | CASP | Logic BIST | Roving Emulation |
|---|---|---|---|
| Coverage | High | ? | Depends |
| Cost | Low | High | High |
| Design effort | Moderate | High | High |

CASP: Concurrent, Autonomous, Stored Patterns
High-coverage patterns off-chip FLASH
System-level on-line test access
FLASH cheap, test compression pervasive
![Page 6: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/6.jpg)
Challenge 2: Power, Performance, Area Costs
Stall-and-test inadequate
4-core Intel® Core™ i7 system results (© intel.com)
[Figure: four cores send requests through caches and interconnects to a DRAM controller under on-line self-test]
Requests from multiple cores
Multiple cores stall
Unresponsiveness or system hang
![Page 7: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/7.jpg)
Naïve Approaches Inadequate for Uncore
Stall-and-test
Unresponsiveness or complete hang
Spare unit for each uncore type
12% area overhead*
Uncore CASP: new techniques required
Small area cost
Small performance impact
* OpenSPARC T2 design
![Page 8: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/8.jpg)
New Uncore On-line Self-Test Principles
I. Resource reallocation and sharing (RRS)
II. No-performance-impact testing
III. Smart backup
< 1% area impact, < 3% performance impact
©opensparc.net
OpenSPARC T2 SoC
![Page 9: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/9.jpg)
I. Resource Reallocation and Sharing (RRS)
Components with “similar” functionality in SoCs
Temporary reallocation and sharing
Small performance hit without replication
[Figure: OpenSPARC T2 (© opensparc.net): two groups of 4 cores, crossbar blocks, L2 banks, and the CASP controller; one component under on-line self-test]
1. Stall and drain requests
2. Transfer dirty lines
3. Invalidate
4. Reroute
![Page 10: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/10.jpg)
II. No-Performance-Impact Testing
Implication-relations among SoC components
Component(s) tested when idle
During test of another component
[Figure: OpenSPARC T2 (© opensparc.net): two groups of 4 cores, crossbar blocks, L2 banks, and the CASP controller; labels: On-line self-test, RRS, IDLE]
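The idea of testing a component while it is idle because another component is under test can be sketched as a small planning routine. This is only an illustration under stated assumptions: the table `IMPLIED_IDLE`, the component names, and the function `plan_test_sessions` are hypothetical and are not taken from the OpenSPARC T2 design.

```python
# Hypothetical implication table: while the key is under test (and its
# traffic has been reallocated), the listed components receive no traffic.
IMPLIED_IDLE = {
    "l2_bank0": ["ccx_block0"],
    "l2_bank1": ["ccx_block1"],
}


def plan_test_sessions(components, implied_idle):
    """Group components into test sessions.

    A component that is implied idle by another one joins that
    component's session, so it needs no separate session of its own.
    Components are visited in list order.
    """
    sessions = []
    covered = set()
    for comp in components:
        if comp in covered:
            continue
        group = [comp]
        for idle in implied_idle.get(comp, []):
            if idle in components and idle not in covered:
                group.append(idle)
        covered.update(group)
        sessions.append(group)
    return sessions


sessions = plan_test_sessions(
    ["l2_bank0", "l2_bank1", "ccx_block0", "ccx_block1", "dram_ctrl0"],
    IMPLIED_IDLE,
)
for group in sessions:
    print(" + ".join(group))
```

In this toy plan the two crossbar blocks never get a session of their own; each one is tested in the shadow of the L2 bank that makes it idle.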
![Page 11: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/11.jpg)
III. Smart Backup
Operations with different requirements
Backup unit for performance-critical operations
Absolute minimal additional hardware
[Figure: OpenSPARC T2 I/O interface]
I/O interface operations: DMA for network, DMA for disks, Programmed I/O
Support in smart backup
Stall or handle slowly via Programmed I/O
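A minimal sketch of the smart-backup policy, assuming the designer classifies each operation into one of three groups while the original unit is under test: served by the small backup unit, handled on a slower fallback path, or stalled. The operation names, the policy sets, and the `dispatch` function are hypothetical; which operations belong in which set is a design decision not settled by this sketch.

```python
from enum import Enum


class Action(Enum):
    ORIGINAL = "serve in the original unit"
    BACKUP = "serve in the backup unit"
    SLOW_PATH = "handle slowly via a fallback path"
    STALL = "stall until the test completes"


def dispatch(op, under_test, backup_ops, slow_path_ops):
    """Decide how one operation is handled.

    backup_ops and slow_path_ops are hypothetical policy sets chosen by
    the designer; anything not listed is stalled during the test.
    """
    if not under_test:
        return Action.ORIGINAL
    if op in backup_ops:
        return Action.BACKUP
    if op in slow_path_ops:
        return Action.SLOW_PATH
    return Action.STALL


# Hypothetical classification, for illustration only.
BACKUP_OPS = {"latency_critical_request"}
SLOW_PATH_OPS = {"bulk_transfer"}

for op in ("latency_critical_request", "bulk_transfer", "config_access"):
    print(op, "->", dispatch(op, True, BACKUP_OPS, SLOW_PATH_OPS).value)
```

Keeping `backup_ops` small is what keeps the backup unit small: only the operations in that set need duplicated hardware.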
![Page 12: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/12.jpg)
Application Performance Impact
Memory-centric
PARSEC benchmarks (execution time impact)
1.5% performance impact
I/O-centric on 4-core Intel system
Uncore CASP emulated on 4-core Intel® Core™ i7 (© intel.com)
Disk access: 3% impact
No visible unresponsiveness
![Page 13: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/13.jpg)
Area and Power Impact
CASP controller (< 0.01% area)
Off-chip FLASH (200 MB)
On-chip buffer (8 KB)
Uncore on-line self-test principles applied (© opensparc.net)
Minimal area impact: < 1%
Minimal power impact: < 1%
![Page 14: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/14.jpg)
Test Results for Uncore Components
200 MB off-chip FLASH
10X test compression
7 ms – 300 ms test time per component
| | Total pattern count | Test coverage |
|---|---|---|
| Stuck-at | 5,577 | 99.2% - 99.9% |
| Transition | 11,049 | 92.8% - 97.8% |

Inexpensive FLASH
Thorough on-line self-test
![Page 15: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/15.jpg)
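Uncore CASP vs. Existing Techniques
Techniques compared: Logic BIST; Concurrent BIST [Saluja IEEE TCAD 88]; Uncore CASP [This work]
Coverage: Logic BIST high with high costs; Concurrent BIST depends; Uncore CASP high
Area cost: Logic BIST high; Concurrent BIST high costs possible; Uncore CASP low
Design complexity: Moderate
Performance impact: Logic BIST low; Concurrent BIST low; Uncore CASP low with our uncore principles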
![Page 16: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/16.jpg)
CASP Applicable for Other SoCs
Cisco Network Processing Engine
NVIDIA Tegra
IBM Power 7
I. RRS
II. No-performance-impact testing
III. Smart backup
IV. Core CASP
Image credits: © techvishal.wordpress.com, © news.cnet.com, © ciscosistemas.org
![Page 17: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/17.jpg)
Conclusions
CASP: adaptive on-line self-test & diagnostics
3 new principles for uncore CASP
I. Resource reallocation and sharing (RRS)
II. No-performance-impact testing
III. Smart backup
Effective and practical
High test coverage
< 1% power, < 3% performance, < 1% area impact
![Page 18: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/18.jpg)
Backup Slides
![Page 19: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/19.jpg)
CASP on Actual Intel® Core™ i7 System
Intel Research collaboration
Quad-core Intel® Core™ i7 (3.2 GHz)
Thermoelectric temperature controller
Debug tool
Unique real-life experiment
Development of adaptive self-diagnostics
[Figure labels: Debug Tool Adapter, Temperature Controller]
![Page 20: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/20.jpg)
CASP Flow
SoC with CASP controller (multi-core SoC proliferation)
Inexpensive off-chip FLASH (non-volatile storage technology)
1. Select uncore or core component
2. Isolate
3. Apply / analyze high-quality test patterns through the scan chain (test compression, at-speed test…)
4. Resume operation
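The four-step flow can be sketched as a small control loop. This is a toy model under stated assumptions, not the CASP controller itself: the component is a plain dictionary, `patterns` stands in for stored patterns fetched from off-chip FLASH, `apply_pattern` stands in for scan-based pattern application, and round-robin selection is a hypothetical policy.

```python
def select_component(components, session_index):
    """Step 1: pick the next component (hypothetical round-robin policy)."""
    return components[session_index % len(components)]


def run_casp_session(component, patterns, apply_pattern):
    """Steps 2 to 4 for one component.

    patterns: list of (stimulus, expected_response) pairs.
    apply_pattern: callable(component, stimulus) -> observed response.
    Returns the indices of the failing patterns.
    """
    component["isolated"] = True                      # 2. Isolate
    failures = []
    try:
        for i, (stimulus, expected) in enumerate(patterns):
            # 3. Apply the pattern and analyze the response
            if apply_pattern(component, stimulus) != expected:
                failures.append(i)
    finally:
        component["isolated"] = False                 # 4. Resume operation
    return failures


# Toy 8-bit "inverter" component; the faulty one has output bit 0 stuck at 0.
def healthy(component, stimulus):
    return stimulus ^ 0xFF


def stuck_at_0(component, stimulus):
    return (stimulus ^ 0xFF) & 0xFE


PATTERNS = [(0x00, 0xFF), (0xFF, 0x00), (0x0F, 0xF0)]
components = [{"name": "unit_a", "isolated": False},
              {"name": "unit_b", "isolated": False}]

target = select_component(components, session_index=0)
print(target["name"], run_casp_session(target, PATTERNS, healthy))
print(target["name"], run_casp_session(target, PATTERNS, stuck_at_0))
```

The `finally` clause mirrors the requirement that the component always returns to normal operation after a session, whether or not a pattern fails.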
![Page 21: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/21.jpg)
RRS Example: L2 Cache Banks
[Figure: crossbar connected to L2 Bank 0 (under test) and Bank 1 (helper), each with controller, data, tag, etc.; DRAM Controller 0]
1. Stall cache controller
2. Drain outstanding requests
3a. Invalidate clean blocks; invalidate directory; invalidate L1
3b. Transfer necessary states (dirty blocks); write back to main memory if necessary
4. Route packets with destination {bank 0, bank 1} to bank 1
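The RRS steps for an L2 bank can be sketched in software. This is a simplified model with several assumptions: the `Bank` class, the `reallocate` function, and the `helper_capacity` limit are hypothetical; "write back if necessary" is interpreted here as "write back when the helper has no room"; directory and L1 invalidation are omitted.

```python
class Bank:
    """Toy L2 bank: lines maps address -> (data, dirty flag)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}
        self.pending = []      # outstanding (address, data) writes
        self.stalled = False


def reallocate(bank, helper, memory, route, helper_capacity):
    """Hand the traffic of `bank` to `helper` before `bank` is tested."""
    bank.stalled = True                               # 1. Stall controller
    while bank.pending:                               # 2. Drain requests
        addr, data = bank.pending.pop(0)
        bank.lines[addr] = (data, True)
    for addr, (data, dirty) in list(bank.lines.items()):
        if dirty:                                     # 3b. Transfer dirty blocks
            if len(helper.lines) < helper_capacity:
                helper.lines[addr] = (data, True)
            else:
                memory[addr] = data                   # write back (assumed rule)
        del bank.lines[addr]                          # 3a. Invalidate
    route[bank.name] = helper.name                    # 4. Reroute packets


bank0, bank1 = Bank("bank0"), Bank("bank1")
bank0.lines = {0x10: ("a", False), 0x20: ("b", True), 0x30: ("c", True)}
bank0.pending = [(0x40, "d")]
bank1.lines = {0x11: ("x", False)}
memory = {}
route = {"bank0": "bank0", "bank1": "bank1"}

reallocate(bank0, bank1, memory, route, helper_capacity=3)
print(route, sorted(bank1.lines), memory)
```

After the call, every packet addressed to either bank resolves to the helper through `route`, and the bank under test holds no state that the system still needs.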
![Page 22: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/22.jpg)
No-Performance-Impact Testing Example: CCX (Crossbar)
8 cores, 64 threads
L2 Bank 0 … L2 Bank 7
CCX: multiplexers and arbitration logic 0 … 7, each with separate scan chains
Packets reallocated to helper
Test at the same time
![Page 23: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/23.jpg)
Smart Backup Example: Non-Cacheable Unit
Original (under test): PIO, Boot ROM interface, interrupt status table, interrupt processing, config. status register interface
Backup: PIO, interrupt processing
1. Stall
2. Drain outstanding requests
3. Turn on / reset
4. Transfer states
5. Select outputs from backup (MUX)
Minimize area costs at acceptable performance impact
![Page 24: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/24.jpg)
Naïve Approaches Inadequate for Uncore
Simple stall-and-test technique
[Figure: OS timer interrupt handlers on core 1 … core i issue requests to DRAM; the DRAM controller is under test, so the requests stall]
Demonstration on actual 4-core Intel® Core™ i7 system
Infrequent test: noticeable unresponsiveness
Frequent test: system hang
Identical backup units: 12% area overhead
![Page 25: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/25.jpg)
Performance Impact
Simulated Latency Overhead (PARSEC Benchmark Suite)
Tool: GEMS simulator (modified for RRS)
Workload: PARSEC benchmark suite
4 threads on 4 cores, CASP runs 1 sec. every 10 sec.
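The test schedule above implies a 10% test duty cycle (1 second out of every 10). A back-of-the-envelope sketch of how a schedule translates into average overhead, assuming a simple model in which the system slows down only while a test is running; the function `average_overhead` and the 20% slowdown figure are hypothetical and are not results from the simulation.

```python
def average_overhead(test_seconds, period_seconds, slowdown_during_test):
    """Average slowdown = fraction of time under test x slowdown while testing.

    Simplified model: no overhead outside test windows and no cost for
    entering or leaving a test session.
    """
    duty_cycle = test_seconds / period_seconds
    return duty_cycle * slowdown_during_test


duty = 1 / 10                                  # CASP runs 1 s every 10 s
overhead = average_overhead(1, 10, 0.20)       # 0.20 is a made-up slowdown
print(f"duty cycle {duty:.0%}, average overhead {overhead:.1%}")
```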
![Page 26: Concurrent Autonomous Self-Test for Uncore Components in SoCs Yanjing Li, Stanford University Onur Mutlu, Carnegie Mellon University Donald S. Gardner,](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649ea05503460f94ba27f7/html5/thumbnails/26.jpg)
III. Smart Backup
OpenSPARC T2 Ethernet port interface
Network interface: Layers 3 and 4 acceleration, Layer 2 packet process
Support in smart backup
OS orchestration