OpenSPARC – An Open Platform for Hardware Reliability Experimentation
Ishwar Parulkar and Alan Wood
Sun Microsystems, Inc.
James C. Hoe and Babak Falsafi
Carnegie Mellon University
Sarita V. Adve and Josep Torrellas
University of Illinois at Urbana-Champaign
Subhasish Mitra
Stanford University
IEEE SELSE 4 - March 26, 2008
www.OpenSPARC.net
Outline
1. Chip Multi-threading (CMT)
2. OpenSPARC T2 and T1 processors
3. Reliability in OpenSPARC processors
4. What is available in OpenSPARC
5. Current university research using OpenSPARC
6. Future research directions
World's First 64-bit Open Source Microprocessor
OpenSPARC.net Governed by GPLv2
Complete processor architecture & implementation
Register Transfer Level (RTL)
Hypervisor API
Verification suite and architectural models
Simulation model for operating system bringup on s/w
Chip Multithreading (CMT)
[Background table; column headers not recoverable]
Instruction-level Parallelism: Low Low Low Low Medium High
Thread-level Parallelism: High High High High High
Instruction/Data Working Set: Large Large Medium Large Large
Data Sharing: Low Medium High Medium High Medium
Memory Bottleneck
[Figure: relative performance (log scale, 1 to 10000) versus year (1980 to 2005) for CPU frequency and DRAM speeds, with the gap between the two curves marked]
CPU -- 2x Every 2 Years
DRAM -- 2x Every 6 Years
Source: Sun World Wide Analyst Conference Feb. 25, 2003
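The two doubling rates above imply an exponentially widening gap. The following is a minimal sketch of that arithmetic, assuming both curves start from the same relative performance and follow the stated doubling periods exactly; the function names and the sample years are illustrative and not taken from the slide.

```python
def relative_performance(years_elapsed, doubling_period_years):
    """Relative performance after growing 2x every doubling_period_years."""
    return 2.0 ** (years_elapsed / doubling_period_years)


def cpu_dram_gap(years_elapsed, cpu_doubling=2.0, dram_doubling=6.0):
    """Ratio of CPU to DRAM relative performance after years_elapsed.

    Defaults follow the slide: CPU 2x every 2 years, DRAM 2x every 6 years.
    """
    cpu = relative_performance(years_elapsed, cpu_doubling)
    dram = relative_performance(years_elapsed, dram_doubling)
    return cpu / dram


if __name__ == "__main__":
    # Illustrative sample points, counted from an assumed common start year.
    for years in (0, 6, 12, 18, 24):
        print(years, cpu_dram_gap(years))
```

Under these assumptions the gap itself doubles every 3 years, since 1/2 - 1/6 = 1/3 doublings per year.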
Single Threaded Performance
Single Threading: HURRY UP AND WAIT!
[Figure: timeline of one thread alternating compute (C) and memory latency (M) phases]
Typical Processor Utilization: 15–25%
Up to 85% Cycles Waiting for Memory
The Power of CMT
[Figure: Single Threaded Performance compared with Chip Multi-threaded (CMT) Performance; timelines for Thread 1 through Thread 4, each alternating compute (C) and memory latency (M) phases over time]
UltraSPARC T1 core Processor Utilization: Up to 85%
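A rough way to see why interleaving threads raises utilization is an idealized model in which each thread alternates a fixed compute phase with a fixed memory stall, and the core switches to another ready thread at no cost. The sketch below makes those assumptions explicit; the function name and the 20/80 cycle split are illustrative choices (picked from the 15–25% range quoted earlier), not measurements from the talk.

```python
def core_utilization(compute_cycles, memory_cycles, threads=1):
    """Idealized core utilization with `threads` interleaved hardware threads.

    Assumes every thread repeats compute_cycles of work followed by
    memory_cycles of stall, and that thread switching is free.
    """
    if compute_cycles <= 0 or memory_cycles < 0 or threads < 1:
        raise ValueError("invalid cycle counts or thread count")
    single_thread = compute_cycles / (compute_cycles + memory_cycles)
    # The core cannot be more than 100% busy, however many threads it has.
    return min(1.0, threads * single_thread)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, core_utilization(20, 80, threads=n))
```

In this model a thread that is busy 20% of the time keeps a four-thread core about 80% busy, which is in the neighborhood of the "up to 85%" figure on the slide; real cores fall short of the ideal because stalls do not line up perfectly.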
Chip Multi-Threading (CMT)
CMP (chip multiprocessing): n cores per processor
HMT (hardware multithreading): m threads per core
CMT (chip multithreading): n x m threads per processor
CMT Paradigm Shift!
CMT technology allows simple, compact system designs, which deliver:
> Higher reliability
> Better performance
> Lower cost
> Faster Installation
> More efficient energy use
> Lower HVAC cost
> Faster time-to-repair
> ... and more
Everybody has changed to multi-core (CMP) and/or chip multi-threaded (CMT) processors: Sun (CMT), IBM (CMT), Intel (CMP), AMD (CMP)
UltraSPARC T2 and T1 CMT Processors
UltraSPARC T2 Die Photo
8 SPARC cores, 8 threads each
Shared 4MB L2, 8 banks, 16-way associative
Four dual-channel FBDIMM memory controllers
Two 10/1 Gb Enet ports w/onboard packet classification and filtering
One PCI-E x8 port
Cryptographic coprocessor on chip
1831 pins, 711 signal I/O
342 mm² die in 65 nm
UltraSPARC T2 Block Diagram
UltraSPARC T2
UltraSPARC T2 Reliability
Extensive error detection and correction
• Parity protection on I$, D$ tags and data, ITLB, DTLB, CAM and data, modular arithmetic, store address buffer
• ECC on integer RF, floating point RF, store data buffer, trap stack, L2$ and other internal arrays
Combination of hardware and software correction flows
• Hardware re-fetch for I$ and D$
• Software recovery for other errors
• Offlining of a thread, group of threads or physical core
Hardware error injection for verification
Selective disabling of detection and reporting for bringup
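The parity-plus-re-fetch flow listed above can be illustrated with a toy model. This is a sketch under stated assumptions, not the T2 implementation: the class, its methods and the one-parity-bit-per-entry layout are hypothetical, and the model assumes the cache never holds the only copy of the data, so a corrupted entry can always be discarded and fetched again from the backing store.

```python
def even_parity(word):
    """Return the parity bit (1 if the word has an odd number of 1 bits)."""
    return bin(word).count("1") & 1


class ParityProtectedCache:
    """Toy cache whose entries each carry one parity bit (hypothetical model)."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stands in for the next memory level
        self.lines = {}                     # address -> (data, parity)
        self.refetches = 0

    def fill(self, address):
        """Load an entry from the backing store and record its parity."""
        data = self.backing_store[address]
        self.lines[address] = (data, even_parity(data))
        return data

    def read(self, address):
        """Return data, re-fetching the entry if its parity check fails."""
        if address not in self.lines:
            return self.fill(address)
        data, parity = self.lines[address]
        if even_parity(data) != parity:
            # Detected corruption: discard the entry and fetch a clean copy.
            self.refetches += 1
            return self.fill(address)
        return data

    def inject_bit_flip(self, address, bit):
        """Flip one stored data bit without updating parity (error injection)."""
        data, parity = self.lines[address]
        self.lines[address] = (data ^ (1 << bit), parity)


if __name__ == "__main__":
    cache = ParityProtectedCache({0x40: 0b1011})
    cache.read(0x40)
    cache.inject_bit_flip(0x40, 2)
    print(cache.read(0x40), cache.refetches)
```

A single parity bit only detects an odd number of flipped bits and cannot correct them, which is why the slide pairs parity with re-fetch for caches and uses ECC for arrays such as the register files.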
UltraSPARC T2 Reliability
Faster Can Be Cooler (1)
[Figure: thermal map, not to scale, labeled Single-Core Processor, with core labels C1 to C8 and a temperature scale from 58C to 107C]
UltraSPARC T2 Reliability
Faster Can Be Cooler (2)
[Figure: thermal maps, not to scale, of a Single-Core Processor and the T2 Processor (cores C1 to C8), with a temperature scale from 58C to 107C]
OpenSPARC
OpenSPARC Communities
Chip Designers: SoC designs, Hard macros, Telecom applications
Hardware IP Suppliers: PCI cores, SERDES etc.
EDA Vendors: Benchmarking, Reference flow, FPGA, Emulation, Verification, Physical Design, Multi-threaded tools
CMT Tools: Compilers, Threading, Optimization, Performance Analysis
Academia/Universities: Architecture, ISA, VLSI course work; Threading, Scaling, Parallelization; Benchmarks
Operating Systems: OpenSolaris, Linux, BSD variants, Embedded OSs
What's Available in OpenSPARC
1. Chip design and verification
• UltraSPARC Architecture 2005 spec
• UltraSPARC T2/T1 implementation spec
• Full RTL (Verilog) of OpenSPARC T2/T1 (8 cores, 64/32 threads – more than 4 million lines of code!)
• Verification test suites
• Full OpenSPARC simulation environment
• Synthesis scripts for RTL
• FPGA implementation support
  • Reduced (to fit capacity), synthesizable version of RTL
  • Synplicity scripts for FPGA synthesis
What's Available in OpenSPARC
2. Architecture and performance modeling
• SAM – SPARC Architectural Model (including source code)
• Legion – Instruction accurate simulator (incl. source code)
• OBP – Open Boot PROM source code
• Hypervisor source code
• Solaris images for simulation
• RST Trace Tool – trace format for SPARC instruction-level traces
What's Available in OpenSPARC
3. Tools for tuning and debug
• ATS – Binary reoptimization and recompilation tool for tuning and troubleshooting applications
• Corestat – Online monitoring of core and FPU utilization
• Discover – Runtime detection of programming errors in allocating and using program memory
• Thread Analyzer – Checking of multi-threaded programming errors such as data races and deadlocks
• More...
What's Available in OpenSPARC
4. Tools for software developers
• Sun Studio 12 – C, C++, Fortran compilers for Solaris/Linux combined with Netbeans, etc.
• BIT – Binary Improvement Tool analyzes and optimizes SPARC binaries for performance and code coverage
• SPOT – produces detailed report on conditions that impact performance of an application
• Source code analysis tool to identify incompatible APIs between Solaris and Linux to speed up migration
• More...
University research in hardware reliability using OpenSPARC
Architectural Fingerprints
[Figure: pipeline diagram with stages Decode, Ex, Mem, Writeback and blocks ALU, D-Cache, RegFile x4, Store Buffer, Hash Queue, FP Compare and Match, with a path to L2]
Problem: Error detection for the processor pipeline (soft, wearout, …)
Solution: Architectural fingerprints
• Summarize retiring architectural updates into compact hash (regs, stores)
• Periodically compare hash with reference (another core, previous execution)
Results:
• Multithreaded OpenSPARC T1 RTL implementation: less than 4% area overhead
• Scalable to wide-issue superscalar BW
• Soft fault injection: effective detection for errors propagated to arch. state
[Figure: fraction of arch. errors (0.0 to 1.0) for byp, exu, fcl, fdp, lsu, swl, tlu and Full SPARC, broken down into Silent Data Corruption, Hang and Loop]
Prof. Hoe and Prof. Falsafi @Carnegie Mellon University
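The fingerprint idea (hash the retiring updates, compare periodically against a reference) can be sketched in software. This is only an illustration: CRC-32 from Python's standard `zlib` module stands in for whatever hash the hardware uses, and the record format, class name and `first_mismatch` helper are hypothetical.

```python
import zlib


class Fingerprint:
    """Running hash over retired architectural updates (register writes, stores)."""

    def __init__(self):
        self.value = 0

    def retire(self, kind, target, data):
        """Fold one retired update into the running hash."""
        record = f"{kind}:{target}:{data}".encode()
        self.value = zlib.crc32(record, self.value)


def first_mismatch(updates_a, updates_b, interval):
    """Compare two executions' fingerprints every `interval` retired updates.

    Returns the index of the first comparison interval in which the
    fingerprints differ, or None if they agree throughout.
    """
    fp_a, fp_b = Fingerprint(), Fingerprint()
    count = 0
    for count, (a, b) in enumerate(zip(updates_a, updates_b), start=1):
        fp_a.retire(*a)
        fp_b.retire(*b)
        if count % interval == 0 and fp_a.value != fp_b.value:
            return count // interval - 1
    # Compare once more if the stream ended partway through an interval.
    if count % interval != 0 and fp_a.value != fp_b.value:
        return count // interval
    return None


if __name__ == "__main__":
    golden = [("reg", 1, 10), ("store", 0x100, 7), ("reg", 2, 3), ("reg", 3, 4)]
    faulty = list(golden)
    faulty[2] = ("reg", 2, 2)  # injected error in the third retired update
    print(first_mismatch(golden, faulty, interval=2))
```

Only the compact hash is exchanged at each comparison point rather than every update, so the comparison interval trades detection latency against comparison bandwidth.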
FIRST – Detecting Emerging Wearout Faults
Problem: Detecting device wearout during soft breakdown stage
• Faults initially hidden by guardbands & masking
Solution: Periodically test processor cores for signs of growing wearout
• Reduce freq./voltage guardbands until marginal
• Test w/ Arch. or μArch. fingerprints
• Observe fails at incr. conservative conditions
Results: Wearout fault injection in OpenSPARC
• Arch. and μArch. fingerprints equivalent for wide-spread wearout
• μArch. needed for isolated wearout
[Figure: fraction of fails detected (0 to 1) versus stress past guardband (0 to 200 ps) for Arch, μArch and Timeout]
Prof. Hoe and Prof. Falsafi @Carnegie Mellon University
SWAT – SoftWare Anomaly Treatment
Motivation
Low cost solutions needed for in-field detection, diagnosis, recovery and repair for failures due to aging, soft errors, inadequate burn-in, design defects, …
SWAT Framework Components
• Detection: Software symptoms, minimal backup hardware
• Recovery: Software/hardware checkpoint and rollback
• Diagnosis: Firmware-controlled rollback/replay on multicore
• Repair/reconfiguration: Redundant, reconfigurable hardware
[Figure: timeline of Chkpoint, Chkpoint, Fault, Error, Symptom detected, then Recovery, Diagnosis, Repair, annotated "Always-on, zero or low cost" and "May have high overhead, rarely invoked"]
Prof. S. Adve, V. Adve and Y. Zhou @University of Illinois at U-C
SWAT – Status and Ongoing Work
Status
• Detection techniques with > 95% coverage for most structures [ASPLOS’08, SELSE’08, DSN’08]
• Microarchitecture level, firmware-driven diagnosis with > 97% coverage [SELSE’08, DSN’08]
• So far, used microarchitecture-level fault injection in simulation
Ongoing/future work with OpenSPARC
• Gate-level fault modeling
• Hypervisor implementation
Prof. S. Adve, V. Adve and Y. Zhou @University of Illinois at U-C
SWAT – Ongoing Work: High-level fault models and validation
Goals
• Understand how gate level faults propagate to microarch & s/w
• Abstract fault models at microarchitecture level
• Evaluate reliability solutions and validate results
Methodology
• Perform fault injections at gate level
• For better simulation speed
  • Hierarchical integration of microarchitecture level full system simulator with lower-level simulation of faulty unit
• Using OpenSPARC Verilog model
Prof. S. Adve, V. Adve and Y. Zhou @University of Illinois at U-C
SWAT – Future Work: Hypervisor implementation
Plan to use OpenSPARC hypervisor to prototype and evaluate firmware part of SWAT
Methodology
• Leverage, extend interface between hypervisor/hardware and hypervisor/OS
• Extend hypervisor for functionality
• Use for error detection, recovery, diagnosis, repair
Prof. S. Adve, V. Adve and Y. Zhou @University of Illinois at U-C
VARIUS – Process Parameter Variation
Problem: Parameter variation in present and future multicore chips
Goals:
• Model parameter variation and resulting timing errors
• Design multicore microarchitectures to detect and tolerate variation-induced errors
• Develop new microarchitectural techniques to mitigate variation and variation-induced errors.
Prof. Torrellas @University of Illinois at U-C
VARIUS – Process Parameter Variation: Accomplishments
VARIUS model of parameter variation and resulting timing errors for microarchitects [TSM08]
ReCycle: Pipeline rebalance under process variation [ISCA07]
Fine-grain adaptive body bias (ABB) to mitigate variation in multicores [MICRO07]
Workload scheduling and DVFS power management in multicores under variation [ISCA08]
Paceline: Core pairing for reliability under process variation [PACT07]
Prof. Torrellas @University of Illinois at U-C
VARIUS – Process Parameter Variation: Using OpenSPARC
Goal: Get insights into the effect of parameter variation on a real processor
• Measure the distribution of the path delays
• Apply the variation model
Evaluation Flow:
• Synopsys dc_shell-t: compile RTL to gate-level netlist
• Cadence SOCEncounter: Floorplan, Placement, Routing, Timing analysis
• Synopsys Primetime: Static timing analysis & timing debugging
• Cadence NCSim: Simulation
[Figure: flow diagram from Design entry through Compile RTL (dc_shell-t), SOCEncounter, Primetime and NCSim, with the data passed between stages labeled: RTL & Timing Constraints & Library; Netlist & Timing Constraints & Physical library; Placement & Timing report & Routing; Netlist & Timing info]
Prof. Torrellas @University of Illinois at U-C
CASP – Concurrent Autonomous Chip Self-test using Stored Patterns: Motivation
[Figure: failure rate versus time, with regions labeled Infant mortality, Normal lifetime and Wearout, annotated "Burn-in difficult" and "Circuit aging dominant"]
Solution: EXTREMELY THOROUGH online self-test
Soft errors: effective techniques exist
Prof. Mitra @Stanford University
CASP – Test Flow
Test Scheduling: Schedule test on next core. Core 4 selected for test; Core N normal operation.
Pre-processing: Prepare core for test. Core 4 temporarily isolated; Core N normal operation.
Test Application: Thorough scan & functional testing; recovery if failed. Core 4 under test; Core N normal operation.
Post-processing: Bring core from test to normal operation. Core 4 resume operation; Core N normal operation.
Prof. Mitra @Stanford University
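The four-phase flow can be sketched as a small scheduler that takes one core at a time through the phases while the remaining cores keep running. The round-robin selection order, the function name and the tuple layout are assumptions made for illustration; the slide only says that the test is scheduled on the "next core".

```python
from itertools import cycle

# Phase names and core states as they appear on the slide.
PHASES = (
    ("Test Scheduling", "selected for test"),
    ("Pre-processing", "temporarily isolated"),
    ("Test Application", "under test"),
    ("Post-processing", "resume operation"),
)


def casp_schedule(num_cores, num_tests):
    """Yield (core, phase, state) steps for num_tests consecutive core tests.

    Cores are picked round-robin (an assumption); every other core is
    taken to be in normal operation during each step.
    """
    if num_cores < 1:
        raise ValueError("need at least one core")
    selector = cycle(range(num_cores))
    for _ in range(num_tests):
        core = next(selector)
        for phase, state in PHASES:
            yield core, phase, state


if __name__ == "__main__":
    for core, phase, state in casp_schedule(num_cores=8, num_tests=2):
        print(f"core {core}: {phase} ({state})")
```

Because only one core is out of service at any step, an 8-core chip in this sketch always has 7 cores available for normal work.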
OpenSPARC Modifications for CASP
[Figure: OpenSPARC block diagram with 8 processor cores (modified for CASP support), Cross-bar Switch (modified for CASP support), L2 Cache, FPU, DRAM Control and Jbus Interface, plus a CASP Controller containing CASP control and an on-chip buffer (7.5KB) for scan test data, connected to CASP off-chip storage (52MB)]
Architectural modifications
➢ Before a core is tested
  ➢ stalling/draining pipeline
  ➢ disabling communication with core under test
  ➢ saving critical state
  ➢ invalidating D$
➢ After a core is tested
  ➢ restoring critical state
  ➢ enabling communication with core under test
  ➢ restarting pipeline
● 8000 lines of new Verilog code
● Verification regression used to simulate normal operation of chip
Prof. Mitra @Stanford University
Future research possibilities in hardware reliability using OpenSPARC
Future research possibilities
• Using CMT hardware resources for error detection and recovery
  • cores, threads, structures used by cores/threads
• Understanding errors in the context of CMT architectural constructs
  • thread arbitration and scheduling
  • speculative threading
• Validate error management solutions using a state-of-the-art microprocessor design
Future research possibilities
• Study impact of reliability solutions on microprocessor performance
  • use performance tools available in OpenSPARC
• Firmware and software solutions for hardware reliability
  • FPGA implementation and T1000/2000 servers with Solaris/Hypervisor source and other tools
• Study impact of error detectors in processor on chip level and application failure rates
  • enable error detection selectively, use simulators
• Several more...
Conclusions
OpenSPARC is an open source community based around UltraSPARC T1 and T2 CMT microprocessors
OpenSPARC provides a rich, state-of-the-art infrastructure for research in hardware reliability
Many universities are actively using OpenSPARC in their research, with a lot of success
There is a lot more research in hardware reliability that can be done using OpenSPARC
Acknowledgment
We would like to acknowledge the students (past and present) from Carnegie Mellon University, University of Illinois at U-C and Stanford University who contributed to the research described in this presentation.