UltraSparc IV
Tolga TOLGAY
OUTLINE
- Introduction
- History
- What is new?
- Chip Multithreading
- Pipeline
- Cache
- Branch Prediction
- Conclusion
INTRODUCTION
- Sparc = Scalable Processor Architecture
- Open processor architecture
Sun UltraSparc (Sparc V9):
- RISC architecture
- 64-bit addresses and data
- Superscalar
HISTORY
- Sparc development begins – 1984
- First Sparc processor – 1986
- SuperSparc – 1992
- UltraSparc I – 1995
- UltraSparc II – 1997
- UltraSparc III – 2001
- UltraSparc IV – 2004
- UltraSparc IV+ – 2005
- UltraSparc T1 – 2005
WHAT IS NEW?
What UltraSparc IV offers that is new:
- CMT (Chip Multithreading)
- New registers added for the CMT enhancement; the MCU registers and Sun Fireplane Interconnect registers are shared
- Enhancements to the Floating Point Unit
- 16 MB L2 cache with a 128-byte line size, shared by the two processors
- The L2 cache uses an LRU replacement strategy
- New write-cache index-hashing feature
Chip Multithreading (CMT)
- Two UltraSparc III cores on one die
- The two mirrored cores share the system bus, the DRAM controller, the off-die L2 cache and the Fireplane registers
- Also called Chip Multiprocessing
[Figure: Chip Multithreading]
- The aim is to increase performance without increasing the clock speed.
- Mirroring the cores places the two floating-point units side by side, creating a hot spot.
- The hot spot is avoided with heat towers in the copper interconnect.
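As a rough illustration of that aim (a peak upper bound derived only from figures in these slides, not a measured result): with two 4-way superscalar cores running at the same clock frequency f, the chip's peak issue rate doubles while f stays unchanged.

$$
R_{\text{peak}} = N_{\text{cores}} \cdot W_{\text{issue}} \cdot f = 2 \cdot 4 \cdot f = 8f \ \text{instructions per second}
$$

This compares with 4f for a single UltraSparc III core. Sustained throughput is lower in practice, since the two cores contend for the shared L2 cache, DRAM controller and system bus.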
[Figure: Chip Multithreading]
Core
More core improvements:
- Improved instruction fetch and store bandwidth
- Improved data prefetching
- The FPU handles more unexpected and underflow cases itself, reducing exceptions
- The on-die write cache is enhanced with a hashed index to better handle multiple writes
Pipeline
- Because UltraSparc IV contains two UltraSparc III cores, it uses the same pipeline.
- 4-way superscalar architecture
- 14-stage pipeline
[Figure: Pipeline Stages]
Pipeline Stages
Pipeline Stage Definition
A Address Generation
P Preliminary Fetch
F Fetch Instructions from I-Cache
B Branch Target Computation
I Instruction Group Formation
J Instruction Group Staging
R Dispatch and Register Access
E Execute
C Cache
M Miss Detect
W Write
X Extend
T Trap
D Done
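A minimal Python sketch of how the 14 stages in the table line up in time. It assumes an idealized pipeline in which every instruction group advances exactly one stage per cycle, with no stalls, cache misses or branch mispredictions (none of which hold on real hardware). `stage_schedule` and `total_cycles` are hypothetical helper names.

```python
from typing import Dict

# The 14 pipeline stages, in order (UltraSparc IV reuses the UltraSparc III pipeline).
STAGES = ["A", "P", "F", "B", "I", "J", "R", "E", "C", "M", "W", "X", "T", "D"]


def stage_schedule(start_cycle: int) -> Dict[str, int]:
    """Cycle in which a group occupies each stage, assuming it advances
    exactly one stage per cycle (no stalls, misses or mispredictions)."""
    return {stage: start_cycle + i for i, stage in enumerate(STAGES)}


def total_cycles(n_groups: int) -> int:
    """Cycles until the last of n back-to-back groups leaves stage D:
    the full pipeline depth for the first group, then one cycle per group."""
    if n_groups <= 0:
        return 0
    return len(STAGES) + (n_groups - 1)


if __name__ == "__main__":
    timing = stage_schedule(0)
    print(timing["E"], timing["D"])   # 7 13
    print(total_cycles(100))          # 113
```

Once the pipeline is full, one group completes per cycle, so the 14-cycle depth only costs extra latency at start-up and after each pipeline flush.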
[Figure: Pipeline Stages]
Stage A: Address Generation
- Generates and selects the fetch address; the address can be selected from several sources
Stage P: Preliminary Fetch
- Starts fetching from the I-cache
- Accesses the branch predictor
Stage F: Fetch
- Second half of the I-cache access
- At the end of this stage, 4 instructions may be latched
Stage B: Branch Target Computation
- Analyzes the instructions
- Calculates the branch target address
Stage I: Instruction Group Formation
- Instructions are grouped and placed into the instruction queue
Stage J: Instruction Group Staging
- A group of instructions is dequeued and sent to the R-stage
Stage R: Dispatch and Register Access
- Dependency calculation
- Dependency resolution
Stage E: Integer Instruction Execution
- First stage of the execution pipelines
- Integer instructions go to the A0 and A1 pipelines
- Branch instructions go to the branch pipeline
- Other instructions go to the MS pipeline
Stage C: Cache
- Integer pipelines write their results back
- SIU results are produced
- First stage for floating-point instructions
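The routing described for Stage E can be sketched as a small Python function. The A0, A1, MS and branch pipelines and the E/C starting stages come from the slides; alternating integer instructions between A0 and A1, the `"BR"` and generic `"FP"` labels, and the names `Kind` and `route_group` are illustrative assumptions, not the actual dispatch logic.

```python
from enum import Enum, auto
from typing import List


class Kind(Enum):
    INTEGER = auto()
    BRANCH = auto()
    FLOATING_POINT = auto()
    OTHER = auto()   # everything else, handled by the MS pipeline


# Stage in which each instruction kind starts executing, per the slides:
# E for integer, branch and other instructions; C for floating point.
FIRST_EXEC_STAGE = {
    Kind.INTEGER: "E",
    Kind.BRANCH: "E",
    Kind.OTHER: "E",
    Kind.FLOATING_POINT: "C",
}


def route_group(kinds: List[Kind]) -> List[str]:
    """Map each instruction of one issue group to an execution pipeline.

    A0, A1, MS and the branch pipeline come from the slides; alternating
    integer instructions between A0 and A1 and the generic "FP" label are
    illustrative assumptions.
    """
    pipes: List[str] = []
    next_alu = 0
    for kind in kinds:
        if kind is Kind.INTEGER:
            pipes.append(f"A{next_alu}")
            next_alu ^= 1          # alternate A0 -> A1 -> A0 ...
        elif kind is Kind.BRANCH:
            pipes.append("BR")
        elif kind is Kind.FLOATING_POINT:
            pipes.append("FP")
        else:
            pipes.append("MS")
    return pipes


if __name__ == "__main__":
    group = [Kind.INTEGER, Kind.INTEGER, Kind.BRANCH, Kind.OTHER]
    print(route_group(group))   # ['A0', 'A1', 'BR', 'MS']
```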
Stage M: Miss
- Data cache misses are determined
- Second stage for floating-point instructions
Stage W: Write
- MS pipeline results are written
- Third stage for floating-point instructions
- D-cache miss requests are sent to the L2 cache
Stage X: Extend
- Final stage for floating-point instructions
- Results from floating-point instructions are ready for bypass
Stage T: Trap
- Traps are signalled
- After a trap, the affected instructions invalidate their results
Stage D: Done
- Integer results are written into the architectural register file
- Floating-point results are written to the floating-point register file
- Results become visible to any traps generated by younger instructions
Pipeline Rules
Grouping rules:
- A group is a collection of instructions that do not prevent one another from executing in parallel
- Groups are formed before the R-stage
- Grouping is needed because:
  - the execution order must be maintained
  - each pipeline runs only a subset of the instructions
  - some instructions may require helpers
Execution order: in-order execution
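The grouping idea can be sketched as a greedy, in-order pass in Python. Only the principle comes from the slides (groups hold instructions that can run in parallel, are formed before the R-stage, and never reorder instructions). The per-group limits in `UNIT_LIMITS`, the unit names, `Instr` and `form_groups` are hypothetical, and the sketch ignores write-after-write hazards and helper instructions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical limits for illustration: 4-wide groups, two integer ALUs
# (A0, A1), one MS slot, one branch slot and one FP slot.
GROUP_WIDTH = 4
UNIT_LIMITS = {"A": 2, "MS": 1, "BR": 1, "FP": 1}


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str                       # one of the UNIT_LIMITS keys
    dest: Optional[str] = None      # register written, if any
    srcs: Tuple[str, ...] = ()      # registers read


def form_groups(program: List[Instr]) -> List[List[str]]:
    """Greedily split an in-order instruction stream into issue groups.

    A group closes when the next instruction would exceed the group width,
    exhaust its unit's slots, or read a register written earlier in the
    same group. Instructions are never reordered.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    used = dict.fromkeys(UNIT_LIMITS, 0)
    written: Set[str] = set()

    for ins in program:
        closes_group = (
            len(current) == GROUP_WIDTH
            or used[ins.unit] == UNIT_LIMITS[ins.unit]
            or any(src in written for src in ins.srcs)
        )
        if closes_group:
            groups.append(current)
            current = []
            used = dict.fromkeys(UNIT_LIMITS, 0)
            written = set()
        current.append(ins.name)
        used[ins.unit] += 1
        if ins.dest is not None:
            written.add(ins.dest)

    if current:
        groups.append(current)
    return groups


EXAMPLE = [
    Instr("add", "A", "r1", ("r2", "r3")),
    Instr("sub", "A", "r4", ("r5", "r6")),
    Instr("ld", "MS", "r7", ("r8",)),
    Instr("or", "A", "r9", ("r1",)),    # third integer op, and reads r1 from add
    Instr("br", "BR", None, ("r9",)),   # reads r9 written by or
]

if __name__ == "__main__":
    print(form_groups(EXAMPLE))   # [['add', 'sub', 'ld'], ['or'], ['br']]
```

In the example, `or` starts a new group because both integer slots are taken (and it also depends on `add`), and `br` starts another because it reads the register `or` writes.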
Cache Organization
Total cache size is doubled because of the dual cores:
- Data Cache: 64 KB x 2
- Instruction Cache: 32 KB x 2
- L2 Cache: 16 MB, off-chip, shared
- No L3 cache
[Figure: Cache Organization]
Data Cache:
- 64 KB Level 1 cache per core
Instruction Cache:
- 32 KB Level 1 cache per core
- 4-way set associative
Prefetch Cache:
- One of the L1 caches
- 2 KB SRAM: 32 lines x 64 bytes
- Uses an LRU replacement algorithm
- Aims to fetch data before it is needed, reducing main memory access latency
- Two ports each read 8 bytes and one port writes 16 bytes per cycle
- Hardware prefetch
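The 2 KB size follows from 32 lines x 64 bytes = 2048 bytes. The Python sketch below models a cache of that shape with LRU replacement and a simple hardware prefetcher. Two assumptions are not from the slides: the cache is treated as fully associative, and the prefetch policy (always bring in the next sequential line) is hypothetical. `PrefetchCache` is an illustrative name.

```python
from collections import OrderedDict

LINE_SIZE = 64    # bytes per line
NUM_LINES = 32    # 32 lines x 64 bytes = 2048 bytes = 2 KB


class PrefetchCache:
    """Toy 2 KB prefetch cache with LRU replacement.

    Assumptions for illustration: fully associative lookup, and a hardware
    prefetcher that always brings in the next sequential line.
    """

    def __init__(self) -> None:
        self.lines = OrderedDict()   # line number -> None, least recently used first
        self.hits = 0
        self.misses = 0

    def _install(self, line: int) -> None:
        if line in self.lines:
            self.lines.move_to_end(line)     # refresh: now most recently used
            return
        if len(self.lines) == NUM_LINES:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[line] = None

    def access(self, addr: int) -> bool:
        """Load from byte address addr; return True on a hit."""
        line = addr // LINE_SIZE
        hit = line in self.lines
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        self._install(line)          # demand line becomes most recently used
        self._install(line + 1)      # hypothetical next-line prefetch
        return hit


if __name__ == "__main__":
    cache = PrefetchCache()
    for addr in range(0, 4096, 8):   # sequential 8-byte loads over 4 KB
        cache.access(addr)
    print(cache.misses, cache.hits)  # 1 511
```

With this policy a sequential scan misses only on its very first load, because every access has already pulled in the following line; without prefetching, each new 64-byte line would miss once.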
Write Cache:
- Reduces the bandwidth consumed by store traffic
- 2 KB cache
- Handles multiprocessor and on-chip cache consistency
- Improves error recovery
- Optionally uses a hashed index
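The slides only say that the 2 KB write cache can optionally use a hashed index. The Python sketch below shows why hashing helps, under assumptions that do not come from Sun's documentation: 64-byte lines (so 32 lines and 5 index bits), each index value selecting one line, and a hypothetical hash that XORs in the next five address bits. With plain indexing, stores to addresses that are a multiple of 2 KB apart all land on the same index; the hash spreads them out.

```python
LINE_BITS = 6                       # assumed 64-byte lines
INDEX_BITS = 5                      # 2 KB / 64 B = 32 lines -> 5 index bits
INDEX_MASK = (1 << INDEX_BITS) - 1


def plain_index(addr: int) -> int:
    """Conventional index: the address bits just above the line offset."""
    return (addr >> LINE_BITS) & INDEX_MASK


def hashed_index(addr: int) -> int:
    """Hypothetical hashed index: XOR the plain index with the next group of
    higher address bits, so addresses 2 KB apart map to different lines."""
    upper = (addr >> (LINE_BITS + INDEX_BITS)) & INDEX_MASK
    return plain_index(addr) ^ upper


if __name__ == "__main__":
    stores = [i * 2048 for i in range(8)]        # stride of exactly 2 KB
    print([plain_index(a) for a in stores])      # [0, 0, 0, 0, 0, 0, 0, 0]
    print([hashed_index(a) for a in stores])     # [0, 1, 2, 3, 4, 5, 6, 7]
```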
L2 Cache:
- 16 MB SRAM shared by the two processors
- Separate L2 cache tags
- Two-way set associative
- LRU replacement policy
- 128-byte line size
UltraSparc IV+ has an on-die Level 2 cache and an off-die Level 3 cache.
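The L2 parameters listed above (16 MB, two-way set associative, 128-byte lines, LRU) fix how an address is split: 16 MB / (2 x 128 B) = 65,536 sets, giving 7 offset bits and 16 index bits. A minimal Python sketch, assuming a plain byte address and treating the cache as one flat structure (it ignores how the two cores share it and any sub-blocking); `split_address` and `TwoWayLRU` are hypothetical names.

```python
from collections import OrderedDict
from typing import Tuple

CACHE_BYTES = 16 * 1024 * 1024   # 16 MB
WAYS = 2                         # two-way set associative
LINE_BYTES = 128                 # 128-byte lines

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 65536 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1       # log2(128) = 7
INDEX_BITS = NUM_SETS.bit_length() - 1          # log2(65536) = 16


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a byte address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class TwoWayLRU:
    """Toy tag store: each set holds up to two tags, least recently used first."""

    def __init__(self) -> None:
        self.sets = {}   # set index -> OrderedDict of tags

    def access(self, addr: int) -> bool:
        tag, index, _ = split_address(addr)
        ways = self.sets.setdefault(index, OrderedDict())
        if tag in ways:
            ways.move_to_end(tag)        # hit: mark most recently used
            return True
        if len(ways) == WAYS:
            ways.popitem(last=False)     # full set: evict the LRU way
        ways[tag] = None
        return False


if __name__ == "__main__":
    a, b, c = 0, 1 << 23, 1 << 24        # same set index, tags 0, 1, 2
    cache = TwoWayLRU()
    print([cache.access(x) for x in (a, b, a, c, a, b)])
    # [False, False, True, False, True, False]
```

In the trace, re-touching `a` makes `b` the least recently used way, so `c` evicts `b` rather than `a`.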
Branch Prediction
- Branch predictor: a small SRAM accessed in a single cycle; its output is connected to the P-stage
- The branch determination is made in the B-stage; on a miss, fetch returns to the A-stage
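The slides describe the predictor only as a small SRAM read in one cycle. A common way to build such a table is with 2-bit saturating counters, and the Python sketch below uses that textbook scheme purely as an illustration. The table size (1024 entries), the PC-based indexing and the names `TwoBitPredictor` and `count_mispredictions` are assumptions; the sketch is not claimed to match the real predictor's indexing or its use of branch history.

```python
from typing import Iterable


class TwoBitPredictor:
    """Toy predictor: a table of 2-bit saturating counters indexed by PC.

    Counter values 0-1 predict not taken, 2-3 predict taken. Table size and
    indexing are illustrative assumptions, not the UltraSparc IV design.
    """

    def __init__(self, entries: int = 1024) -> None:
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a positive power of two")
        self.mask = entries - 1
        self.table = [1] * entries        # start weakly not taken

    def _slot(self, pc: int) -> int:
        return (pc >> 2) & self.mask      # Sparc instructions are 4 bytes long

    def predict(self, pc: int) -> bool:
        return self.table[self._slot(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._slot(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(pred: TwoBitPredictor, pc: int, outcomes: Iterable[bool]) -> int:
    """Replay one branch's outcomes; each wrong guess is a miss that, per the
    slides, would send fetch back to the A-stage."""
    misses = 0
    for taken in outcomes:
        if pred.predict(pc) != taken:
            misses += 1
        pred.update(pc, taken)
    return misses


if __name__ == "__main__":
    loop_branch = ([True] * 9 + [False]) * 3   # loop taken 9 times, then exits; 3 runs
    print(count_mispredictions(TwoBitPredictor(), 0x1000, loop_branch))  # 4
```

For this loop branch the toy predictor misses the very first execution and each of the three loop exits: 4 mispredictions out of 30. The 2-bit hysteresis is what keeps a single loop exit from flipping the prediction for the next run of the loop.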
Conclusion
- UltraSparc IV is a milestone as the first dual-core chip of the UltraSparc family.
- Sun continues to develop UltraSparc: UltraSparc IV+ and UltraSparc T1.
Questions...