![Page 1: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/1.jpg)
ECE 4100/6100 (1)
Multicore: Commercial Processors
![Page 2: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/2.jpg)
Some Examples
• Desktop and Server/Enterprise Space
  – Intel
  – AMD
  – SUN Microsystems
• The Embedded Space: Freescale Semiconductor
![Page 3: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/3.jpg)
Focus
• The Chip Level Architecture
  – What do we have on chip?
• The Core Architecture
  – Note the presence/absence/configuration of concepts studied earlier in class
  – Rationalize the design decisions that led to the preceding
  – What can/should we expect next?
• Building systems using multicore chips
![Page 4: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/4.jpg)
The Intel Core Duo Processor Series
![Page 5: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/5.jpg)
Intel Core Duo
• Homogeneous cores
• Bus-based on-chip interconnect
• Shared memory
• Traditional I/O
• Classic OOO cores: reservation stations, issue ports, schedulers, etc.
• L2 cache: large, shared, set-associative, with prefetch, etc.
Source: Intel Corp.
![Page 6: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/6.jpg)
Intel Core Duo: Vital Stats
• 151 million transistors; shared 2 MB L2 cache
• Each core has a 12-stage pipeline (Yonah)
• Low-power (less than 25 watts) dual-core microprocessor
• Supports Intel's Vanderpool virtualization technology
• EM64T (Intel x86-64 extensions) is not supported
  – Desktop market: not a severe limitation, given the lack of 64-bit OS and software support
  – The Sossaman processor for servers, which is based on Yonah, also lacks EM64T support, a severe disadvantage
• Communication between the L2 cache and both execution cores is handled by an arbitration bus unit
  – Eliminates cache coherency traffic over the FSB
  – Raises the core-to-L2 latency
  – The increase in clock frequency offsets the impact
• Core processors communicate with the system chipset over a 667 MT/s front-side bus (FSB), up from the 533 MT/s used by the fastest Pentium M
• Intel Core Solo uses the same two-core die as the Core Duo, but features only one active core
  – Dies with one core that fails quality control can still be sold
  – Core 2 Duo processors will also include the ability to disable one core to conserve power
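The FSB transfer rates above translate into peak bandwidth once the bus width is fixed. A minimal sketch, assuming a 64-bit (8-byte) data bus as on Intel's P6-derived front-side buses; the helper name `peak_bandwidth_gbs` is hypothetical, and sustained bandwidth is always lower than these peaks. Both Core Duo cores share this one bus.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Peak bus bandwidth in GB/s (10**9 bytes/s): transfer rate times bus width."""
    return mega_transfers_per_s * 1e6 * bus_bytes / 1e9

# Assumed 8-byte-wide front-side bus in both cases.
core_duo_fsb = peak_bandwidth_gbs(667)   # Core Duo: 667 MT/s
pentium_m_fsb = peak_bandwidth_gbs(533)  # fastest Pentium M: 533 MT/s

print(f"Core Duo FSB:  {core_duo_fsb:.2f} GB/s (shared by both cores)")
print(f"Pentium M FSB: {pentium_m_fsb:.2f} GB/s")
print(f"Increase: {core_duo_fsb / pentium_m_fsb - 1:.0%}")
```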
![Page 7: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/7.jpg)
The Core™ micro-architecture
Source: Ars Technica
![Page 8: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/8.jpg)
The Core Execution core
Source: Ars Technica
![Page 9: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/9.jpg)
Intel Core Duo
• High memory latency due to the lack of an on-die memory controller (further aggravated by the system chipset's use of DDR2 RAM)
• Main-memory transactions have to pass through the chipset's Northbridge
  – Higher latency compared to AMD's Turion platform
  – Weakness shared by the entire line of Pentium processors
  – The L2 cache is quite effective at hiding main-memory latency
• Execution units
  – Three 64-bit integer execution units: one CIU (complex) + two SIU (simple)
  – Two FPUs
  – Poor floating-point unit (FPU) throughput
• Little improvement in "performance per watt" over its predecessor in single-threaded applications
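The claim that the L2 hides main-memory latency can be made concrete with the standard average memory access time (AMAT) formula. A sketch in which every latency and miss rate is a hypothetical value chosen only to show the trend, not a measured Core Duo figure:

```python
def amat(l1_hit: float, l1_miss_rate: float, l2_hit: float,
         l2_local_miss_rate: float, mem_latency: float) -> float:
    """Average memory access time (cycles) for an L1 + L2 hierarchy.

    l2_local_miss_rate is the fraction of L1 misses that also miss in the L2.
    """
    l1_miss_penalty = l2_hit + l2_local_miss_rate * mem_latency
    return l1_hit + l1_miss_rate * l1_miss_penalty

# Hypothetical parameters (cycles and rates), for illustration only.
L1_HIT, L1_MISS, L2_HIT, L2_MISS = 3, 0.05, 14, 0.10
for mem_latency in (200, 300):  # e.g. a slower path through the Northbridge
    with_l2 = amat(L1_HIT, L1_MISS, L2_HIT, L2_MISS, mem_latency)
    no_l2 = L1_HIT + L1_MISS * mem_latency  # every L1 miss goes to main memory
    print(f"memory {mem_latency} cycles: AMAT {with_l2:.1f} with L2, {no_l2:.1f} without")
```

Raising the memory latency by 50% moves the AMAT far less when the L2 catches most L1 misses, which is the sense in which the L2 hides main-memory latency.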
![Page 10: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/10.jpg)
Core 2 Duo and Core Duo
• Very similar architectures
• Bump in the processor speed
• Increase in L2 cache (2 MB to 4 MB)
• Both chips use 65 nm process technology and support a 667 MT/s front-side bus (FSB)
• 14-stage pipeline
Source: Intel Corp.
![Page 11: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/11.jpg)
Intel® Core™2 Duo Processor
• Process technology: 65 nm
• Number of processor cores: 2
• L2 cache size (shared between 2 processor cores): up to 4 MB
• Transistor gate height / gate oxide thickness (65 nm): 1.2 nm
• Transistor gate length (65 nm process technology): 35 nm
• Line width: 65 nm
• Number of transistors: 291 million
• Processor die size: 143 mm²
• Average power: <1.1 W
![Page 12: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/12.jpg)
Intel Core 2 Duo
Source: Hard Core Hardware
![Page 13: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/13.jpg)
Wide Dynamic Execution
Source: Bit Tech
![Page 14: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/14.jpg)
Wide Dynamic Execution
Source: Bit Tech
![Page 15: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/15.jpg)
Wide Dynamic Execution
• Pipeline width of 4 instructions per cycle per core (Pentium M and Pentium 4 NetBurst have 3)
  – Delivers more instructions per clock cycle
• Pipeline depth of 14 stages vs. 31 in the Pentium 4 Prescott
  – Compromise between efficient execution of short instructions and long instructions
• Ops fusion: less work for the processor pipeline
  – Micro-ops fusion: fuses micro-ops decoded from the same x86 instruction (e.g., a load and an ALU operation)
  – Macro-ops fusion: works on the x86 instructions themselves, not just their micro-op derivatives
• Instruction loads and micro-ops can be reduced by approximately 15% and 10%, respectively
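Macro-ops fusion can be illustrated with a toy decoder pass that merges a compare or test instruction with an immediately following conditional jump into one operation. This is a simplified sketch: the `(mnemonic, operands)` encoding and the function names are hypothetical, and real hardware places further restrictions on which instruction pairs qualify.

```python
from typing import List, Tuple

Instr = Tuple[str, str]          # (mnemonic, operands): hypothetical toy encoding
FUSIBLE_FIRST = {"cmp", "test"}  # flag-setting instructions eligible to fuse (simplified)

def is_conditional_jump(mnemonic: str) -> bool:
    """Conditional jumps (jne, jz, jl, ...) start with 'j'; 'jmp' is unconditional."""
    return mnemonic.startswith("j") and mnemonic != "jmp"

def macro_fuse(stream: List[Instr]) -> List[Tuple[str, ...]]:
    """Merge each cmp/test that is immediately followed by a conditional jump
    into a single op; all other instructions pass through unchanged."""
    ops: List[Tuple[str, ...]] = []
    i = 0
    while i < len(stream):
        mnemonic = stream[i][0]
        if (mnemonic in FUSIBLE_FIRST and i + 1 < len(stream)
                and is_conditional_jump(stream[i + 1][0])):
            ops.append((mnemonic, stream[i + 1][0]))  # two instructions, one op
            i += 2
        else:
            ops.append((mnemonic,))
            i += 1
    return ops

loop_body = [("add", "eax, [esi]"), ("add", "esi, 4"), ("cmp", "esi, edi"), ("jne", "top")]
fused = macro_fuse(loop_body)
print(len(loop_body), "instructions ->", len(fused), "ops:", fused)
```

In a tight loop the compare-and-branch at the bottom is exactly the pattern this targets, so one fewer op flows down the pipeline per iteration.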
![Page 16: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/16.jpg)
Intelligent Power Capability
Source: Bit Tech
![Page 17: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/17.jpg)
Intelligent Power Capability
• SpeedStep technology
  – Dynamic clock speed reduction
  – Intel mobile processors already include this
  – Enhanced SpeedStep used in Core 2 Duo
• A controller powers up sections of the processor only as needed; one core can be shut down for single-threaded applications
• Power consumption decreased by enhancements to Intel's 65 nm process node
  – Low-k dielectrics and strained silicon
  – Low-leakage and "sleep" transistors
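Why lowering clock speed, and with Enhanced SpeedStep also voltage, saves so much power follows from the first-order CMOS dynamic power relation P ≈ α·C·V²·f. A sketch computing relative dynamic power and energy for hypothetical operating points (not Core 2 Duo specifications); leakage is ignored and the function name is illustrative.

```python
def relative_dynamic_power(v_scale: float, f_scale: float) -> float:
    """Dynamic power relative to the nominal operating point.

    First-order CMOS model P = alpha * C * V**2 * f; alpha and C are assumed
    unchanged, so they cancel in the ratio. Leakage power is ignored.
    """
    return v_scale ** 2 * f_scale

# Hypothetical operating points, relative to nominal voltage and frequency.
freq_only = relative_dynamic_power(v_scale=1.0, f_scale=0.5)  # halve the clock only
dvfs = relative_dynamic_power(v_scale=0.8, f_scale=0.5)       # also drop voltage by 20%

# Energy for a fixed amount of work = power x time, and time scales as 1 / f.
print(f"frequency only:      power {freq_only:.2f}x, energy {freq_only / 0.5:.2f}x")
print(f"voltage + frequency: power {dvfs:.2f}x, energy {dvfs / 0.5:.2f}x")
```

Under this model, cutting frequency alone halves power but leaves the energy per task unchanged; the voltage reduction is what actually saves energy.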
![Page 18: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/18.jpg)
Advanced Smart Cache
Source: Bit Tech
![Page 19: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/19.jpg)
Advanced Smart Cache
• Both cores share data stored in the L2 cache via an arbitration bus unit embedded in the cache
  – Dynamically allocates cache space between the two cores, minimising bus traffic by allowing both cores to access one copy of data
• Does a larger L2 cache matter?
  – Studies point out that going from 2 MB to 4 MB improves execution time only slightly (2-4%) for most applications
Source: Bit Tech
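A toy model shows why dynamically sharing one L2 can beat a static split: when one core's working set is larger than the other's, the shared cache lets it use the spare capacity. This sketch models each cache as a fully associative LRU cache of lines (a simplification; the real L2 is set-associative and its allocation policy is not described here), and the access trace is hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache of `capacity` lines with LRU replacement."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # line address -> None, least recently used first
        self.hits = 0

    def access(self, line: int) -> None:
        if line in self.lines:
            self.lines.move_to_end(line)    # now most recently used
            self.hits += 1
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[line] = None

def total_hits(trace, shared: bool, total_lines: int = 8) -> int:
    """Replay (core, line) accesses on one shared cache or two half-size private caches."""
    if shared:
        cache = LRUCache(total_lines)
        caches = [cache, cache]
    else:
        caches = [LRUCache(total_lines // 2), LRUCache(total_lines // 2)]
    for core, line in trace:
        caches[core].access(line)
    return caches[0].hits if shared else caches[0].hits + caches[1].hits

# Hypothetical workload: core 0 loops over 6 lines, core 1 over 2 lines.
trace = []
for _ in range(10):
    trace += [(0, line) for line in range(6)]
    trace += [(1, 100 + line) for line in range(2)]

print("shared 8-line cache:      ", total_hits(trace, shared=True), "hits of", len(trace))
print("two private 4-line caches:", total_hits(trace, shared=False), "hits of", len(trace))
```

With a static split, core 0's 6-line loop thrashes its 4-line half under LRU while core 1 leaves capacity unused; the shared cache holds both working sets.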
![Page 20: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/20.jpg)
Smart Memory Access
Source: Bit Tech
![Page 21: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/21.jpg)
Smart Memory Access
• Improved prefetch units
• Memory disambiguation
  – Lets a load execute ahead of earlier stores whose addresses are not yet known when it is predicted not to alias with them, so instructions can be re-ordered more efficiently
Source: Ars Technica; example from http://arstechnica.com/articles/paedia/cpu/core.ars/8
[Figures: Memory aliasing; Execution without memory disambiguation; Execution with and without memory disambiguation]
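The idea behind memory disambiguation can be sketched as follows: a load whose older stores still have unknown addresses is either held back (conservative) or issued early and checked once those addresses resolve. This is a simplified model, not Intel's implementation; the `AliasPredictor` (a 2-bit saturating counter per load PC) is a hypothetical stand-in for the hardware predictor, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Store:
    addr: Optional[int]  # None while the store's address is still unknown

def must_wait_conservative(older_stores: List[Store]) -> bool:
    """No disambiguation: a load waits while any older store address is unknown."""
    return any(s.addr is None for s in older_stores)

class AliasPredictor:
    """Hypothetical predictor: one 2-bit saturating counter per load PC."""
    def __init__(self) -> None:
        self.counters: Dict[int, int] = {}

    def predict_no_alias(self, pc: int) -> bool:
        return self.counters.get(pc, 3) >= 2  # start optimistic

    def update(self, pc: int, aliased: bool) -> None:
        c = self.counters.get(pc, 3)
        self.counters[pc] = 0 if aliased else min(c + 1, 3)

def needs_replay(load_addr: int, older_stores: List[Store]) -> bool:
    """Once older store addresses are known, a load that ran early and matches
    one of them read stale data and must be re-executed."""
    return any(s.addr == load_addr for s in older_stores)

pred = AliasPredictor()
pc, load_addr = 0x400, 0x1000
stores = [Store(addr=None), Store(addr=0x2000)]

print("conservative core must wait:", must_wait_conservative(stores))
replay = False
if pred.predict_no_alias(pc):
    # The load issues early; later the first store's address resolves.
    stores[0].addr = 0x1000  # it turns out to alias with the load
    replay = needs_replay(load_addr, stores)
    pred.update(pc, aliased=replay)
print("replay needed:", replay)
print("predict no-alias next time:", pred.predict_no_alias(pc))
```

The payoff comes from the common case where the prediction is right: the load's latency overlaps with the store address computation instead of following it.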
![Page 22: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/22.jpg)
Advanced Digital Media Boost
Source: Bit Tech
![Page 23: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/23.jpg)
Advanced Digital Media Boost
• Streaming SIMD Extensions (SSE) instructions
  – SSE instructions are an extension of the standard x86 instruction set
  – Utilized in multimedia encoding, decoding, image manipulation and encryption
• 128-bit SSE instructions execute in a single cycle
  – Up from 64 bits per cycle (earlier cores split each 128-bit operation into two 64-bit halves)
  – Doubles SSE performance over the previous generation
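Executing a full 128-bit SSE operation per cycle, instead of two 64-bit halves, halves the issue cycles for packed data. A back-of-the-envelope sketch, assuming one SIMD operation per cycle and ignoring memory bandwidth and every other bottleneck; `simd_cycles` is a hypothetical helper.

```python
import math

def simd_cycles(n_elements: int, elem_bits: int, datapath_bits: int, reg_bits: int = 128) -> int:
    """Cycles to process n elements with 128-bit SSE instructions.

    Each 128-bit instruction needs reg_bits / datapath_bits passes through the unit.
    """
    instructions = math.ceil(n_elements * elem_bits / reg_bits)
    passes_per_instruction = reg_bits // datapath_bits
    return instructions * passes_per_instruction

n = 1024  # single-precision (32-bit) floats
old = simd_cycles(n, 32, datapath_bits=64)   # 128-bit op split into two 64-bit halves
new = simd_cycles(n, 32, datapath_bits=128)  # full-width 128-bit execution
print(old, new, old / new)
```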
![Page 24: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/24.jpg)
Comparison of SSE to prior processors
Source: Ars Technica
![Page 25: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/25.jpg)
Intel Conroe Vs Presler
• What is the major difference?
  – Shared L2 cache (Conroe) versus separate L2 caches (Presler)
[Figure: Conroe and Presler]
Source: Bit Tech
![Page 26: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/26.jpg)
Intel’s Roadmap for Multicore
Source: Adapted from Tom’s Hardware
[Figure: Intel multicore roadmap, 2006 to 2008, for desktop, mobile, and enterprise processors. Entries give core count (SC = single core, DC = dual core, QC = quad core, 8C = eight cores) and cache size, ranging from single-core parts with 512 KB to 2 MB of L2 to 45 nm parts with up to eight cores and 12 MB of shared cache.]
• Drivers are
  – Market segments
  – More cache
  – More cores
• An 80-core processor prototype has been designed!
![Page 27: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/27.jpg)
Intel Chipset Example
Source: Extreme Tech
![Page 28: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/28.jpg)
References and Links
• http://www.intel.com/products/processor/coreduo/
• http://en.wikipedia.org/wiki/Intel_Core
• http://www.hothardware.com/viewarticle.aspx?articleid=845&cid=1
• http://www.bit-tech.net/hardware/2006/03/10/intel_core_microarchitecture/
• http://www.bit-tech.net/hardware/2006/05/19/intel_core_duo_t2600_on_the_desktop
• http://www.bit-tech.net/hardware/2006/07/14/intel_core_2_duo_processors/
• http://www.hardcoreware.net/reviews/review-347-1.htm
• http://www.trustedreviews.com/cpu-memory/review/2006/08/28/Intel-Core-2-Duo-Merom-Notebooks/p1
• http://www.trustedreviews.com/cpu-memory/review/2006/07/14/Intel-Core-2-Duo-Conroe-E6400-E6600-E6700-X6800/p1
• http://techreport.com/reviews/2006q2/core-duo/index.x?pg=1
• http://arstechnica.com/articles/paedia/cpu/core.ars/1
• http://www.anandtech.com/mobile/showdoc.aspx?i=2663&p=4
• http://www.extremetech.com/article2/0,1697,1988794,00.asp
• http://www.coreduoinfo.com/blog/about-intel-core-duo/
• http://67.91.114.164/intel_c2d_info.htm
• http://www.pcper.com/article.php?aid=272&type=expert
![Page 29: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/29.jpg)
AMD MultiCore Processors
![Page 30: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/30.jpg)
Dual Core AMD Opteron
Source: AMD
![Page 31: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/31.jpg)
AMD Multicore (Dualcore) Opteron
• Two AMD Opteron CPU cores on a single die
  – Each has a 1 MB L2 cache
• 90 nm, ~205 million transistors
  – Approximately the same die size as the 130 nm single-core AMD Opteron processor
• 95 W power envelope
  – Fits into the 90 nm power infrastructure
• Introduced with the "K8" Revision E core in April 2005
[Figure: die layout showing Core 0 and Core 1, each with a 1 MB L2, and the Northbridge]
Source: AMD
![Page 32: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/32.jpg)
Opteron Core Pipeline
Source: Chip Architect
![Page 33: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/33.jpg)
AMD Opteron Processor Core Architecture
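[Figure: AMD Opteron core block diagram: fetch and branch prediction with a 64 KB L1 I-cache; scan/align/decode through the fastpath and microcode engine, producing µops; a 72-entry instruction control unit; integer decode & rename feeding three reservation stations, three ALUs, three AGUs and a multiplier; FP decode & rename feeding a 36-entry FP scheduler with FADD, FMUL and FMISC units; a 44-entry load/store queue and a 64 KB L1 D-cache]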
Source: The 3D shop
![Page 34: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/34.jpg)
Dual Core AMD Opteron
• AMD64 technology
  – Runs 32-bit applications and is 64-bit capable
  – Compatible with the x86 software infrastructure
  – Enables a single architecture across 32- and 64-bit environments
• Direct Connect Architecture
  – NUMA system
    • Each processor shares its memory with other processors in the system
  – Integrated on-die memory controller
    • DDR2 DRAM memory controller offers memory bandwidth up to 10.7 GB/s per processor
  – HyperTransport
    • Point-to-point interconnect that can be used to build a mesh of multiprocessor Opteron systems
    • Scalable-bandwidth interconnect between processors, I/O subsystems, and other chipsets
    • 24.0 GB/s peak bandwidth per processor
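Both peak figures can be reproduced from interface parameters. A sketch assuming dual-channel DDR2-667 memory with 64-bit channels, and three 16-bit HyperTransport links clocked at 1 GHz with double data rate, counted in both directions. These configuration details are assumptions that match the quoted peaks; the slide does not state them, and the helper names are hypothetical.

```python
def dram_peak_gbs(mega_transfers_per_s: float, channels: int, bytes_per_transfer: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s: transfer rate x channel width x channel count."""
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9

def ht_peak_gbs(links: int, width_bits: int = 16, clock_ghz: float = 1.0) -> float:
    """Peak HyperTransport bandwidth in GB/s, counting both directions of every link.

    Double data rate: 2 transfers per clock in each direction.
    """
    per_direction = width_bits / 8 * clock_ghz * 2  # GB/s
    return links * 2 * per_direction

memory = dram_peak_gbs(667, channels=2)  # assumed dual-channel DDR2-667
hypertransport = ht_peak_gbs(links=3)    # assumed three 16-bit, 1 GHz links
print(f"memory: {memory:.1f} GB/s, HyperTransport: {hypertransport:.1f} GB/s")
print(f"per link: {ht_peak_gbs(links=1):.1f} GB/s")
```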
![Page 35: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/35.jpg)
Dual Core AMD Opteron
• Not a simple aggregation of K8 cores
  – Integrated the cores for efficiency
• Dual-core Opteron acts very much like an SMP system
• Compatible with existing single-threaded and multi-threaded (hyperthreaded) software
• MOESI coherency protocol (O = "Owned")
  – Updates through the system request interface
• SSE3 support with 10 new instructions
• Quad-core upgradeability
• Hardware-assisted AMD Virtualization
• Optimized power management
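The MOESI states can be summarized as a small transition function for one cache line in one cache. This is a textbook MOESI sketch; it abstracts away the system request interface and HyperTransport messages the Opteron actually uses, and the event names are hypothetical.

```python
from enum import Enum

class State(Enum):
    M = "Modified"   # dirty, only copy
    O = "Owned"      # dirty, other caches may hold Shared copies; this cache supplies data
    E = "Exclusive"  # clean, only copy
    S = "Shared"     # other copies may exist
    I = "Invalid"

def next_state(state: State, event: str, others_have_copy: bool = False) -> State:
    """Return the new state of a line in one cache.

    'read'/'write' come from the local core; 'snoop_read'/'snoop_write' are
    requests from another cache (snoop_write = another cache wants ownership).
    """
    if event == "read":
        if state is State.I:
            return State.S if others_have_copy else State.E
        return state  # M, O, E, S hit without a state change
    if event == "write":
        return State.M  # from S/O/I other copies are invalidated first; E -> M is silent
    if event == "snoop_read":
        if state is State.M:
            return State.O  # keep the dirty data and share it, no write-back yet
        if state is State.E:
            return State.S
        return state  # O, S and I are unchanged
    if event == "snoop_write":
        return State.I
    raise ValueError(f"unknown event: {event}")

# Core 0 writes a line, then core 1 reads it.
c0, c1 = State.I, State.I
c0 = next_state(c0, "write")                        # I -> M
c1 = next_state(c1, "read", others_have_copy=True)  # I -> S, data supplied by core 0
c0 = next_state(c0, "snoop_read")                   # M -> O, memory not updated yet
print(c0.name, c1.name)
```

The Owned state is what lets a dirty line be shared without first writing it back to memory, saving memory traffic between cores.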
![Page 36: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/36.jpg)
Dual Core AMD Opteron
Source: Elec Design
![Page 37: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/37.jpg)
AMD Opteron (SOI)
Source: Chip Architect
![Page 38: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/38.jpg)
AMD 64-bit Core
• 1 MB L2 cache
• Detailed discussion of the 64-bit core architecture at:
  – http://chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html
![Page 39: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/39.jpg)
Multiprocessor Systems using AMD Opteron
[Figure: legacy x86 system (CPUs sharing a bus to a memory controller hub, with I/O hubs, PCI-E bridges, PCI and USB) compared with the AMD64 Direct Connect Architecture (four CPUs, each with an SRQ, crossbar, HyperTransport links and its own memory controller, joined by 8 GB/s links)]
AMD64 Direct Connect Architecture
• Eliminates the FSB bottleneck
• HyperTransport™ Technology interconnect for high bandwidth and low latency
• Each CPU has its own memory
• Each CPU can access the main memory of another processor, transparently to the programmer
• Different from SMP
Legacy x86 Architecture
• CPUs, memory, and I/O all share a bus
• Major bottleneck to performance
• Faster CPUs or more cores for performance
• Symmetric multiprocessing
Source: AMD
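In a NUMA system like this one, average memory latency depends on how many accesses go to another processor's memory. A sketch with hypothetical latencies (60 ns local, +40 ns per HyperTransport hop; not measured Opteron figures), assuming a square 4-socket topology in which, from any CPU, one node is local, two are one hop away, and one is two hops away.

```python
def numa_avg_latency(local_ns: float, per_hop_ns: float, hop_fractions: dict) -> float:
    """Average memory latency when accesses are spread over nodes 0, 1, 2, ... hops away.

    hop_fractions maps hop count (0 = the CPU's own memory) to the fraction of accesses.
    """
    if abs(sum(hop_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(f * (local_ns + hops * per_hop_ns) for hops, f in hop_fractions.items())

# Hypothetical latencies; square 4-socket topology assumed.
all_local = numa_avg_latency(60, 40, {0: 1.0})
spread_evenly = numa_avg_latency(60, 40, {0: 0.25, 1: 0.5, 2: 0.25})
print(f"all local: {all_local:.0f} ns, evenly spread: {spread_evenly:.0f} ns")
```

Remote access is transparent to the programmer, but under these assumptions placing a thread's data in its own node's memory still cuts the average latency noticeably.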
![Page 40: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/40.jpg)
Multiprocessor Systems using AMD Opteron
Source: XBitlabs
![Page 41: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/41.jpg)
Cache coherency
Source: Chip Architect
![Page 42: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/42.jpg)
AMD Athlon 64 X2
Source: AMD
![Page 43: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/43.jpg)
References and Links
• http://techreport.com/reviews/2005q2/opteron-x75/index.x?pg=1
• http://www.tomshardware.com/2005/06/03/dual_core_stress_test/index.html
• http://www.a1-electronics.net/AMD_Section/CPUs/2005/AMD_Athlon64x2_Apr.shtml
• http://en.wikipedia.org/wiki/Opteron
• http://en.wikipedia.org/wiki/Athlon_64_X2
• http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_8796_14309,00.html
• http://chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html
• http://firingsquad.com/hardware/amd_dual-core_opteron_875/page2.asp
• http://www.xbitlabs.com/articles/cpu/display/opteron-ws_4.html
• http://www.extremetech.com/article2/0,1697,1675784,00.asp
• http://www.elecdesign.com/Articles/Index.cfm?AD=1&ArticleID=11991
• http://www.the3dshop.com/userimages/amd_systems/opteron_dualcore.htm
• http://www.nextcomputing.com/advantages/thruadv.shtml
• http://arstechnica.com/news.ars/post/20060817-7535.html
• http://www.bit-tech.net/hardware/2005/05/09/amd_a64x2_4800/1.html
![Page 44: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/44.jpg)
SUN – UltraSPARC Multicore
![Page 45: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/45.jpg)
SUN – UltraSPARC T1
• Eight cores, each 4-way threaded
• 1.2 GHz
• Cache
  – 16 KB 4-way L1-I with 32 B lines
  – 8 KB 4-way L1-D with 16 B lines
  – 3 MB internal L2 cache partitioned into four banks, served by four memory controllers
  – Data moved between the L2 and the cores using an integrated crossbar switch to provide high throughput
Source: Sun
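Banking lets the crossbar serve requests to different L2 banks in parallel. A sketch of address-to-bank interleaving; it assumes 64-byte L2 lines and bank selection by the address bits just above the line offset, a common scheme that is an assumption here rather than a detail from the slide. The helper names are hypothetical.

```python
LINE_BYTES = 64  # assumed L2 line size
NUM_BANKS = 4    # T1's L2 is split into four banks

def l2_bank(addr: int) -> int:
    """Bank index = the address bits immediately above the line offset (assumed scheme)."""
    return (addr // LINE_BYTES) % NUM_BANKS

def bank_conflicts(addrs) -> int:
    """Count requests in one batch that land in a bank another request already uses."""
    seen = set()
    conflicts = 0
    for a in addrs:
        b = l2_bank(a)
        if b in seen:
            conflicts += 1
        seen.add(b)
    return conflicts

# Eight cores each requesting a different, consecutive cache line:
sequential = [core * LINE_BYTES for core in range(8)]
# Eight cores requesting lines 256 bytes apart (all in one bank under this mapping):
strided = [core * LINE_BYTES * NUM_BANKS for core in range(8)]
print(bank_conflicts(sequential), bank_conflicts(strided))
```

Consecutive lines spread evenly across the banks, while a stride equal to the bank interleave piles every request onto one bank.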
![Page 46: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/46.jpg)
SUN – UltraSPARC T1
Source: Sun
![Page 47: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/47.jpg)
SUN – UltraSPARC T1 Pipeline
• T1's integer pipeline has six stages:
  – Fetch, Thread Selection, Decode, Execute, Memory Access, Writeback
Source: Sun
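The Thread Selection stage is what makes T1 a fine-grained multithreaded core: each cycle it picks one of the four hardware threads to issue. A sketch of a selector that rotates round-robin among ready threads and skips stalled ones (for example, threads waiting on a cache miss); the round-robin policy and the `ThreadSelector` name are simplifying assumptions.

```python
from typing import List, Optional

class ThreadSelector:
    """Pick one ready thread per cycle, rotating priority round-robin."""
    def __init__(self, num_threads: int = 4):
        self.num_threads = num_threads
        self.last = num_threads - 1  # so thread 0 is considered first

    def select(self, ready: List[bool]) -> Optional[int]:
        for offset in range(1, self.num_threads + 1):
            t = (self.last + offset) % self.num_threads
            if ready[t]:
                self.last = t
                return t
        return None  # every thread stalled: a pipeline bubble

sel = ThreadSelector()
schedule = []
for cycle in range(8):
    ready = [True, cycle >= 4, True, True]  # thread 1 stalled for the first four cycles
    schedule.append(sel.select(ready))
print(schedule)
```

Because a stalled thread is simply skipped, its miss latency is filled with instructions from the other threads instead of idle cycles.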
![Page 48: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/48.jpg)
SUN UltraSPARC T2 – Niagara 2
Source: Sun
![Page 49: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/49.jpg)
SUN UltraSPARC T2
• UltraSPARC T2 has 8 SPARC cores with 8 threads per core
• 8-stage integer pipeline (as opposed to 6 for T1)
• Twice the performance of T1 with a transactional workload (under the same power envelope)
• Clock frequency increased to 1.4 GHz from 1.2 GHz
• One PCI Express port (x8 1.0)
• Two 10 Gigabit Ethernet ports with packet classification and filtering
• L2 cache size increased to 4 MB shared (8 banks, 16-way associative)
• 1 floating-point unit per core
• Eight encryption engines
• Four dual-channel FBDIMM memory controllers
• 711 signal I/O, 1831 total
![Page 50: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/50.jpg)
UltraSparc T2 Core Microarchitecture
Source: Realworld Tech
![Page 51: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/51.jpg)
UltraSparc T2 Memory System
Source: Sun
![Page 52: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/52.jpg)
UltraSparc T2 Core Block Diagram
• IFU – Instruction Fetch Unit
  – 16 KB I$, 32 B lines, 8-way SA
  – 64-entry fully-associative ITLB
• EXU0/1 – Integer Execution Units
  – 4 threads share each unit
  – Executes one integer instruction/cycle
• LSU – Load/Store Unit
  – 8 KB D$, 16 B lines, 4-way SA
  – 128-entry fully-associative DTLB
• FGU – Floating/Graphics Unit
• SPU – Stream Processing Unit
  – Cryptographic acceleration
• TLU – Trap Logic Unit
  – Updates machine state, handles exceptions and interrupts
• MMU – Memory Management Unit
  – Hardware tablewalk (HWTW)
  – 8 KB, 64 KB, 4 MB, 256 MB pages
Source: Sun
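The execution-unit counts translate into peak integer throughput. A rough sketch assuming every integer unit issues one instruction every cycle (as listed for EXU0/1), and assuming a single integer pipeline per T1 core; real throughput is lower, and this is peak arithmetic, not a benchmark. `peak_int_gips` is a hypothetical helper.

```python
def peak_int_gips(cores: int, int_units_per_core: int, clock_ghz: float) -> float:
    """Peak integer throughput (billions of instructions/s) if every unit issues every cycle."""
    return cores * int_units_per_core * clock_ghz

t1 = peak_int_gips(cores=8, int_units_per_core=1, clock_ghz=1.2)  # assumed: one integer pipe per T1 core
t2 = peak_int_gips(cores=8, int_units_per_core=2, clock_ghz=1.4)  # EXU0 and EXU1 per T2 core
print(f"T1: {t1:.1f} GIPS peak, 8 x 4 = {8 * 4} threads")
print(f"T2: {t2:.1f} GIPS peak, 8 x 8 = {8 * 8} threads, ratio {t2 / t1:.2f}")
```

The peak ratio of roughly 2.3 is in line with the "twice the performance of T1" claim, though the slide's figure refers to a transactional workload rather than peak issue rate.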
![Page 53: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/53.jpg)
UltraSparc T2 Core Pipeline
• 8 stages for integer operations:
  – Fetch, Cache, Pick, Decode, Execute, Memory, Bypass, Writeback
  – 3-cycle load-use latency
  – Memory (translation, tag/data access)
  – Bypass (late select, formatting)
• 12 stages for floating-point:
  – Fetch, Cache, Pick, Decode, Execute, FX1, FX2, FX3, FX4, FX5, FB, FW
  – 6-cycle latency for dependent FP ops
  – Longer pipeline for divide/sqrt
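Multithreading is what hides these latencies: while one thread waits for a dependent result, other threads fill the issue slots. A small simulation under simplifying assumptions (each thread issues a chain of dependent ops, threads share one unit round-robin, nothing else stalls); under that model, utilization approaches min(1, threads / latency). The function name and model are illustrative, not Sun's design.

```python
def exu_utilization(threads: int, latency: int, cycles: int = 1200) -> float:
    """Fraction of cycles in which a shared execution unit issues an op.

    Model (assumed): after a thread issues at cycle c, its next dependent op is
    ready at c + latency; each cycle the unit picks the next ready thread in
    round-robin order.
    """
    ready_at = [0] * threads
    last = threads - 1
    issued = 0
    for cycle in range(cycles):
        for offset in range(1, threads + 1):
            t = (last + offset) % threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency
                last = t
                issued += 1
                break
    return issued / cycles

for latency in (3, 6):  # load-use and dependent-FP latencies from the slide
    for threads in (1, 4, 8):
        print(f"latency {latency}, {threads} thread(s): {exu_utilization(threads, latency):.2f}"
              f"  (min(1, T/L) = {min(1.0, threads / latency):.2f})")
```

With four threads sharing an integer unit, the 3-cycle load-use latency is fully hidden in this model; a single FGU shared by all eight threads of a T2 core likewise covers the 6-cycle FP latency.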
![Page 54: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/54.jpg)
References and Links
• http://realworldtech.com/page.cfm?ArticleID=RWT090406012516&p=4
• http://www.opensparc.net/cgi-bin/goto.php?w=/pubs/preszo/06/HotChips06_09_ppt_master.pdf
• http://www.freescale.com/files/netcomm/doc/fact_sheet/MPC8572FS.pdf
![Page 55: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/55.jpg)
The Embedded Multicores
![Page 56: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/56.jpg)
Freescale MPC8572 PowerQUICC III Processor
Source: Freescale
![Page 57: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/57.jpg)
Freescale MPC8572 PowerQUICC III Processor
• Dual embedded e500 cores with 36-bit physical addressing
• Double-precision floating-point
• Integrated L1/L2 cache
  – L1 cache: 32 KB data and 32 KB instruction
  – Shared L2 cache: 1 MB with ECC
  – L2 configurable as SRAM or cache; I/O transactions can be stashed into L2 cache regions
• Integrated DDR memory controller with full ECC support
• Integrated security engine, Pattern Matching Engine, Packet Deflate Engine
• Four on-chip triple-speed Ethernet controllers
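36-bit physical addressing lifts the addressable physical memory past the 4 GiB limit of 32-bit addresses. The arithmetic, as a short sketch with a hypothetical helper name:

```python
GIB = 2 ** 30

def addressable_gib(address_bits: int) -> int:
    """Physical memory reachable with the given number of address bits, in GiB."""
    return 2 ** address_bits // GIB

print(addressable_gib(32), "GiB with 32-bit physical addresses")
print(addressable_gib(36), "GiB with 36-bit physical addresses")
```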
![Page 58: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/58.jpg)
References and Links
• http://www.freescale.com/files/netcomm/doc/fact_sheet/MPC8572FS.pdf
![Page 59: Multicore: Commercial Processors](https://reader036.vdocuments.mx/reader036/viewer/2022062521/5681680f550346895ddd9a70/html5/thumbnails/59.jpg)
Summary
• Multicore technology spans the product spectrum
  – The downward migration of leading-edge technology continues
• Architectural principles are key to
  – Developers: extracting performance
  – Designers: improving performance
  – Marketing: understanding new markets for performance
• Research spans the spectrum of software, security, reliability, parallelism, virtualization and much more!