11. Multicore Processors
Dezső Sima
Fall 2006
D. Sima, 2006
Overview
1 Overview of MCPs
2 Attaching L2 caches
3 Attaching L3 caches
4 Connecting memory and I/O
5 Case examples
1. Overview of MCPs (1)
Figure 1.1: Processor power density trends
Source: D. Yen: Chip Multithreading Processors Enable Reliable High Throughput Computing http://www.irps.org/05-43rd/IRPS_Keynote_Yen.pdf
1. Overview of MCPs (2)
Figure 1.2: Single-stream performance vs. cost
Source: Marr D.T. et al.: "Hyper-Threading Technology Architecture and Microarchitecture", Intel Technology Journal, Vol. 06, Issue 01, Feb. 14, 2002, pp. 4-16.
1. Overview of MCPs (2)
Figure 1.2: Dual/multi-core processors (1)
1. Overview of MCPs (3)
Figure 1.3: Dual/multi-core processors (2)
Attaching of L2 caches
Layout of the cores
Layout of the I/O and memory architecture
Macro architecture of dual/multi-core processors (MCPs)
Attaching of L3 caches (if available)
1. Overview of MCPs (4)
Inclusion policy
Allocation to the cores
Banking policy
Attaching L2 caches to MCPs
Use by instructions/data
Integration of L2 caches to the proc. chip
2. Attaching L2 caches
2.1 Main aspects of attaching L2 caches to MCPs (1)
Shared L2 cache for all cores
Allocation of L2 caches to the cores
Private L2 cache for each core
POWER4 (2001)
Montecito (2006?)
UltraSPARC IV (2004)
Smithfield (2005)
Athlon 64 X2 (2005)
POWER5 (2005)
Core Duo (2006)
Yonah (2006)
UltraSPARC T1 (2005)
Expected trend
Inclusion policy
Allocation to the cores
Banking policy
Attaching L2 caches to MCPs
Use by instructions/data
Integration of L2 caches to the proc. chip
2.1 Main aspects of attaching L2 caches to MCPs (2)
Exclusive L2
Inclusion policy of L2 caches
Inclusive L2
Lines replaced (victimized) in the L1 are written into the L2.
References to data in the L2 initiate reloading that cache line into the L1.
The L2 usually operates as a write-back cache: only modified data that is replaced in the L2 is written back to the memory; unmodified data that is replaced in the L2 is deleted.
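The exclusive (victim) policy above can be sketched as a toy two-level hierarchy. This is a hypothetical illustration (class and method names are mine, not any vendor's design): L1 victims are written into the L2, and an L2 hit moves the line back up and removes it from the L2, so the two levels never hold the same line and their capacities add up.

```python
# Toy sketch of an exclusive (victim) L2: illustrative only.
# L1 victims go to the L2; an L2 hit reloads the line into the
# L1 and removes it from the L2, so no line lives in both levels.

class ExclusiveHierarchy:
    def __init__(self, l1_size, l2_size):
        self.l1_size, self.l2_size = l1_size, l2_size
        self.l1, self.l2 = [], []          # LRU order: front = oldest

    def _insert(self, cache, size, line):
        """Insert a line; return the evicted victim, if any."""
        cache.append(line)
        return cache.pop(0) if len(cache) > size else None

    def access(self, line):
        if line in self.l1:                # L1 hit
            self.l1.remove(line); self.l1.append(line)
            return "L1 hit"
        if line in self.l2:                # L2 hit: reload into L1, drop from L2
            self.l2.remove(line)
            victim = self._insert(self.l1, self.l1_size, line)
            if victim is not None:         # L1 victim is written into the L2
                self._insert(self.l2, self.l2_size, victim)
            return "L2 hit"
        # Miss: fill the L1 from memory; the L1 victim goes to the L2
        victim = self._insert(self.l1, self.l1_size, line)
        if victim is not None:
            self._insert(self.l2, self.l2_size, victim)
        return "miss"

h = ExclusiveHierarchy(l1_size=2, l2_size=4)
for a in ["A", "B", "C"]:
    h.access(a)                            # "A" is victimized into the L2
assert h.access("A") == "L2 hit"
assert "A" in h.l1 and "A" not in h.l2     # never resident in both levels
```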
Figure 1.1: Implementation of exclusive L2 caches
Source: Zheng Y., Davis B.T., Jordan M.: "Performance Evaluation of Exclusive Cache Hierarchies", 2004 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2004, pp. 89-96.
Exclusive L2
Inclusion policy of L2 caches
Inclusive L2
Most implementations
Athlon 64 X2 (2005)
Expected trend
Inclusion policy
Allocation to the cores
Banking policy
Attaching L2 caches to MCPs
Use by instructions/data
Integration of L2 caches to the proc. chip
2.1 Main aspects of attaching L2 caches to MCPs (3)
Unified instr./data cache(s)
Use by instructions/data
Split instr./data caches
POWER4 (2001)
Montecito (2006?)
UltraSPARC IV (2004)
Smithfield (2005)
Athlon 64 X2 (2005)
POWER5 (2005)
Core Duo (2006)
Yonah (2006)
UltraSPARC T1 (2005)
Expected trend
Inclusion policy
Allocation to the cores
Banking policy
Attaching L2 caches to MCPs
Use by instructions/data
Integration of L2 caches to the proc. chip
2.1 Main aspects of attaching L2 caches to MCPs (4)
Single-banked implementation
Banking policy
Multi-banked implementation
Inclusion policy
Allocation to the cores
Banking policy
Attaching L2 caches to MCPs
Use by instructions/data
Integration of L2 caches to the proc. chip
2.1 Main aspects of attaching L2 caches to MCPs (5)
On-chip L2 tags/contr., off-chip data
Integration to the processor chip
Entire L2 on chip
POWER4 (2001)
UltraSPARC IV (2004)
Athlon 64 X2(2005)
POWER5 (2005)
Presler (2005)
Smithfield (2005)
UltraSPARC IV+ (2005)
Expected trend
Private L2 caches for each core
Unified instruction/data caches:
  On-chip L2 tags/contr., off-chip data: UltraSPARC IV (2004)
  Entire L2 on-chip: Smithfield (2005), Presler (2005)
Split instruction/data caches:
  Entire L2 on-chip: Montecito (2006?)
[Block diagrams of private-L2 designs: a dual core with per-core L2 caches and a common system interface to the FSB (Smithfield-style); Montecito with split L2 I/D caches and a private L3 per core behind the system interface and FSB; UltraSPARC IV with on-chip L2 tags/contr., off-chip L2 data, an interconnection network, memory controller and system interface to the Fire Plane bus; Athlon 64 X2 (2005) with private exclusive L2 caches feeding a System Request Queue and crossbar, an on-chip memory controller and an HT-bus controller]
2.2 Examples of attaching L2 caches to MCPs (1)
2.2 Examples of attaching L2 caches to MCPs (2)
Shared L2 caches for all cores
Dual core/single-banked L2. Examples: Yonah Duo (2006), Core (2006)
Dual core/multi-banked L2. Examples: POWER4 (2001), POWER5 (2005)
Multi-core/multi-banked L2. Example: UltraSPARC T1 (2005) (Niagara) (8 cores/4 L2 banks)
[Block diagrams: dual cores sharing a single-banked L2, and cores sharing multiple L2 banks through a crossbar, with an on-chip memory controller to memory]
POWER4/POWER5: the 128-byte L2 cache lines are hashed across the three L2 modules. Hashing is performed by modulo-3 arithmetic applied to a large number of real address bits.
UltraSPARC T1: the four L2 banks are interleaved at 64-byte blocks.
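The two mappings can be written out explicitly. This is a sketch under stated assumptions (the helper names are mine, and real hardware hashes far more address bits): POWER4 selects one of three L2 modules by a modulo-3 of the 128-byte line address, while the T1 picks one of four banks from the 64-byte block address.

```python
# Bank-selection sketches for the two schemes described above.
# Illustrative only: names and bit widths are simplified.

LINE_128 = 128   # POWER4 L2 line size in bytes
BLOCK_64 = 64    # UltraSPARC T1 interleave granularity in bytes

def power4_l2_module(real_addr: int) -> int:
    """Hash a 128-byte line address across 3 L2 modules (modulo 3)."""
    return (real_addr // LINE_128) % 3

def t1_l2_bank(phys_addr: int) -> int:
    """Interleave four L2 banks at 64-byte blocks."""
    return (phys_addr // BLOCK_64) % 4

# Consecutive 64-byte blocks rotate through the four T1 banks:
assert [t1_l2_bank(a) for a in (0, 64, 128, 192, 256)] == [0, 1, 2, 3, 0]
# Consecutive 128-byte lines rotate through the three POWER4 modules:
assert [power4_l2_module(a) for a in (0, 128, 256, 384)] == [0, 1, 2, 0]
```

Interleaving at block granularity spreads consecutive lines over the banks, so independent cores can access different banks concurrently.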
[Block diagrams: a dual core sharing a single-banked L2 with its L2 contr. and a system interface to the FSB (Yonah/Core-style); POWER4/POWER5 with two cores sharing three L2 banks via a crossbar, a Fabric Bus Controller, L3 tags/contr. and a GX controller on the GX bus]
Attaching of L2 caches
Layout of the cores
Layout of the I/O and memory architecture
Macro architecture of dual/multi-core processors (MCPs)
Attaching of L3 caches (if available)
3. Attaching L3 caches
Inclusion policy
Allocation to the L2 cache(s)
Banking policy
Attaching L3 caches to MCPs
Use by instructions/data
Integration of L3 caches to the proc. chip
3.1 Main aspects of attaching L3 caches to MCPs (1)
Shared L3 cache for all L2s
Allocation of L3 caches to the L2 caches
Private L3 cache for each L2
Montecito (2006?)
UltraSPARC IV+ (2004)
POWER5 (2005)
POWER4 (2001)
Inclusion policy
Allocation to the L2 cache(s)
Banking policy
Attaching L3 caches to MCPs
Use by instructions/data
Integration of L3 caches to the proc. chip
3.1 Main aspects of attaching L3 caches to MCPs (2)
Exclusive L3
Inclusion policy of L3 caches
Inclusive L3
Lines replaced (victimized) in the L2 are written into the L3.
References to data in the L3 initiate reloading that cache line into the L2.
The L3 usually operates as a write-back cache: only modified data that is replaced in the L3 is written back to the memory; unmodified data that is replaced in the L3 is deleted.
Exclusive L3
Inclusion policy of L3 caches
Inclusive L3
Expected trend
Montecito (2006?)
UltraSPARC IV+ (2004)
POWER4 (2001)
POWER5 (2005)
Inclusion policy
Allocation to the L2 cache(s)
Banking policy
Attaching L3 caches to MCPs
Use by instructions/data
Integration of L3 caches to the proc. chip
3.1 Main aspects of attaching L3 caches to MCPs (3)
Unified instr./data cache(s)
Use by instructions/data
Split instr./data caches
All multicore processors unveiled until now hold both instructions and data in their L3 caches (unified caches).
Inclusion policy
Allocation to the L2 cache(s)
Banking policy
Attaching L3 caches to MCPs
Use by instructions/data
Integration of L3 caches to the proc. chip
3.1 Main aspects of attaching L3 caches to MCPs (4)
Single-banked implementation
Banking policy
Multi-banked implementation
Inclusion policy
Allocation to the L2 cache(s)
Banking policy
Attaching L3 caches to MCPs
Use by instructions/data
Integration of L3 caches to the proc. chip
3.1 Main aspects of attaching L3 caches to MCPs (5)
On-chip L3 tags/contr., off-chip data
Integration to the processor chip
Entire L3 on chip
POWER4 (2001)
UltraSPARC IV+ (2005)
POWER5 (2005)
Montecito (2006?)
Expected trend
Private L3 caches for each L2 cache bank
Shared L3 cache for all cache banks
Inclusive L3 cache
On-chip L3 tags/contr., off-chip data / Entire L3 on-chip / Entire L3 on-chip
Examples: POWER4 (2001)
3.2 Examples of attaching L3 caches to MCPs (1)
Montecito (2006?)
[Block diagrams: Montecito with split L2 I/D caches and a private L3 per core, an arbiter and the FSB; POWER4 with Fabric Bus Controller, three shared L2 banks, on-chip L3 tags/contr. with off-chip L3 data, a memory controller and memory]
On-chip L3 tags/contr., off-chip data
Private L3 caches for each L2 cache bank
Shared L3 cache for all cache banks
Exclusive L3 cache
On-chip L3 tags/contr., off-chip data / Entire L3 on-chip / Entire L3 on-chip
Examples:
3.2 Examples of attaching L3 caches to MCPs (2)
On-chip L3 tags/contr., off-chip data
[Block diagrams: POWER5 (2005) with two cores sharing three L2 banks, a Fabric Bus Controller, three on-chip L3 tags/contr. units with off-chip L3 data, and an on-chip memory controller to memory; UltraSPARC IV+ (2005) with two cores, a shared on-chip L2, on-chip L3 tags/contr. with off-chip L3 data, an interconnection network, memory controller and a system interface to the Fire Plane bus]
Attaching of L2 caches
Layout of the cores
Layout of the I/O and memory architecture
Macro architecture of dual/multi-core processors (MCPs)
Attaching of L3 caches (if available)
4. Connecting memory and I/O
Connection policy of I/O and memory
Layout of the I/O and memory architecture in dual/multi-core processors
Integration of the memory controller to the processor chip
4.1 Overview
Connecting both I/O and memory via the system bus
Dedicated connection of I/O and memory
Connection policy of I/O and memory
Asymmetric connection of I/O and memory
Symmetric connection of I/O and memory
POWER4 (2001)
UltraSPARC IV (2004)
POWER5 (2005)
Montecito (2006?)
UltraSPARC T1 (2005)
UltraSPARC IV+ (2005)
Presler (2005)
Smithfield (2005)
PA-8800 (2004)
PA-8900 (2005)
Core (2006)
Yonah Duo (2006)
Athlon64 X2 (2005)
4.2 Connection policy (1)
Connecting both I/O and memory via the system bus
Examples: Yonah Duo/Core (2006/2006), Smithfield/Presler (2005/2005), Montecito (2006), PA-8800 (2004), PA-8900 (2005)
[Block diagrams: in each design the cores and their L2/L3 caches reach both memory and I/O through a common system bus interface to the FSB]
4.2 Connection policy (2)
Connecting both I/O and memory via the system bus
Dedicated connection of I/O and memory
Connection policy of I/O and memory
Asymmetric connection of I/O and memory
Symmetric connection of I/O and memory
POWER4 (2001)
UltraSPARC IV (2004)
POWER5 (2005)
Montecito (2006?)
UltraSPARC T1 (2005)
UltraSPARC IV+ (2005)
Presler (2005)
Smithfield (2005)
PA-8800 (2004)
PA-8900 (2005)
Core (2006)
Yonah Duo (2006)
Athlon64 X2 (2005)
(Connecting I/O via the internal interconnection network, and memory via the L2/L3 cache)
(Connecting both I/O and memory via the internal interconnection network)
4.2 Connection policy (3)
Examples: POWER4 (2001), UltraSPARC T1 (2005)
[Block diagrams: UltraSPARC T1 with eight cores (Core 0 to Core 7) and four L2 banks behind a crossbar, four memory controllers to memory, and I/O attached via the JBus; POWER4 with Fabric Bus Controller, three L2 banks, on-chip L3 dir./contr. with off-chip L3 data and memory behind the L3, I/O via the GX controller on the GX bus, and chip-to-chip/memory-to-memory interconnects]
Asymmetric connection of I/O and memory
4.2 Connection policy (4)
Connecting both I/O and memory via the system bus
Dedicated connection of I/O and memory
Connection policy of I/O and memory
Asymmetric connection of I/O and memory
Symmetric connection of I/O and memory
POWER4 (2001)
UltraSPARC IV (2004)
POWER5 (2005)
Montecito (2006?)
UltraSPARC T1 (2005)
UltraSPARC IV+ (2005)
Presler (2005)
Smithfield (2005)
PA-8800 (2004)
PA-8900 (2005)
Core (2006)
Yonah Duo (2006)
Athlon64 X2 (2005)
(Connecting I/O via the internal interconnection network, and memory via the L2/L3 cache)
(Connecting both I/O and memory via the internal interconnection network)
4.2 Connection policy (5)
Examples: UltraSPARC IV (2004), POWER5 (2005)
[Block diagrams: POWER5 with Fabric Bus Controller, shared L2 banks and L3, an on-chip memory controller to memory, a GX controller on the GX bus, and chip-to-chip/memory-to-memory interconnects; UltraSPARC IV with two cores, on-chip L2 tags/contr. with off-chip L2 data, an interconnection network, a memory controller to memory, and a system interface to the Fire Plane bus]
Symmetric connection of I/O and memory (1)
4.2 Connection policy (6)
Examples: Athlon 64 X2 (2005), UltraSPARC IV+ (2005)
[Block diagrams: Athlon 64 X2 with two cores and private L2s feeding a System Request Queue and crossbar, an on-chip memory controller to memory, and an HT-bus controller; UltraSPARC IV+ with two cores, a shared L2, on-chip L3 tags/contr. with off-chip L3 data, an interconnection network, a memory controller to memory, and a system interface to the Fire Plane bus]
Symmetric connection of I/O and memory (2)
4.2 Connection policy (7)
Off-chip memory controller On-chip memory controller
Integration of the memory controller to the processor chip
POWER4 (2001)
UltraSPARC IV+ (2005)
POWER5 (2005)
Montecito (2006?)
UltraSPARC T1 (2005)
UltraSPARC IV (2004)
Athlon 64 X2 (2005)
Presler (2005)
Smithfield (2005)
PA-8800 (2004)
PA-8900 (2005)
Core (2006)
Yonah Duo (2006)
Expected trend
4.3 Integration of the memory controller to the processor chip
5. Case examples
5.1 Intel MCPs (1)
The Move to Intel Multi-core
2005 / 2006 / 2007+
Platforms: Itanium® processor, MP Server, DP Server / WS, Desktop Client, Mobile Client
All products and dates are preliminary and subject to change without notice.
Refer to 'fact sheet' for specific product timings.
Figure 5.1: The move to Intel multi-core
Source: A. Loktu: Itanium 2 for Enterprise Computing http://h40132.www4.hp.com/upload/se/sv/Itanium2forenterprisecomputing.pps
5.1 Intel MCPs (2)
Figure 5.2: Processor specifications of Intel's Pentium D family (90 nm)
Source: http://www.intel.com/products/processor/index.htm
EIST: Enhanced Intel SpeedStep Technology
First delivered in Intel's mobile and server platforms, it allows the system to dynamically adjust processor voltage and core frequency, which can result in decreased average power consumption and decreased average heat production.
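The saving comes from dynamic switching power scaling roughly as P ≈ C·V²·f: because an EIST-style step lowers frequency and voltage together, power drops superlinearly. A back-of-the-envelope sketch (the operating-point numbers below are illustrative, not actual EIST states):

```python
# Back-of-the-envelope dynamic power model: P ~ C * V^2 * f.
# The voltage/frequency values are illustrative, not measured
# EIST operating points.

def dynamic_power(c_eff: float, volts: float, freq_ghz: float) -> float:
    """Dynamic switching power, up to the constant effective capacitance."""
    return c_eff * volts ** 2 * freq_ghz

full = dynamic_power(c_eff=1.0, volts=1.4, freq_ghz=3.2)
stepped = dynamic_power(c_eff=1.0, volts=1.2, freq_ghz=2.8)  # one step down

# ~12% less frequency plus ~14% less voltage gives ~36% less dynamic power
assert stepped / full < 0.65
```

The quadratic voltage term is why voltage/frequency scaling beats frequency throttling alone.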
VT: Virtualization Technology
A set of hardware enhancements to Intel's server and client platforms that can improve the performance and robustness of traditional software-based virtualization solutions.
Virtualization solutions allow a platform to run multiple operating systems and applications in independent partitions. Using virtualization capabilities, one computer system can function as multiple "virtual" systems.
ED: Execute Disable Bit
Malicious buffer overflow attacks pose a significant security threat. In a typical attack, a malicious worm creates a flood of code that overwhelms the processor, allowing the worm to propagate itself to the network and to other computers.
The Execute Disable Bit allows the processor to classify areas in memory by where application code can execute and where it cannot. When a malicious worm attempts to insert code in the buffer, the processor disables code execution, preventing damage and worm propagation. Combined with a supporting operating system, it can help prevent certain classes of malicious buffer overflow attacks.
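The mechanism can be mimicked in a toy model. This is purely illustrative (the class and constants are mine): the real Execute Disable Bit is a per-page flag in the page-table entry enforced by the processor, but the effect is the same as below — fetching instructions from a page without execute permission faults instead of running an injected payload.

```python
# Toy model of per-page execute permissions (Execute Disable / NX).
# Illustrative only: real enforcement happens in the page tables.

PAGE = 4096  # assumed page size in bytes

class ExecDisableMMU:
    def __init__(self):
        self.executable_pages = set()        # pages allowed to hold code

    def map_code(self, page_no: int):
        self.executable_pages.add(page_no)   # legitimate program text

    def fetch(self, addr: int) -> str:
        """Instruction fetch: fault if the page is not executable."""
        if addr // PAGE not in self.executable_pages:
            raise PermissionError("instruction fetch from non-executable page")
        return "executed"

mmu = ExecDisableMMU()
mmu.map_code(0)                              # program text lives in page 0
assert mmu.fetch(0x100) == "executed"        # normal execution succeeds

blocked = False
try:
    mmu.fetch(0x2000)                        # injected payload in a data page
except PermissionError:
    blocked = True
assert blocked                               # the overflow payload cannot run
```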
5.1 Intel MCPs (3)
5.1 Intel MCPs (4)
Figure 5.3: Processor specifications of Intel's Pentium D family (65 nm)
Source: http://www.intel.com/products/processor/index.htm
5.1 Intel MCPs (5)
Figure 5.4: Specifications of Intel's Pentium Processor Extreme Edition models 840/955/965
Source: http://www.intel.com/products/processor/index.htm
5.1 Intel MCPs (6)
Figure 5.5: Processor specifications of Intel's Yonah Duo (Core Duo) family
Source: http://www.intel.com/products/processor/index.htm
Source: http://www.intel.com/products/processor_number/chart/core2duo.htm
5.1 Intel MCPs (7)
Figure 5.6 Specifications of Intel’s Core Processors
5.1 Intel MCPs (8)
Category | Code Name | Cores | Cache | Market
Desktop | Kentsfield | Dual core, multi-die | 4 MB | Mid 2007
Desktop | Conroe | Dual core, single die | 4 MB shared | End 2006
Desktop | Allendale | Dual core, single die | 2 MB shared | End 2006
Desktop | Cedar Mill (NetBurst/P4) | Single core | 512 kB, 1 MB, 2 MB | Early 2006
Desktop | Presler (NetBurst/P4) | Dual core, dual die | 4 MB | Early 2006
Desktop/Mobile | Millville | Single core | 1 MB | Early 2007
Mobile | Yonah2 | Dual core, single die | 2 MB | Early 2006
Mobile | Yonah1 | Single core | 1/2 MB | Mid 2006
Mobile | Stealey | Single core | 512 kB | Mid 2007
Mobile | Merom | Dual core, single die | 2/4 MB shared | End 2006
Enterprise | Sossaman | Dual core, single die | 2 MB | Early 2006
Enterprise | Woodcrest | Dual core, single die | 4 MB | Mid 2006
Enterprise | Clovertown | Quad core, multi-die | 4 MB | Mid 2007
Enterprise | Dempsey (NetBurst/Xeon) | Dual core, dual die | 4 MB | Mid 2006
Enterprise | Tulsa | Dual core, single die | 4/8/16 MB | End 2006
Enterprise | Whitefield | Quad core, single die | 8 MB, 16 MB shared | Early 2008
Figure 5.7: Future 65 nm processors (overview)
Source: P. Schmid: Top Secret Intel Processor Plans Uncovered www.tomshardware.com/2005/12/04/top_secret_intel_processor_plans_uncovered
Category | Codename | Cores | Cache | Market
Desktop | Wolfdale | Dual core, single die | 3 MB shared | 2008
Desktop | Ridgefield | Dual core, single die | 6 MB shared | 2008
Desktop | Yorkfield | 8 cores, multi-die | 12 MB shared | 2008+
Desktop | Bloomfield | Quad core, single die | - | 2008+
Desktop/Mobile | Perryville | Single core | 2 MB | 2008
Mobile | Penryn | Dual core, single die | 3 MB, 6 MB shared | 2008
Mobile | Silverthorne | - | - | 2008+
Enterprise | Harpertown | 8 cores, multi-die | 12 MB shared | 2008
Figure 5.8: Future 45 nm processors (overview)
5.1 Intel MCPs (9)
Source: P. Schmid: Top Secret Intel Processor Plans Uncovered www.tomshardware.com/2005/12/04/top_secret_intel_processor_plans_uncovered
5.2 Athlon 64 X2
Figure 5.9: AMD Athlon 64 X2 dual-core processor architecture
Source: AMD Athlon 64 X2 Dual-Core Processor for Desktop – Key Architecture Features, http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_9485_13041,00.html
5.3 Sun’s UltraSPARC IV/IV+ (1)
Figure 5.10: UltraSPARC IV (Jaguar)
Source: C. Boussard: Architecture des processeurs, http://laser.igh.cnrs.fr/IMG/pdf/SUN-CNRS-archi-cpu-3.pdf
ARB: Arbiter
5.3 Sun’s UltraSPARC IV/IV+ (2)
Figure 5.11: UltraSPARC IV+ (Panther)
Source: C. Boussard: Architecture des processeurs, http://laser.igh.cnrs.fr/IMG/pdf/SUN-CNRS-archi-cpu-3.pdf
5.4 POWER4/POWER5 (1)
Figure 5.12: POWER4 chip logical view
Source: J.M. Tendler, S. Dodson, S. Fields, H. Le, B. Sinharoy: POWER4 System Microarchitecture, IBM Server Technical White Paper, October 2001,
http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf
Built-In Self-Test
Service Processor
Power-On Reset
Core Interface Unit (crossbar)
Non-Cacheable Unit
Multi-Chip Module
5.4 POWER4/POWER5 (2)
Figure 5.13: POWER4 chip
Source: R. Kalla, B. Sinharoy, J. Tendler: Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor, 2003
http://www.hotchips.org/archives/hc15/3_Tue/11.ibm.pdf
5.4 POWER4/POWER5 (3)
Figure 5.14: POWER4 and POWER5 system structures
Source: R. Kalla, B. Sinharoy, J.M. Tendler: IBM POWER5 Chip: A Dual-Core Multithreaded Processor, IEEE Micro, Vol. 24, No. 2, March-April 2004, pp. 40-47.
Fabric Controller
5.5 Cell (1)
Figure 5.15: Cell (BE) microarchitecture
Source: IBM: "Cell Broadband Engine™ processor-based systems", IBM Corp., 2006
SPE: Synergistic Processing Element
EIB: Element Interconnect Bus
MFC: Memory Flow Controller
PPE: Power Processing Element
AUC: Atomic Update Cache
5.5 Cell (2)
Figure 5.16: Cell SPE architecture
Source: Blachford N.: "Cell Architecture Explained Version 2", http://www.blachford.info/computer/Cell/Cell1_v2.html
5.5 Cell (3)
Figure 5.17: Cell floorplan
Source: Blachford N.: "Cell Architecture Explained Version 2", http://www.blachford.info/computer/Cell/Cell1_v2.html