Post on 15-Dec-2015
SGI’2000 Parallel Programming Tutorial
Supercomputers 2
With the acknowledgement of
Igor Zacharov and Wolfgang Mertz
SGI European Headquarters
Classification of Computers
MIMD
• Multiprocessors (single address space, shared memory)
    • UMA (central memory)
        • PVP (Cray T90)
        • SMP (Intel SHV, SUN E10000, DEC 8400, SGI Power Challenge, IBM R60, etc.)
    • NUMA (distributed memory)
        • COMA (KSR-1, DDM)
        • CC-NUMA (SGI Origin2000, SN1 (SGI3000), Cray T3E, HP Exemplar, Sequent NUMA-Q, Data General)
        • NCC-NUMA (Cray T3D, IBM SP3)
• Multicomputers (multiple address spaces)
    • NORMA (no remote memory access)
        • Cluster (IBM SP2, DEC TruCluster, Microsoft Wolfpack, “Beowulf”, etc.): loosely coupled, multiple OS
        • “MPP” (Intel TFLOPS, TM-5): tightly coupled, single OS
Abbreviations
MIMD      Multiple Instructions, Multiple Data
PVP       Parallel Vector Processor
UMA       Uniform Memory Access
SMP       Symmetric Multi-Processor
NUMA      Non-Uniform Memory Access
COMA      Cache Only Memory Architecture
NORMA     No-Remote Memory Access
CC-NUMA   Cache-Coherent NUMA
MPP       Massively Parallel Processor
NCC-NUMA  Non-Cache-Coherent NUMA
Structure of an SMP System (1)
[Diagram: processors, each with its own cache, together with main memory banks and I/O units, all attached to a central bus]
• Does NOT scale due to bus saturation
• Bus is a very complex component
• High memory latency due to the complexity
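The bus-saturation bullet can be illustrated with a back-of-the-envelope model. This is a minimal sketch, not a measurement: it assumes all processors share one bus of fixed bandwidth equally, and the figures used in the usage note (a 1200 MB/s bus and 300 MB/s of demand per processor) are hypothetical values chosen only for illustration.

```python
def per_cpu_bandwidth(bus_mb_s: int, n_cpus: int) -> float:
    """Bandwidth available to each CPU when n_cpus share one bus equally."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return bus_mb_s / n_cpus


def max_unsaturated_cpus(bus_mb_s: int, demand_mb_s: int) -> int:
    """Largest CPU count whose combined demand still fits on the bus."""
    if demand_mb_s < 1:
        raise ValueError("demand_mb_s must be at least 1")
    return bus_mb_s // demand_mb_s


if __name__ == "__main__":
    # Hypothetical numbers: the share per CPU halves each time the count doubles.
    for n in (1, 2, 4, 8, 16):
        print(n, per_cpu_bandwidth(1200, n))
```

With the hypothetical numbers above, `max_unsaturated_cpus(1200, 300)` gives 4: beyond four processors, each one receives less bandwidth than it asks for, which is the saturation effect the slide refers to.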
Structure of an SMP System (2)
[Diagram: processors, each with its own cache, together with main memory banks and I/O units, all attached to a central crossbar]
• Scales very well
• Crossbar is a very complex component
• High memory latency due to the complexity
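One way to see why a crossbar is a complex component: in a full crossbar every port on one side can be switched to every port on the other, so the number of switch points grows as a product, while a bus only needs one attachment per device. The sketch below uses this textbook model; the function names are illustrative and nothing here describes a specific SGI crossbar.

```python
def crossbar_crosspoints(n_processors: int, n_memory_io: int) -> int:
    """Switch points in a full crossbar joining two sets of ports."""
    return n_processors * n_memory_io


def bus_attachments(n_processors: int, n_memory_io: int) -> int:
    """Attachment points on a single shared bus (one per device)."""
    return n_processors + n_memory_io


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, bus_attachments(n, n), crossbar_crosspoints(n, n))
```

Doubling both sides doubles the bus attachments but quadruples the crosspoints, which is the cost paid for the better scaling.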
Structure of an SMP System (3): Origin SGI NUMA Architecture
[Diagram: SGI NUMA hypercube. Node boards (N), with I/O attached, connect to eight routers (R); the routers are joined by a global switch interconnect.]
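The hypercube arrangement of routers has a convenient property that can be sketched in a few lines. The code below assumes a plain textbook binary hypercube in which routers are numbered so that neighbours differ in exactly one bit; the numbering scheme and function names are assumptions for illustration and are not taken from SGI documentation.

```python
def hypercube_hops(src: int, dst: int) -> int:
    """Minimum router-to-router hops: the number of differing address bits."""
    return bin(src ^ dst).count("1")


def hypercube_diameter(n_routers: int) -> int:
    """Worst-case hop count for a hypercube with n_routers routers."""
    if n_routers < 1 or n_routers & (n_routers - 1):
        raise ValueError("n_routers must be a power of two")
    return n_routers.bit_length() - 1


if __name__ == "__main__":
    # Eight routers, as drawn in the diagram, form a 3-dimensional hypercube.
    print(hypercube_diameter(8))
    print(hypercube_hops(0b000, 0b111))
```

Under this model the worst-case distance grows only with the logarithm of the router count, which is why remote latency rises slowly as the machine is enlarged.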
Systems are built from Modules
Configuration            CPUs
Deskside (1 module)      2-8
Rack (2 modules)         16
Multi-rack (4 modules)   32
Etc...                   ..128
New High-End Products: Origin 3000 Servers – Onyx 3 Systems
• SGI Origin 3200, SGI Onyx 3200
• SGI Origin 3400, SGI Onyx 3400
• SGI Origin 3800, SGI Onyx 3800
IRIX 6.5
SGI 3800 System (16-512p)
[Diagrams: rack layouts of the minimum (16p) system and of the 128p system, and the 128P system topology. The topology spans Racks 1 to 4, each drawn with eight C-Bricks (C) and two R-Bricks (R). Brick types shown in the rack layouts: C-Brick, R-Brick (8-port router), I-Brick, "P, I, or X-Brick", and Power Bay.]
ASCI Blue Mountain, Los Alamos National Laboratory
Origin 2000 with 3+ Tflops peak
1+ Tflop Application Performance
48 Systems with 128 CPUs each = 6144 CPUs
1536 Gbyte Memory
76 Tbyte Diskspace
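The slide's figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above; treating "3+ Tflops" and "1+ Tflop" as exactly 3 and 1 Tflops, and one Gbyte as 1024 Mbyte, are simplifying assumptions, so the per-CPU peak and the efficiency are rough lower-bound style estimates rather than official figures.

```python
SYSTEMS = 48
CPUS_PER_SYSTEM = 128
MEMORY_GBYTE = 1536
PEAK_TFLOPS = 3.0         # slide says "3+", taken here as 3
APPLICATION_TFLOPS = 1.0  # slide says "1+", taken here as 1


def blue_mountain_summary() -> dict:
    """Derive per-CPU figures from the totals quoted on the slide."""
    total_cpus = SYSTEMS * CPUS_PER_SYSTEM
    return {
        "total_cpus": total_cpus,
        "memory_per_cpu_mbyte": MEMORY_GBYTE * 1024 / total_cpus,
        "peak_mflops_per_cpu": PEAK_TFLOPS * 1e6 / total_cpus,
        "application_fraction_of_peak": APPLICATION_TFLOPS / PEAK_TFLOPS,
    }


if __name__ == "__main__":
    for name, value in blue_mountain_summary().items():
        print(f"{name}: {value}")
```

The CPU total reproduces the 6144 on the slide, and the memory works out to 256 Mbyte per CPU. Under the stated assumptions, applications reach roughly one third of peak.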
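Memory hierarchy
[Figure: speed of access (vertical axis, ticks 1, 0.1, 0.01) versus device capacity (horizontal axis: cache, subsystem memory, disk)]
Level       Capacity        Access time
Registers   64 reg          1/clock
L1 cache    32 KB           ~2-3 cy
L2 cache    8 MB            ~10 cy
Memory      ~1 - 100s GB    ~100 - 300 cy (NUMA)
Disk        -               ~4000 cy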
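The effect of the hierarchy on a typical load can be estimated with the standard average-access-time formula. This is a minimal sketch using the cycle counts quoted for the memory hierarchy: 3 cycles for L1 (upper end of "~2-3 cy"), 10 for L2, and 200 for memory, the last being an assumed midpoint of "~100 - 300 cy". The hit rates are hypothetical inputs, not values from the tutorial.

```python
L1_CYCLES = 3        # upper end of "~2-3 cy"
L2_CYCLES = 10       # "~10 cy"
MEMORY_CYCLES = 200  # assumed midpoint of "~100 - 300 cy (NUMA)"


def average_access_cycles(l1_hit_rate: float, l2_hit_rate: float) -> float:
    """Average cycles per access: L1 time plus the weighted miss penalties."""
    for rate in (l1_hit_rate, l2_hit_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("hit rates must lie between 0 and 1")
    # An L1 miss costs an L2 access, plus a memory access if L2 also misses.
    l1_miss_penalty = L2_CYCLES + (1.0 - l2_hit_rate) * MEMORY_CYCLES
    return L1_CYCLES + (1.0 - l1_hit_rate) * l1_miss_penalty


if __name__ == "__main__":
    for l1, l2 in ((1.0, 1.0), (0.95, 0.9), (0.8, 0.5)):
        print(l1, l2, average_access_cycles(l1, l2))
```

Even a few percent of accesses falling through to memory noticeably raises the average, which is why cache-friendly data layout matters on a NUMA machine.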
Remote latency (ns)
[Chart: remote latency in ns (vertical axis, 0 to 1400) versus system size (2p to 512p) for SN-MIPS and Origin2000]
System size   SN-MIPS   Origin2000
2p            175       343
4p            175       554
8p            235       759
16p           285       759
32p           335       836
64p           335       1067
128p          435       1169
256p          485       -
512p          585       -
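The two latency series can be compared directly. The sketch below assumes the data labels in the chart pair up as follows: the nine lower values belong to SN-MIPS (2p to 512p) and the seven higher values to Origin2000 (2p to 128p). That pairing is a reading of the chart, not something stated explicitly in it.

```python
CPU_COUNTS = [2, 4, 8, 16, 32, 64, 128, 256, 512]
SN_MIPS_NS = [175, 175, 235, 285, 335, 335, 435, 485, 585]
ORIGIN2000_NS = [343, 554, 759, 759, 836, 1067, 1169]  # no values above 128p


def latency_ratios() -> dict:
    """Origin2000 / SN-MIPS remote latency for sizes where both are given."""
    # zip stops at the shorter list, so only 2p..128p are compared.
    return {
        cpus: origin / sn
        for cpus, sn, origin in zip(CPU_COUNTS, SN_MIPS_NS, ORIGIN2000_NS)
    }


if __name__ == "__main__":
    for cpus, ratio in latency_ratios().items():
        print(f"{cpus}p: {ratio:.2f}x")
```

Under that reading, the newer architecture roughly halves remote latency at the smallest size and improves on it by a larger factor at 128p.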