2005 ©Erik F. Dirkx
Limits of
Parallel/Distributed Computing
Prof. Erik DIRKX
VUB-INFO-PADS
[email protected]
http://parallel.vub.ac.be
Introduction
• (Cluster) Computers : a tool for a new way of doing science & engineering (cheap : BYO !!!)
• “Hardware”
• “Software”
Need for Speed
• Processing :
signal : structured (e.g. MP3)
dynamic : unstructured
• Data : pictures, movies, simulation, …
• Interconnect :
bandwidth >< latency
Scalable computing : COW
• Cluster of
• Workstations (TX : RanchOW)
Fundamental Observation : (Erik’s law)
(><marginal production cost = 0) (remember : 20%+ of earth = Si …)
• Only general purpose programmable devices will survive in the long term yet …
“programmable” = ??
\[ \text{cost} : \lim_{n \to \infty} \$(n^{\text{th}} \text{ copy of chip}) = 0 \]
\[ \text{profit} : n \cdot (\text{price} - \text{cost}) = \; ?? \]
The original cluster (neo-cortex)
• 10**11 general purpose neurons
=> compute & memory = “gray” matter
• 10**5 connections / neuron
=> interconnect = “white” matter
• Switching time > 1 ms (digital PPM)
• Input ~100 Mbps (pre-thalamus)
• Output << Input
storage : ~ 10**17 bits (do not drink & think …)
• ~20 W, electro-chemical, carbohydrate powered
General Purpose (neo)Cortex
• General purpose “cellular columns” (e.g. blind musician)
• 6 layers : 1 in, 1 out, 4 compute
• 4 A4 pages, constant density
• Tuned by “emotional” subsystem : real time, pre-emptive priorities
• Hierarchy root = “prefrontal cortex” (L=+, R=-)
Comparison
• Human (general purpose)
speed - - [12 km/h]
endurance - - [42.195 km]
power - - [200 W @ 120 km]
force - - [52*13 ???]
accuracy - -
(re?)-configurability +++
=> Learning (Software ?)
• Other predator (special purpose)
speed ++ (e.g. cheetah)
endurance ++ (e.g. orca)
power ++ (e.g. hyena)
force ++ (e.g. shark)
accuracy ++ (e.g. eagle)
++ @ price of general purposeness
=> Genetics (Hardware?)
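Fundamental Bound (to Your enthusiasm ?) (physical)
\[ \$(P, M) = k_1 \cdot n \]
\[ \$(S) = k_2 \cdot n \cdot \lg(n) \]
\[ \text{Speedup} : S = \frac{T_1}{T_n} \le S_{\max} \]
yet, with growing n :
\[ \frac{dS}{dn} > 0 , \quad \frac{dS}{dn} = 0 , \quad \frac{dS}{dn} < 0 \]
[Figure labels : technology, problem]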
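The shape of this bound (speedup S = T_1 / T_n that first grows, then flattens, then drops as n increases) can be illustrated with a small sketch. The cost model below is an assumption made only for illustration and is not taken from the slides: perfectly divisible work plus a communication term proportional to lg(n). The names `speedup`, `t1` and `comm` and all numeric values are hypothetical.

```python
import math


def speedup(n, t1=1.0, comm=0.01):
    """Speedup S = T_1 / T_n under an assumed (hypothetical) cost model.

    T_n = T_1 / n + comm * lg(n): the work divides perfectly over n nodes,
    while the communication overhead grows with the logarithm of n.
    """
    tn = t1 / n + comm * math.log2(n)
    return t1 / tn


# Sweep cluster sizes 1, 2, 4, ..., 1024 and locate the best one in the sweep.
sizes = [2 ** i for i in range(11)]
curve = [(n, speedup(n)) for n in sizes]
best_n, best_s = max(curve, key=lambda point: point[1])

for n, s in curve:
    print(f"n = {n:5d}   S = {s:6.2f}")
print(f"best n in sweep: {best_n}")
```

Under these assumed parameters the speedup peaks at an intermediate cluster size and decreases beyond it, which is the dS/dn < 0 regime.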
• Critical Parameter : Granularity
\[ G = \frac{T_{\text{Compute}}}{T_{\text{Communicate}}} \]
informally :
Gray > (Compute cap.)
White < (Communication cap.)
=> Hard <> Easy Problems
Granularity (II)
• Experience : situation, optimum :
too coarse => sub-optimal : !
too fine => comm bottleneck : !
• Tcomp = #instr * CPI * 1/f = Rproblem * Rmachine
• Tcomm = latency + #bits/bandwidth ?=? Cproblem * Cmachine
• Cproblem = #databits
• Cmachine = …
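The two timing formulas above can be sketched in a few lines of Python. The sketch reads the deck's critical parameter as G = Tcomp / Tcomm; the function names and every numeric value (instruction count, CPI, clock, message size, latency, bandwidth) are hypothetical and chosen only to make the arithmetic visible.

```python
def t_comp(instructions, cpi, freq_hz):
    """Compute time: #instr * CPI * 1/f, in seconds."""
    return instructions * cpi / freq_hz


def t_comm(bits, latency_s, bandwidth_bps):
    """Communication time: latency + #bits / bandwidth, in seconds."""
    return latency_s + bits / bandwidth_bps


def granularity(tcomp, tcomm):
    """Granularity G as the ratio of compute time to communication time."""
    return tcomp / tcomm


# Hypothetical node: 10**6 instructions at CPI 1 on a 1 GHz clock,
# then one 8000-bit message over a 1 Gb/s link with 50 microseconds latency.
tc = t_comp(instructions=1_000_000, cpi=1.0, freq_hz=1e9)
tm = t_comm(bits=8_000, latency_s=50e-6, bandwidth_bps=1e9)
g = granularity(tc, tm)

print(f"Tcomp = {tc:.3e} s, Tcomm = {tm:.3e} s, G = {g:.1f}")
```

With these assumed numbers G is well above 1 (coarse grain). Sending a thousand times more bits per step pushes G below 1, the "comm bottleneck" case.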
Granularity (III)
• Cmachine :
\[ C_{\text{machine}} = \frac{\text{latency}}{\#\text{bits}} + \frac{1}{\text{bandwidth}} \]
• bandwidth ~ 10^12 b/s => bw^-1 ~ 1 ps
• latency : 10 GHz = 0.1 ns = 3 cm (vacuum) ; 3 mm (Si) => 1 ps ~ 30 µm
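The numbers on this slide can be checked with a short calculation. The sketch assumes Cmachine = latency / #bits + 1 / bandwidth, which follows from Tcomm = latency + #bits / bandwidth with Cproblem = #databits. The signal speed in silicon is taken from the slide's own figure (3 mm per 0.1 ns), not from an independent source; the function and constant names are hypothetical.

```python
C_VACUUM = 299_792_458.0   # speed of light in vacuum, m/s
SI_SPEED = 3e-3 / 0.1e-9   # slide's figure for silicon: 3 mm per 0.1 ns, in m/s


def distance_m(time_s, speed_m_per_s):
    """Distance a signal covers in time_s at the given propagation speed."""
    return time_s * speed_m_per_s


def c_machine(bits, latency_s, bandwidth_bps):
    """Per-bit communication cost: latency / #bits + 1 / bandwidth."""
    return latency_s / bits + 1.0 / bandwidth_bps


period_s = 1.0 / 10e9  # one cycle at 10 GHz = 0.1 ns

print(f"0.1 ns in vacuum : {distance_m(period_s, C_VACUUM) * 100:.2f} cm")
print(f"1 ps in silicon  : {distance_m(1e-12, SI_SPEED) * 1e6:.1f} um")

# A single-bit message is dominated by latency; a long message approaches 1 / bandwidth.
print(f"Cmachine, 1 bit     : {c_machine(1, period_s, 1e12):.3e} s/bit")
print(f"Cmachine, 10**6 bits: {c_machine(10**6, period_s, 1e12):.3e} s/bit")
```

The last two lines show why short messages are the expensive ones: with #bits ~ 1 the latency term sets the cost, regardless of bandwidth.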
Granularity (IV)
• Amdahl sections (i.e. bottlenecks, rest = “easy” parallelism) : #bits ~ 1 !!
• ?? How to construct a “compiler”/computer system with dynamically tunable machine granularity to adapt to dynamically varying demands on R and C from application(s)
• Structured ?! [ad hoc] / Unstructured ??
Fine Grain Parallelism
• FPGA implementation (NOT automatic)
• ATM switch sim @ faster than real time …
• Speed-up = traffic pattern dependent
Conclusion
• Cluster computing is here to stay
• Cluster computing is a vehicle for a new way of doing science & engineering (for the masses)
• COW is only one example of compute engines satisfying fundamental laws
• (Digital) hardware : understood & economically sound
• “Software” : cf. 1950’s
ad-hoc, need for language(s), theoretical support, run-time, fault tolerance, …
• VUB (INFO) : “Advanced Computer Architecture” + “Concurrente Systemen” (NL) / “Parallel Systems” (E)
• http://parallel.vub.ac.be
General Purpose “Computer”
• Amplifying elements
transistor (n*1000 atoms + quantum mechanics)
• Connecting Elements
wire/fibre/wireless (Maxwell equations)
Cluster : BYO 101
• Step 1 : design Your “compute element” : e.g. look-up table, ALU, …
and build a BIG factory
• Step 2 : design Your “memory element” : e.g. a capacitor, MRAM, …
and build a BIG factory
• Step 3 : design Your “switch” : e.g. cross-bar
and build a BIG factory
Generic Multiprocessor
Processors/Memory
Interconnect (1)
Front-end Processors/Memory
Interconnect (2)
VUB/Internet
Interconnect 1 : High Bandwidth, Low Latency, DL-free (!!)
Interconnect 2 : OTS TCP/IP
Cluster BYO (1+2)
• Step 1 // Step 2
PC based COW :
B(uild) Y(our) O(wn)
Motherboards : BI- or QUAD CPU ?!
CPU, Memory : OTS
Disk : RAID5
Hierarchical Control (remember pre-frontal cortex…)
Bottleneck !!!!
Cluster BYO (3)
• Step 3
Buy a few km of wire/fibre
Buy switches
=> compute switches (PVM/MPI)
=> diagnostic out-of-band (TCP/IP)
=> KVM switches
VUB INFO : BYO (1995)
• Design & Test
! Experience
! Students
• Blue Gene (2004)
256000 OTS CPUs
4GB/CPU DRAM
10 Gbps/node
Power : ??
MTBF : ??
job run-time : months …
Other Examples
• Field Programmable Gate Array
Satisfies description, Fundamental Limits !
“Program”/(Re)configure ??
• Hybrid : COW cluster + accelerator in each node
e.g. Deep Blue : 32 * ( 1 + 8)
=> Variable Granularity … (someone interested in an interesting PhD topic ??)
=> VUB / Erasmus hogeschool
Lookahead Accumulation in Discrete Event Simulation
• Improve Gproblem through compile time aggregation
• A-synchronous synchronization system (!)
System Software
• Compute => Sequential Languages
(?? Non-determinism, synchronization)
• Storage => Virtual Memory, RAID, …
• Communication => Communication Library
e.g. “Parallel Virtual Machine” : Open
e.g. “Message Passing Interface” : Standard
• Fundamental Issue :
Parallel Operating System n*Linux + MPI …
(21st century Microsoft/Intel ?)
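The message-passing style that a communication library supports can be sketched without a cluster. The example below does not use the PVM or MPI API; it is a stand-in built on Python's standard `threading` and `queue` modules, with one inbox queue per worker playing the role of a receive buffer. The names `worker` and `parallel_sum` are hypothetical.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk of data, reduce it locally, send the partial result back."""
    chunk = inbox.get()                 # blocking receive
    outbox.put((rank, sum(chunk)))      # send (rank, partial sum)


def parallel_sum(data, n_workers=4):
    """Scatter data over n_workers, then gather and combine the partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    # Scatter: worker r receives every n_workers-th element, starting at index r.
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])
    # Gather: collect exactly one (rank, partial sum) message per worker.
    partials = dict(results.get() for _ in range(n_workers))
    for thread in threads:
        thread.join()
    return sum(partials.values())


total = parallel_sum(list(range(1000)))
print(total)
```

Each worker only touches data it received as a message, so the same structure maps onto separate machines once the queues are replaced by a real communication library.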
Application Software
• BYO : course “Parallel Systems”, VUB
• Public Domain Packages
=> granularity !!
Numerical, well structured
non-Numerical, dynamic, ill structured
databases (e.g. Google)
History
• 1985 : B Army
mainframe + staff : <100 Kops, x00 MB, n*4800 bps
PC to “fine tune” + 1 temporary mil. service : >1 Mips, 20 MB, 10 Mbps
+ 1 EE/CS student in search for a PhD topic
• 1990 : “A Parallel Simulation Testbed for Computer Networks” : solved 0.1, posed 10 questions …
• 1992 : IBM T.J. Watson Vulcan/Deep Blue
• 1993 : ETL, Tsukuba : Heterogeneous granularity
• 1999 : Xilinx, San Jose : Reconfiguration
Vrije Universiteit Brussel : location
• Belgium : an EU experiment avant la lettre ??
Holland (A’dam) France (Paris)
• ° [Alamo – 6]
• 3 languages (NL, F, D)
• 5 governments (w/o county, city !)
• NO supercomputer
• (Meta) stable ??
• ~ Free education
• 60 km coast / 10 M people, 1 2L highway
• No capital gains tax …
• L&H … (Martha ??)
• Airforce : F16 – ECM
• CEC location & 1 of the capitals …
Alternative
• Special purpose device
=> temporary
=> point solution
??? $$$$ [design, debug, …]
??? Dynamic environment
!? Power (cf. context)
?! Patent (cf. EU software patent dispute …)
General Purpose Hardware
• P(rocessor)
=> ALU (compute) + CU (control)
• M(emory)
=> as much as possible
=> as fast as possible
• S(witch)
=> throughput (telecom !)
=> latency (telecom ?)