Trends and options in parallel computing, © Heiko Schröder, 2003

Post on 20-Dec-2015


1

Trends and options in parallel computing

© Heiko Schröder, 2003

2

need

[Figure: processor performance (MIPS, logarithmic axis from 0.01 to 1000) versus year, 1970 to 1995, for the 4004, 8080, 80286, 80386, 80486, Pentium and Pentium 2]

trends

massively parallel computing

hybrid computing

reconfigurable mesh

optical highway

limitations

3

Understand the “world”

Create a model

Simulate

4

low cost!

5

10^10 light years (10^26 m) away

• This is the limit of the known universe.

We can either see nothing beyond this or there is nothing beyond this.

6

100k light years (10^21 m)

• Moving out from the plane of our own spiral galaxy, the Milky Way, we again encounter nothing.

• Any alien life at this distance, looking at the light now arriving from Sol, our indistinguishable sun, would be unaware that it was witnessing the dawn of Homo sapiens.

7

1 Pm (10^15 m)

• The Sun (Sol) is now just another star in space. This distance represents the farthest reaches of the Comets of our Solar System.

• Comets have extremely eccentric orbits.

• Perhaps the most famous comet, Halley’s, has been recorded at every return since 240 BC, coming back to the Sun and passing through the vicinity of the Earth’s orbit every 74 to 78 years.

8

1 Gm (10^9 m)

• Beyond the Moon’s orbit and the Earth is now just a speck in a dark sky.

• The orbit of the Moon around the Earth is, of course, not visible but has been added to show its relative size.

• Automatic image analysis might find the meteor that is likely to hit our planet.

9

100,000 km (10^8 m)

• The Earth from the Moon as first witnessed by the crew of Apollo 8 (Lovell, Borman and Anders) on appearing from the far side of the moon on their first lunar orbit, Dec 24 1968.

• Borman was so inspired by this view of “the Good Earth” from space that he read a sermon, to the listening world back on Earth, on Christmas Day.

10

10,000 km (10^7 m)

• A satellite view of the continent of North America, showing Earth’s turbulent weather patterns and the prominent hurricane off the West Indies.

• Weather satellites on near-polar orbits provide regular coverage of the Earth’s weather system.

• Communications satellites in distant, geo-stationary orbits, provide a web of continuous cross-continental communications.

11

1,000 km (10^6 m)

• Chicago, its environs and Lake Michigan, as pictured by the astronauts of Skylab, launched May 14, 1973.

• Apart from the military use of space, global monitoring and reconnaissance have become a vital part of the technological age.

• Using the differing optical properties of water and vegetation, Earth’s vital resources can be surveyed from space.

12

100 km (10^5 m)

• Metropolitan Chicago and Lake Michigan, an area covering 10,000 square kilometres, is visible only from extreme heights.

• Some balloons and military spy planes are capable of flying at such altitudes.

• Environmental studies are based on satellite images and use supercomputing for simulation.

13

10 km (10^4 m)

• From a passing aircraft, at 33,000ft, whole cities come into view.

• Downtown Chicago.

• The Lake Michigan waterfront, the piers and the city streets are clearly visible.

14

1 km (10^3 m)

• 1 kilometre spans the football stadium and the marina.

• Current limits of remote sensing from satellites.

15

100 m (10^2 m)

• A small recreational area.

• From a helicopter hovering over the picnic site, we span an area roughly the size of a running track.

16

10 m (10^1 m)

• The picnic site.

• Visually, we are now moving away from our subject so that our field of view spans a region 10 m by 10 m.

• The field of view of our eyes is about 50°, although this really defines only our sharp vision. Evolution has equipped our eyes to spot movement right out to the periphery of our vision, almost 180°.

17

1 metre

• One metre is the order of size of the average human torso.

• Click in the central box to zoom in.

• Click in the picture to zoom out.

18

10 cm = 100 mm (10^-1 m)

• The width of an adult human hand.

19

1 cm = 10 mm (10^-2 m)

• The reticulated pattern of our skin.

20

1 mm (10^-3 m)

• The cell structure and folds of our skin.

• This is the view that we would get if we used a magnifying glass of power x10.

21

100 µm (10^-4 m)

• The semi-translucent cells of our own skin, as seen through a microscope at 100x.

• At 1/10th of a millimetre (100 microns), this is beyond the limit of human acuity (the resolving power of our eyes).

• At best we can resolve about 15 lines per mm at a distance of one metre.

22

10 nm (10^-8 m)

• The inter-linked nucleotides form a polynucleotide thread. Two such threads are coiled around each other to form the DNA molecule, from which our chromosomes are built.

• Magnification 1,000,000x.

• This is approaching the limit of electron microscopy.

• Genomics and protein folding are current major applications for supercomputing.

23

1 nm (10^-9 m)

• The molecular make-up of DNA. Numerous atoms can be seen, each group representing a differing nucleotide in an amino-acid.

• Magnification 10,000,000x.

• The ability of the Atomic Force Microscope to create three-dimensional micro-graphs with resolution down to the nanometer scale has made it an essential tool for imaging surfaces in applications ranging from semiconductor processing to cell biology.

24

100 pm (10^-10 m) = 1 Å

• The Carbon Atom and its surrounding electron cloud, the probability volume occupied by Carbon’s 6 electrons as defined by Heisenberg’s uncertainty principle.

• This was an elaboration of the understanding of Physical Chemistry first announced by Niels Bohr in 1913.

• Bohr was awarded the Nobel Prize for Physics in 1922.

25

1 pm (10^-12 m)

• The heart of the carbon atom, the nucleus, is just visible.

• The view we have of the Nucleus of the atom is very stylised.

• No instrument exists to ‘see’ the nucleus. Experiments and theories indicate what it is made of.

• Particle Physics and the use of Accelerators (atom-smashers) have gradually revealed the sub atomic world.

26

666 or 10 666 or 10

• From the smallest known entity, at 10^-15 meters, to the furthest reaches of the known Universe, at 10^25 meters, there are no more than 40 orders of magnitude (factors of 10).

• In 3 dimensions this means that the Universe is 10^120 times bigger than the smallest known particle.

• So who could possibly need a computer that could handle numbers up to or beyond 9.999999999 x 10^120?
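
The arithmetic on this slide can be checked in a few lines. This is only a sketch: the two lengths are the ones quoted on the slide, and the helper name `orders_of_magnitude` is made up for illustration.

```python
import math

# Lengths quoted on the slide, in metres.
SMALLEST_M = 1e-15   # smallest known entity
LARGEST_M = 1e25     # furthest reaches of the known Universe

def orders_of_magnitude(small, large):
    """Number of factors of 10 between two lengths (hypothetical helper)."""
    return round(math.log10(large / small))

linear_orders = orders_of_magnitude(SMALLEST_M, LARGEST_M)
volume_orders = 3 * linear_orders   # cubing a ratio triples its exponent

print(linear_orders, volume_orders)
```

Cubing the linear ratio is what turns 40 orders of magnitude into the 10^120 of the second bullet.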

27

Moore’s Law

[Figure: processor performance (MIPS, logarithmic axis from 0.01 to 1000) versus year, 1970 to 1995, for the 4004, 8080, 80286, 80386, 80486, Pentium and Pentium 2]

28

0.5 µ

0.25 µ

Scaling factor 2:

• 1/2 width

• 1/2 height

• 1/2 switching time

8 x performance!
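
One way to read the factor of 8 is as density gain times speed gain. The sketch below assumes that reading (half the width and half the height give four times as many transistors per unit area, half the switching time doubles the speed); the function name is illustrative only.

```python
def scaling_gain(s):
    """Idealised performance gain when every feature shrinks by a factor s."""
    density_gain = s * s   # 1/s width times 1/s height: s^2 devices per unit area
    speed_gain = s         # 1/s switching time: s times faster
    return density_gain * speed_gain

# Going from 0.5 micron to 0.25 micron is a scaling factor of 2.
print(scaling_gain(0.5 / 0.25))
```

Real processes do not scale this cleanly; the model only reproduces the slide's back-of-the-envelope estimate.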

29

The end of Moore’s Law

[Figure: size of minimal transistor (logarithmic axis from 0.01 to 10) versus year, 1960 to 2030, with a marked value of ca. 0.03]

30

[Figure: number of systems (axis from 0 to 450) in each processor-count class (1 to 63, 64 to 255, 256 to 1023, 1024 and more), May 1993 to November 2002]

31

Switzerland    5    8
Luxembourg     0    6
Scandinavia   12    8
Australia      5    3
New Zealand    1    1
Mexico         1    4
Brazil         0    1
Canada         6    9
Korea          3    4
Taiwan         0    2
China          0    2
Singapore      0    1

        Industry  Research  Academic  MIN
1998         180       180        98   17
2000         260       116        71   43

231 new computers entered/left the list within 6 months: May to November 2000

32

33

34

35

36

37

38

39

40

41

42

43

44

Pentium 2

Pentium

Pentium 4

45

aerospace engineering,

artificial intelligence and knowledge processing,

astrophysics,

atmospheric research and meteorological forecasting,

automotive design and production,

computational aerodynamics,

computer graphics and imaging, cryptographic analysis,

economic modeling,

implementation techniques and pragmatic software and architectural considerations,

integrated circuit design,

molecular biology,

motion-picture graphics,

nuclear fusion research,

performance studies,

petroleum reservoir engineering and hydrology simulations,

pharmaceutical research, structural analysis and computer-aided design,

and theoretical and experimental physics

46

Application area                 Count    Share     Rmax    Rpeak    Procs
N/A                                240     48 %   192826   288895   129074
Telecomm                            59   11.8 %    14235    21423     8508
Finance                             29    5.8 %     7394    10943     4952
Automotive                          28    5.6 %     6998    10827     4152
Weather and Climate Research        27    5.4 %    20455    32573    13464
Database                            27    5.4 %     7021    11136     4712
Geophysics                          23    4.6 %     7737    23153    18467
Energy                              10      2 %    12766    21453    17692
Information Processing Service       9    1.8 %     2271     3477     1532
Aerospace                            8    1.6 %     6836    11424     5688
Manufacturing                        8    1.6 %     1701     2558     1248
Information Service                  6    1.2 %     1299     1844      784
WWW                                  5      1 %     1431     2258     2192
Benchmarking                         3    0.6 %     2351     2731     1152
Life Science                         3    0.6 %     1083     1600      928
Electronics                          3    0.6 %      677     1013      384
Weather Forecasting                  2    0.4 %     3028     5316     1808
Defense                              2    0.4 %      931     1457      656
Chemistry                            2    0.4 %      462      790      926
Pharmaceutics                        1    0.2 %      536      765      510
Consulting                           1    0.2 %      213      336       96
Biology                              1    0.2 %      205      614      512
Mechanics                            1    0.2 %      199      230       16
Software                             1    0.2 %      197      259      144
Transportation                       1    0.2 %      196      282      128
Total                              500    100 %   293058   457365   219725

47

Physical limits

c = 300 000 km/sec

10^12 OPS -- 0.3 mm/OP

1000 PEs with 10^9 OPS -- 30 cm/OP

massive parallelism

distributed memory
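
The two distances on this slide follow from dividing the speed of light by the operation rate. A minimal sketch, using the slide's rounded value for c; the function name is an illustrative choice.

```python
C_M_PER_S = 3.0e8   # speed of light, rounded as on the slide (300 000 km/sec)

def metres_per_op(ops_per_second):
    """Distance a signal can travel at light speed during one operation."""
    return C_M_PER_S / ops_per_second

print(metres_per_op(1e12))   # one processor at 10^12 OPS: sub-millimetre reach
print(metres_per_op(1e9))    # each of 1000 PEs at 10^9 OPS: tens of centimetres
```

The second case delivers the same aggregate 10^12 OPS, but each processing element only has to reach memory within about 30 cm, which is the slide's argument for massive parallelism with distributed memory.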

48

The internet: 10^8 idle computers, let's use them!

Limits through network speed

10^-9 sec instruction cycle

10^-1 sec signal runtime
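
To see what the two time scales imply, the sketch below counts how many instruction cycles fit into one signal runtime and estimates how much of a machine's time is useful work for a given task size. The efficiency model (compute, then wait for one message) is a simplifying assumption, not something stated on the slide.

```python
INSTRUCTION_CYCLE_S = 1e-9   # from the slide
SIGNAL_RUNTIME_S = 1e-1      # from the slide

# Instruction cycles that pass while a single message is in flight.
CYCLES_PER_MESSAGE = round(SIGNAL_RUNTIME_S / INSTRUCTION_CYCLE_S)

def efficiency(work_cycles):
    """Fraction of time spent computing if each task waits for one message."""
    return work_cycles / (work_cycles + CYCLES_PER_MESSAGE)

print(CYCLES_PER_MESSAGE)
print(efficiency(10**6), efficiency(10**10))
```

Under this model only tasks of well over 10^8 cycles between messages keep the idle computers busy, so only coarse-grained problems suit the internet as a parallel machine.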

49

Suitable problemsSuitable problems

• Parallelism

• No parallelism ?

• Pipelining

50

Amdahl’s Law

[Figure: run time of length 1 divided into a sequential part (seq) and a parallel part (par), shown twice]

speedup < 1/seq
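
The bound speedup < 1/seq is the limit of the usual form of Amdahl's Law, speedup = 1 / (seq + par/p) with seq + par = 1 and p processors. A short sketch; the parameter names are chosen here for illustration.

```python
def amdahl_speedup(seq, p):
    """Speedup on p processors when a fraction seq of the work is sequential."""
    par = 1.0 - seq               # the remaining fraction can run in parallel
    return 1.0 / (seq + par / p)

# With 10 % sequential work the speedup approaches, but never reaches, 1/seq = 10.
for p in (10, 100, 1000, 10**6):
    print(p, amdahl_speedup(0.1, p))
```

However many processors are added, the sequential part alone keeps the run time above seq, which is where the bound comes from.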

51

Predictions

• Massively parallel special purpose

• Cost * computation time

• Ease of use: time to program

• Time to develop hardware

• Parallel computers with standard components

• Embedded massively parallel systems

52

• Slowdown of sequential speedup (Moore)

Up!

Demand for parallel systems?

What kind of systems?

53

VLSI

Very Large Scale Integration

• simple cells

• few types

• regular architecture

• short connections

mesh -- torus

54

Mesh/Torus

Diameter ($\sqrt{n}$), bisection width ($\sqrt{n}$)

2D mesh
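
For a square 2D mesh of n processors both quantities grow with the square root of n. The sketch below uses the standard textbook values for a k x k mesh and torus with even k; these formulas are supplied here as an assumption, not read off the slide.

```python
import math

def _side(n):
    """Side length k of a square k x k array of n processors."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("n must be a perfect square")
    return k

def mesh_metrics(n):
    """Diameter and bisection width of a k x k mesh (k even)."""
    k = _side(n)
    return {"diameter": 2 * (k - 1), "bisection_width": k}

def torus_metrics(n):
    """Diameter and bisection width of a k x k torus (k even)."""
    k = _side(n)
    return {"diameter": 2 * (k // 2), "bisection_width": 2 * k}

print(mesh_metrics(1024))
print(torus_metrics(1024))
```

The wrap-around links of the torus roughly halve the diameter and double the bisection width, but both stay proportional to the square root of n.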

55

Architecture of Systola 1024

Interface processors

ISA

RAM NORTH

host computer bus

Controller

RAM WEST

program memory

M. Kunde, H.W. Lang, M. Schimmler, H. Schmeck, H. Schröder

Special features of the ISA:

• fast local communication

• aggregate functions with constant period

• fast integer arithmetic

56

C:=C+CW

C:=C+CN

sum

“don’t”

“don’t”

57

58

59

60

61

62

63

64

65

SUM(C_ij)

aggregate functions: constant period!
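
The instructions C:=C+CW and C:=C+CN shown earlier suggest how such a sum can be formed: each element first adds the value of its western neighbour, then that of its northern neighbour. The sketch below is a sequential simulation under that assumption; it does not model the ISA's instruction timing, and `mesh_sum` is a made-up name.

```python
def mesh_sum(c):
    """Simulate C := C + C_W along rows, then C := C + C_N along columns."""
    c = [row[:] for row in c]          # work on a copy of the input grid
    rows, cols = len(c), len(c[0])
    for j in range(1, cols):           # wave moving east: add the western neighbour
        for i in range(rows):
            c[i][j] += c[i][j - 1]
    for i in range(1, rows):           # wave moving south: add the northern neighbour
        for j in range(cols):
            c[i][j] += c[i - 1][j]
    return c                           # c[i][j] now holds the sum of the block above-left

result = mesh_sum([[1, 2], [3, 4]])
print(result[-1][-1])
```

After both sweeps the south-east corner holds the sum over the whole array, and every other element holds the sum of the sub-array to its north-west.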

66

Areas of application for ISA:

• automatic optical quality control

• real time signal processing

• computer graphics / visualization ?

• linear equations

• Cryptography --> Tele-medicine ?

Special features of the ISA:

• fast aggregate functions (sum, carry)

• fast local communication

• no local memory

• typical improvement over PC: Factor 20-30

67

Implementation of Backprojection

$g(x, y) = \sum_{t} g(x \cos t + y \sin t,\ t)$

Tomography
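
As far as the slide's formula can be read, each image point (x, y) accumulates, over all projection angles t, the projection value at position x cos t + y sin t. The following is a minimal sequential sketch of that unfiltered backprojection, assuming nearest-bin lookup and a grid centred on the origin; it is not the ISA implementation, and all names are illustrative.

```python
import math

def backproject(sinogram, angles, size):
    """Unfiltered backprojection onto a size x size grid centred on the origin.

    sinogram[k][s] is the projection at angle angles[k] and detector bin s.
    """
    half = size // 2
    n_bins = len(sinogram[0])
    centre = n_bins // 2
    image = [[0.0] * size for _ in range(size)]
    for row in range(size):
        y = row - half
        for col in range(size):
            x = col - half
            total = 0.0
            for proj, t in zip(sinogram, angles):
                # Nearest detector bin hit by the ray through (x, y) at angle t.
                s = int(round(x * math.cos(t) + y * math.sin(t))) + centre
                if 0 <= s < n_bins:
                    total += proj[s]
            image[row][col] = total
    return image

# A point object at the origin projects onto the central bin at every angle.
angles = [k * math.pi / 8 for k in range(8)]
sinogram = [[1.0 if s == 6 else 0.0 for s in range(13)] for _ in angles]
image = backproject(sinogram, angles, 9)
print(image[4][4])
```

Every pixel performs the same accumulation independently of its neighbours, which is why the method maps naturally onto a processor array.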

68

robot vision

projector, CCD, CCD

Scan in objects

Scan in bodies ?

Robot vision

medical applications

69

ISA: Image classification

70

Spiral (Rein Warmels)

Wavelet transform

71

Change viewpoint

Change transparency

Non-uniform

72

1 µ

Systola 1024, 50 MHz

0.09 µ

Systola 2003, 1 GHz

Next generation Systola, performance prediction:

Factor 120 (scaling the area)

Factor 20 (scaling the speed)

Factor 6 (chip area)

Factor 14,400/chip

limit? 0.03 µ
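
The three factors can be traced back to the numbers on the slide. The sketch assumes that the area factor comes from the squared ratio of feature sizes (1 µ to 0.09 µ) and the speed factor from the ratio of clock rates (50 MHz to 1 GHz); the chip-area factor of 6 is taken from the slide as given.

```python
OLD_FEATURE_UM, NEW_FEATURE_UM = 1.0, 0.09   # feature sizes from the slide
OLD_CLOCK_MHZ, NEW_CLOCK_MHZ = 50, 1000      # clock rates from the slide
CHIP_AREA_FACTOR = 6                         # quoted on the slide, not derived

density_factor = (OLD_FEATURE_UM / NEW_FEATURE_UM) ** 2   # about 123, rounded to 120
speed_factor = NEW_CLOCK_MHZ // OLD_CLOCK_MHZ              # 20

print(round(density_factor), speed_factor)
print(120 * speed_factor * CHIP_AREA_FACTOR)   # the slide's combined factor per chip
```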

73

Disadvantages of the mesh:

large diameter!

low bisection width!

74

cluster

[Figure: six units, each labelled 1024]

Massively parallel: SYSTOLA 1024

Hybrid computing

Topology?

75

reconfigurable mesh

reconfigurable mesh = mesh + interior connections

15 positions

low cost

diameter = 1

76

modulo 3 counter

10 11 10

*1 mod 3

Constant time on RM but log n / log log n on CRCW-PRAM

Configurational computing!
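
A common way to build such a counter on a reconfigurable mesh is to use three rows and one column per input bit: a column holding a 1 shifts the bus up by one row (cyclically), a column holding a 0 passes it straight through. A signal injected at row 0 then leaves at the row equal to the number of ones mod 3. The sketch below assumes that construction and traces the signal sequentially; on the mesh itself the signal crosses the configured bus in one step.

```python
def mod3_count(bits):
    """Row at which a signal entering at row 0 leaves a 3-row reconfigurable mesh."""
    # Configuration phase: every column picks its switch setting from its own bit.
    straight, shift = [0, 1, 2], [1, 2, 0]
    columns = [shift if b else straight for b in bits]
    # Propagation phase: follow the bus from the west edge to the east edge.
    row = 0
    for switch in columns:
        row = switch[row]
    return row

print(mod3_count([1, 0, 1, 1, 1, 0]))   # the slide's input 10 11 10
```

All the computation lies in how the switches are set, which is the sense of "configurational computing".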

77

Reconfigurable mesh

Special features

• SIMD

• constant diameter

• faster than PRAM ?

Suitable applications

• routing/sorting/load balancing

• sparse matrix multiplication

• segmentation / component labeling

• feature extraction

• image database ?

78

Optical Highway

number of channels #C, width W, processors P

$C = W \cdot P^{3} \approx 10^{6}$

W=1; P=100

W=32; P=32

All-to-all connection

79

Horizontal all-to-all

Vertical all-to-all

80

Features of optically connected meshes

• SIMD/SPMD/MIMD ?

• implement all major architectures

• all-to-all communication in 2 steps

• Bulk synchronous processing (BSP)

• no latency hiding

• no pin-limitation

Applications

• coarse grain parallel computing only?

• ray-tracing ????
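
All-to-all communication in 2 steps can be pictured as one horizontal all-to-all followed by one vertical all-to-all: a message first moves within its row to the destination column, then within that column to the destination row. The sketch below assumes this routing scheme and ignores link capacity; the function and variable names are illustrative.

```python
def route_two_steps(messages):
    """Route (source, destination, payload) messages in two all-to-all steps."""
    # Step 1, horizontal all-to-all: keep the source row, move to the destination column.
    after_horizontal = [((src[0], dst[1]), dst, payload)
                        for src, dst, payload in messages]
    # Step 2, vertical all-to-all: keep that column, move to the destination row.
    after_vertical = [((dst[0], pos[1]), dst, payload)
                      for pos, dst, payload in after_horizontal]
    return after_horizontal, after_vertical

K = 3   # a small 3 x 3 mesh; every PE sends one message to every other PE
pes = [(r, c) for r in range(K) for c in range(K)]
messages = [(src, dst, f"{src}->{dst}") for src in pes for dst in pes if src != dst]
step1, step2 = route_two_steps(messages)
print(all(pos == dst for pos, dst, _ in step2))
```

Each step only uses connections within a single row or a single column, so the mesh needs row-wise and column-wise all-to-all links rather than a link between every pair of processors.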

81

[Figure: six units, each labelled 1024]

3D-problems

Hybrid

PRAM equivalent?

High bisection width bound

OH

2D-problems, local communication

ISA

Low cost

diameter-bound >bisection-width-bound

RM

Future ?

82

Content

• Parallel computing (not distributed)

• Supercomputing

• Systolic arrays, embedded systems

• Fault tolerant parallel systems

• Standard architectures – standard control

• Future architecture – future control ??

• NP-hard problems

• Develop the skills to design embedded systems

83

??

??
