CMPE 4784 1 picture of Tianhe, the most powerful computer in the world in Nov-2010 CMPE 478 Parallel Processing

Upload: aniyah-gerrard

Post on 21-Jan-2016


CMPE 478 Parallel Processing

[Figure: picture of Tianhe, the most powerful computer in the world in Nov-2010]

Von Neumann Architecture

• sequential computer

[Figure: CPU, RAM and devices connected by a BUS]

Memory Hierarchy

From fast to slow:

1. Registers

2. Cache

3. Real Memory

4. Disk

5. CD


History of Computer Architecture

• 4 Generations (identified by logic technology)

1. Tubes

2. Transistors

3. Integrated Circuits

4. VLSI (very large scale integration)


PERFORMANCE TRENDS

• Traditional mainframe/supercomputer performance increased by about 25% per year

• But microprocessor performance has increased by about 50% per year since the mid 80’s.


Moore’s Law

• “Transistor density doubles every 18 months”

• Moore is co-founder of Intel.

• 60 % increase per year

• Exponential growth

• PC costs decline.

• PCs are building bricks of all future systems.

[Figure: Intel 62 core Xeon Phi, 2012, 5 billion transistors]


VLSI Generation

Bit Level Parallelism (up to mid 80’s)

• 4 bit microprocessors replaced by 8 bit, 16 bit, 32 bit etc.

• doubling the width of the datapath reduces the number of cycles required to perform a full 32-bit operation

• by the mid 80’s the benefits of this kind of parallelism had been reaped (full 32-bit word operations combined with the use of caches)

Instruction Level Parallelism (mid 80’s to mid 90’s)

• Basic steps in instruction processing (instruction decode, integer arithmetic, address calculations) could be performed in a single cycle

• Pipelined instruction processing

• Reduced instruction set (RISC)

• Superscalar execution

• Branch prediction

Thread/Process Level Parallelism (mid 90’s to present)

• On average, control transfers occur roughly once in five instructions, so exploiting instruction level parallelism at a larger scale is not possible

• Use multiple independent “threads” or processes

• Concurrently running threads, processes


Evolution of the Infrastructure

• Electronic Accounting Machine Era: 1930-1950

• General Purpose Mainframe and Minicomputer Era: 1959-Present

• Personal Computer Era: 1981 – Present

• Client/Server Era: 1983 – Present

• Enterprise Internet Computing Era: 1992- Present

Sequential vs Parallel Processing

Sequential:

• physical limits reached

• easy to program

• expensive supercomputers

Parallel:

• “raw” power unlimited

• more memory, multiple cache

• made up of COTS, so cheap

• difficult to program

What is Multi-Core Programming?

• Answer: It is basically parallel programming on a single computer box (e.g. a desktop, a notebook, a blade)

Another Important Benefit of Multi-Core: Reduced Energy Consumption

[Figure: a single core at 2 GHz compared with a dual core at 1 GHz + 1 GHz]

Single core executes a workload of N clock cycles:

Energy per cycle: $E = C \cdot V_{dd}^2$

$\text{Energy} = E \cdot N$

Dual core, each core executes a workload of N/2 clock cycles:

Energy per cycle: $E' = C \cdot (0.5 \cdot V_{dd})^2 = 0.25 \cdot C \cdot V_{dd}^2$

$\text{Energy}' = 2 \cdot (E' \cdot 0.5 \cdot N) = E' \cdot N = 0.25 \cdot (E \cdot N) = 0.25 \cdot \text{Energy}$
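The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `dynamic_energy` and the values chosen for C, Vdd and N are illustrative assumptions, and it takes for granted, as the slide does, that the supply voltage can be halved when the clock is halved.

```python
def dynamic_energy(capacitance, vdd, cycles_per_core, cores=1):
    """Total switching energy = cores * cycles_per_core * (C * Vdd^2)."""
    energy_per_cycle = capacitance * vdd ** 2
    return cores * cycles_per_core * energy_per_cycle

# Illustrative values only (not from the slides).
C, VDD, N = 1.0, 1.0, 1_000_000

single = dynamic_energy(C, VDD, N)                  # one core, N cycles, full voltage
dual = dynamic_energy(C, 0.5 * VDD, N / 2, cores=2)  # two cores, N/2 cycles each, half voltage
ratio = dual / single
print(ratio)  # 0.25
```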


SPMD Model (Single Program Multiple Data)

• Each processor executes the same program asynchronously

• Synchronization takes place only when processors need to exchange data

• SPMD is an extension of SIMD (relaxes synchronized instruction execution)

• SPMD is a restriction of MIMD (uses only one source/object)

Parallel Processing Terminology

• Embarrassingly Parallel:

-applications which are trivial to parallelize

-large amounts of independent computation

-Little communication

•Data Parallelism:

-model of parallel computing in which a single operation can be applied to all data elements simultaneously

-amenable to SIMD or SPMD style of computation

•Control Parallelism:

-many different operations may be executed concurrently

-require MIMD/SPMD style of computation

Parallel Processing Terminology

• Scalability:

- If the size of problem is increased, number of processors that can be effectively used can be increased (i.e. there is no limit on parallelism).

- Cost of scalable algorithm grows slowly as input size and the number of processors are increased.

- Data parallel algorithms are more scalable than control parallel algorithms

• Granularity:

- fine grain machines: employ massive number of weak processors each with small memory

- coarse grain machines: smaller number of powerful processors each with large amounts of memory


Models of Parallel Computers

1. Message Passing Model

- Distributed memory

- Multicomputer

2. Shared Memory Model

- Multiprocessor

- Multi-core

3. Theoretical Model

- PRAM

• New architectures: combination of 1 and 2.


Theoretical PRAM Model

• Used by parallel algorithm designers

• Algorithm designers do not want to worry about low level details: They want to concentrate on algorithmic details

• Extends classic RAM model

• Consists of:

– Control unit (common clock), synchronous

– Global shared memory

– Unbounded set of processors, each with its own private memory


Theoretical PRAM Model

• Some characteristics

– Each processor has a unique identifier, mypid=0,1,2,…

– All processors operate synchronously under the control of a common clock

– In each unit of time, each processor is allowed to execute an instruction or stay idle

Various PRAM Models

Listed from weakest to strongest:

• EREW (exclusive read / exclusive write)

• CREW (concurrent read / exclusive write)

• CRCW (concurrent read / concurrent write), with variants that differ in how write conflicts to the same memory location are handled:

– Common (must write the same value)

– Arbitrary (one processor is chosen arbitrarily)

– Priority (processor with the lowest index writes)
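The three CRCW conflict rules can be sketched as a small Python function. This is an illustration only: the function `resolve_concurrent_write`, its argument layout and the policy strings are hypothetical names chosen for this example, not part of any PRAM library.

```python
import random

def resolve_concurrent_write(requests, policy):
    """Pick the value stored when several processors write one cell in the same step.

    requests: non-empty list of (processor_id, value) pairs (hypothetical layout).
    policy:   "common", "arbitrary" or "priority".
    """
    if not requests:
        raise ValueError("no write requests")
    if policy == "common":
        values = {value for _, value in requests}
        if len(values) != 1:
            raise ValueError("Common CRCW: all writers must write the same value")
        return values.pop()
    if policy == "arbitrary":
        # Any single writer may win; here one is picked at random.
        return random.choice(requests)[1]
    if policy == "priority":
        # The processor with the lowest index wins.
        return min(requests, key=lambda request: request[0])[1]
    raise ValueError("unknown policy: " + policy)

writes = [(3, "c"), (1, "a"), (2, "b")]
print(resolve_concurrent_write(writes, "priority"))  # a
```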

Flynn’s Taxonomy

• classifies computer architectures according to:

1. Number of instruction streams it can process at a time

2. Number of data elements on which it can operate simultaneously

                                  Data Streams
                                  Single    Multiple
Instruction Streams   Single      SISD      SIMD
                      Multiple    MISD      MIMD

Shared Memory Machines

[Figure: several processes (threads) attached to one shared address space]

• Memory is globally shared, therefore processes (threads) see a single address space

• Coordination of accesses to locations is done by use of locks provided by thread libraries

• Example Machines: Sequent, Alliant, SUN Ultra, Dual/Quad Board Pentium PC

• Example Thread Libraries: POSIX threads, Linux threads.
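As a rough illustration of lock-based coordination in a shared address space, the sketch below uses Python's standard `threading` module instead of the POSIX threads named on the slide; the counter, the thread count and the iteration count are arbitrary choices for the example.

```python
import threading

counter = 0               # shared variable, visible to every thread
lock = threading.Lock()   # protects updates to counter

def add_many(times):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(times):
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 40000
```

Without the lock, the read-modify-write on `counter` could interleave between threads and updates could be lost.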

Shared Memory Machines

• can be classified as:

-UMA: uniform memory access

-NUMA: nonuniform memory access

based on the amount of time a processor takes to access local and global memory.

[Figure: three organizations, (a), (b) and (c), of processors (P) and memories (M) connected by an interconnection network or bus]

Distributed Memory Machines

[Figure: several processes, each with its own memory (M), connected by a network]

•Each processor has its own local memory (not directly accessible by others)

•Processors communicate by passing messages to each other

•Example Machines: IBM SP2, Intel Paragon, COWs (cluster of workstations)

•Example Message Passing Libraries: PVM, MPI


Beowulf Clusters

• Use COTS, ordinary PCs and networking equipment

• Have the best price/performance ratio

[Figure: PC cluster]

Multi-Core Computing

• A multi-core microprocessor is one which combines two or more independent processors into a single package, often a single integrated circuit.

• A dual-core device contains only two independent microprocessors.

Comparison of Different Architectures

[Figures: block diagrams built from CPU state, execution unit and cache]

• Single Core Architecture: one CPU state, one execution unit, one cache

• Multiprocessor: two processors, each with its own CPU state, execution unit and cache

• Hyper-Threading Technology: two CPU states sharing one execution unit and one cache

• Multi-Core Architecture: two cores, each with its own CPU state, execution unit and cache

• Multi-Core Architecture with Shared Cache: two cores, each with its own CPU state and execution unit, sharing one cache

• Multi-Core with Hyper-Threading Technology: two cores, each with two CPU states, one execution unit and one cache

Top 500 Most Powerful Supercomputers List

• http://www.top500.org/

• ……..


Grid Computing

• provide access to computing power and various resources just like accessing electrical power from electrical grid

• Allows coupling of geographically distributed resources

• Provide inexpensive access to resources irrespective of their physical location or access point

• Internet & dedicated networks can be used to interconnect distributed computational resources and present them as a single unified resource

• Resources: supercomputers, clusters, storage systems, data resources, special devices


Grid Computing

• The GRID is, in effect, a set of software tools which, when combined with hardware, would let users tap processing power off the Internet as easily as electrical power can be drawn from the electricity grid.

• Examples of Grids:

-TeraGrid (USA)

-EGEE Grid (Europe)

- TR-Grid (Turkey)

GRID COMPUTING

[Figure: Power Grid compared with Compute Grid]

Application areas: Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …

Scale: >250 sites, 48 countries, >50,000 CPUs, >20 PetaBytes, >10,000 users, >150 VOs, >150,000 jobs/day

Virtualization

• Virtualization is abstraction of computer resources.

• Make a single physical resource such as a server, an operating system, an application, or storage device appear to function as multiple logical resources

• It may also mean making multiple physical resources such as storage devices or servers appear as a single logical resource

• Server virtualization enables companies to run more than one operating system at the same time on a single machine

Advantages of Virtualization

• Most servers run at just 10-15% capacity – virtualization can increase server utilization to 70% or higher.

• Higher utilization means fewer computers are required to process the same amount of work. Fewer machines means less power consumption.

• Legacy applications can also be run on older versions of an operating system

• Other advantages: easier administration, fault tolerance, security

VMware Virtual Platform

[Figure: virtual machine 1 (Apps 1, OS 1) and virtual machine 2 (Apps 2, OS 2), each seeing its own x86, motherboard, disks, display, net ..; both virtual machines run on the VMware Virtual Platform, which runs on the real machine (x86, motherboard, disks, display, net ..)]

• VMware is now a company worth tens of billions of dollars!


Cloud Computing

• Style of computing in which IT-related capabilities are provided “as a service”, allowing users to access technology-enabled services from the Internet ("in the cloud") without knowledge of, expertise with, or control over the technology infrastructure that supports them.

•General concept that incorporates software as a service (SaaS), Web 2.0 and other recent, well-known technology trends, in which the common theme is reliance on the Internet for satisfying the computing needs of the users.


Cloud Computing

• Virtualisation provides separation between infrastructure and user runtime environment

• Users specify virtual images as their deployment building blocks

• Pay-as-you-go allows users to use the service when they want and only pay for what they use

• Elasticity of the cloud allows users to start simple and explore more complex deployment over time

• Simple interface allows easy integration with existing systems


Cloud: Unique Features

• Ease of use

– REST and HTTP(S)

• Runtime environment

– Hardware virtualisation

– Gives users full control

• Elasticity

– Pay-as-you-go

– Cloud providers can buy hardware faster than you!


Example Cloud: Amazon Web Services

• EC2 (Elastic Compute Cloud) is the computing service of Amazon

– Based on hardware virtualisation

– Users request virtual machine instances, pointing to an image (public or private) stored in S3

– Users have full control over each instance (e.g. access as root, if required)

– Requests can be issued via SOAP and REST


Example Cloud: Amazon Web Services

• Pricing information

http://aws.amazon.com/ec2/

PARALLEL PERFORMANCE MODELS and ALGORITHMS

Amdahl’s Law

• The serial percentage of a program is fixed. So the speed-up obtained by employing parallel processing is bounded.

• Led to pessimism in the parallel processing community and prevented development of parallel machines for a long time.

With serial fraction $s$ and $P$ processors:

$$\text{Speedup} = \frac{1}{s + \frac{1-s}{P}}$$

• In the limit ($P \to \infty$):

$$\text{Speedup} = \frac{1}{s}$$
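A minimal Python sketch of the formula makes the bound visible. The function name `amdahl_speedup` and the sample serial fraction of 10% are assumptions made for illustration.

```python
def amdahl_speedup(s, p):
    """Speedup on p processors when a fraction s of the work is serial."""
    if not 0.0 <= s <= 1.0 or p < 1:
        raise ValueError("need 0 <= s <= 1 and p >= 1")
    return 1.0 / (s + (1.0 - s) / p)

# With 10% serial work the speedup approaches, but never reaches, 1/s = 10.
for p in (1, 10, 100, 1000):
    print(p, amdahl_speedup(0.1, p))
```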

Gustafson’s Law

• Serial percentage is dependent on the number of processors/input.

• Demonstrated achieving more than 1000 fold speedup using 1024 processors.

• Justified parallel processing
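The slide does not give a formula. Gustafson's law is usually stated as a scaled speedup of s + (1 - s)·P, where s is the serial fraction measured on the parallel run. The sketch below uses that standard form; the function name and the 1% serial fraction are illustrative assumptions, not figures from the slides.

```python
def gustafson_speedup(s, p):
    """Scaled speedup on p processors with serial fraction s of the parallel run."""
    if not 0.0 <= s <= 1.0 or p < 1:
        raise ValueError("need 0 <= s <= 1 and p >= 1")
    return s + (1.0 - s) * p

# Assumed serial fraction of 1%: the scaled speedup on 1024 processors exceeds 1000.
print(gustafson_speedup(0.01, 1024))
```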


Algorithmic Performance Parameters

• Notation

Input size

Time Complexity of the best sequential algorithm

Number of processors

Time complexity of the parallel algorithm when run on P processors

Time complexity of the parallel algorithm when run on 1 processor


Algorithmic Performance Parameters

• Speed-Up

• Efficiency
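The formulas on this slide did not survive extraction. The block below gives the commonly used definitions as a reference sketch. The symbols are assumptions: $N$ for input size, $T^*(N)$ for the best sequential time, $P$ for the number of processors and $T_P(N)$ for the parallel time on $P$ processors. Some texts use $T_1(N)$ in place of $T^*(N)$, so the original slide may differ.

```latex
% Speed-up: how much faster than the best sequential algorithm
S_P(N) = \frac{T^*(N)}{T_P(N)}

% Efficiency: speed-up per processor
E_P(N) = \frac{S_P(N)}{P} = \frac{T^*(N)}{P \, T_P(N)}
```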


Algorithmic Performance Parameters

• Work = Processors X Time

– Informally: How much time a parallel algorithm will take to simulate on a serial machine

– Formally:


Algorithmic Performance Parameters

• Work Efficient:

– Informally: a work efficient parallel algorithm does no more work than the best serial algorithm

– Formally: a work efficient algorithm satisfies:
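The formal statements for work and work efficiency are missing from the transcript. A common formulation is sketched below, with assumed symbols $W(N)$ for work, $P$ for processors, $T_P(N)$ for parallel time and $T^*(N)$ for the best sequential time. The original slides may have used a different notation.

```latex
% Work = Processors x Time
W(N) = P \cdot T_P(N)

% Work efficient: no more work, asymptotically, than the best sequential algorithm
W(N) = O\big(T^*(N)\big)
```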


Algorithmic Performance Parameters

• Scalability:

– Informally, scalability implies that if the size of the problem is increased, the number of processors effectively used can be increased (i.e. there is no limit on parallelism)

– Formally, scalability means:


Algorithmic Performance Parameters

• Some remarks:

– Cost of a scalable algorithm grows slowly as input size and the number of processors are increased

– Level of ‘control parallelism’ is usually a constant independent of problem size

– Level of ‘data parallelism’ is an increasing function of problem size

– Data parallel algorithms are more scalable than control parallel algorithms


Goals in Designing Parallel Algorithms

• Scalability:

– Algorithm cost grows slowly, preferably in a polylogarithmic manner

• Work Efficient:

– We do not want to waste CPU cycles

– May be an important point when we are worried about power consumption or ‘money’ paid for CPU usage

Summing N numbers in Parallel

• Array of N numbers can be summed in log(N) steps using N/2 processors

initial:                  x1        x2  x3     x4  x5        x6  x7     x8
after step 1:             x1+x2     x2  x3+x4  x4  x5+x6     x6  x7+x8  x8
after step 2:             x1+..+x4  x2  x3+x4  x4  x5+..+x8  x6  x7+x8  x8
after step 3 (result):    x1+..+x8  x2  x3+x4  x4  x5+..+x8  x6  x7+x8  x8
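The reduction in the diagram can be simulated sequentially in Python. This is a sketch, not a parallel implementation: the inner loop stands in for the processors, whose additions within one step are independent of each other. The function name `parallel_sum` is an assumption.

```python
def parallel_sum(values):
    """Simulate the tree reduction; return (total, number_of_parallel_steps)."""
    x = list(values)
    if not x:
        raise ValueError("need at least one number")
    n = len(x)
    stride = 1
    steps = 0
    while stride < n:
        # One parallel step: each iteration is the work of one processor.
        for i in range(0, n - stride, 2 * stride):
            x[i] += x[i + stride]
        stride *= 2
        steps += 1
    return x[0], steps

print(parallel_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # (36, 3)
```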

Prefix Summing N numbers in Parallel

• Computing partial sums of an array of N numbers can be done in log(N) steps using N processors

initial:         x1        x2        x3        x4        x5        x6        x7     x8
after step 1:    x1+x2     x2+x3     x3+x4     x4+x5     x5+x6     x6+x7     x7+x8  x8
after step 2:    x1+..+x4  x2+..+x5  x3+..+x6  x4+..+x7  x5+..+x8  x6+..+x8  x7+x8  x8
after step 3:    x1+..+x8  x2+..+x8  x3+..+x8  x4+..+x8  x5+..+x8  x6+..+x8  x7+x8  x8
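The doubling scheme in the diagram can be sketched in Python as follows. As in the diagram, position i ends up holding x_i + ... + x_N. The function name `parallel_scan` is an assumption, and the list comprehension models one synchronous step in which every processor reads old values before any value is overwritten.

```python
def parallel_scan(values):
    """Simulate the doubling scan; return (partial_sums, number_of_parallel_steps)."""
    x = list(values)
    n = len(x)
    stride = 1
    steps = 0
    while stride < n:
        # Synchronous step: build the new array from the old one in one go.
        x = [x[i] + x[i + stride] if i + stride < n else x[i] for i in range(n)]
        stride *= 2
        steps += 1
    return x, steps

sums, steps = parallel_scan([1, 2, 3, 4, 5, 6, 7, 8])
print(sums, steps)  # [36, 35, 33, 30, 26, 21, 15, 8] 3
```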

Prefix Paradigm for Parallel Algorithm Design

• Prefix computation forms a paradigm for parallel algorithm development, just like other well known paradigms such as:

– divide and conquer, dynamic programming, etc.

• Prefix Paradigm:

– If possible, transform your problem to prefix type computation

– Apply the efficient logarithmic prefix computation

• Examples of Problems solved by Prefix Paradigm:

– Solving linear recurrence equations

– Tridiagonal Solver

– Problems on trees

– Adaptive triangular mesh refinement

Solving Linear Recurrence Equations

• Given the linear recurrence equation:

$$z_i = a_i z_{i-1} + b_i z_{i-2}$$

• we can rewrite it as:

$$\begin{pmatrix} z_i \\ z_{i-1} \end{pmatrix} = \begin{pmatrix} a_i & b_i \\ 1 & 0 \end{pmatrix} \begin{pmatrix} z_{i-1} \\ z_{i-2} \end{pmatrix}$$

• if we expand it, we get the solution in terms of partial products of coefficients and the initial values z1 and z0:

$$\begin{pmatrix} z_i \\ z_{i-1} \end{pmatrix} = \begin{pmatrix} a_i & b_i \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_{i-1} & b_{i-1} \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_{i-2} & b_{i-2} \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_2 & b_2 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} z_1 \\ z_0 \end{pmatrix}$$

• use prefix to compute partial products
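The idea can be sketched in Python with 2x2 matrices stored as nested tuples. The function names and the list layout (entry k of `a` and `b` holds the coefficients for i = k + 2) are assumptions for this example. The partial products are accumulated in a plain loop for clarity; because matrix multiplication is associative, the same products could be obtained with the logarithmic prefix scheme.

```python
def mat_mul(m, k):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    (e, f), (g, h) = k
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def solve_by_prefix(a, b, z0, z1):
    """Return [z0, z1, z2, ...] for z_i = a_i*z_{i-1} + b_i*z_{i-2}."""
    z = [z0, z1]
    prefix = ((1, 0), (0, 1))  # identity: empty product
    for a_i, b_i in zip(a, b):
        # New coefficient matrix multiplies on the left: M_i * (M_{i-1} ... M_2)
        prefix = mat_mul(((a_i, b_i), (1, 0)), prefix)
        # First row of the partial product applied to (z1, z0) gives z_i
        z.append(prefix[0][0] * z1 + prefix[0][1] * z0)
    return z

# With a_i = b_i = 1 the recurrence is the Fibonacci sequence.
print(solve_by_prefix([1] * 8, [1] * 8, 0, 1))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```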

Pointer Jumping Technique

• A linked list of N numbers can be prefix-summed in log(N) steps using N processors

initial:         x1        x2        x3        x4        x5        x6        x7     x8
after step 1:    x1+x2     x2+x3     x3+x4     x4+x5     x5+x6     x6+x7     x7+x8  x8
after step 2:    x1+..+x4  x2+..+x5  x3+..+x6  x4+..+x7  x5+..+x8  x6+..+x8  x7+x8  x8
after step 3:    x1+..+x8  x2+..+x8  x3+..+x8  x4+..+x8  x5+..+x8  x6+..+x8  x7+x8  x8
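A minimal simulation of pointer jumping, assuming the list is stored as two arrays: `values[i]` holds the number at node i and `nxt[i]` holds the index of its successor (`None` at the tail). These names and the array layout are assumptions for the example. In every step each node adds its successor's value and then jumps its pointer over the successor.

```python
def pointer_jumping_sums(values, nxt):
    """Return (sums, steps); sums[i] is the total from node i to the end of the list."""
    val = list(values)
    nxt = list(nxt)
    steps = 0
    while any(p is not None for p in nxt):
        # Synchronous step: all reads use the old arrays, then both are replaced.
        new_val = [v + val[p] if p is not None else v for v, p in zip(val, nxt)]
        new_nxt = [nxt[p] if p is not None else None for p in nxt]
        val, nxt = new_val, new_nxt
        steps += 1
    return val, steps

# Nodes stored in list order: 0 -> 1 -> ... -> 7
print(pointer_jumping_sums([1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7, None]))
# ([36, 35, 33, 30, 26, 21, 15, 8], 3)
```

The nodes do not need to be stored in list order, which is what distinguishes this from the array version.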

Euler Tour Technique

[Figure: example tree with root a; a has children b, c, d; b has children e, f, g; g has children h, i]

Tree Problems:

• Preorder numbering

• Postorder numbering

• Number of Descendants

• Level of each node

• To solve such problems, first transform the tree by linearizing it into a linked-list and then apply the prefix computation
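The sketch below works through all four problems on the nine-node example tree. It is a sequential illustration under stated assumptions: the tour is built by recursion and `itertools.accumulate` stands in for the parallel prefix step, and all function names are hypothetical. Edges are written as `(from, to)` tuples, which may not match the bracket order used on the slides.

```python
from itertools import accumulate

# Example tree: child lists in left-to-right order.
CHILDREN = {"a": ["b", "c", "d"], "b": ["e", "f", "g"], "g": ["h", "i"]}

def euler_tour(root, children):
    """Return the tour as a list of directed edges (from, to)."""
    tour = []
    def visit(u):
        for v in children.get(u, []):
            tour.append((u, v))   # go down to the child
            visit(v)
            tour.append((v, u))   # come back up to the parent
    visit(root)
    return tour

def prefix_weights(tour, parent, down, up):
    """Weight each edge by direction and return {edge: prefix sum}."""
    weights = [down if parent.get(v) == u else up for u, v in tour]
    return dict(zip(tour, accumulate(weights)))

def tree_numbers(root, children):
    tour = euler_tour(root, children)
    parent = {v: u for u, kids in children.items() for v in kids}
    n = len(parent) + 1
    lvl = prefix_weights(tour, parent, +1, -1)
    pre = prefix_weights(tour, parent, 1, 0)
    post = prefix_weights(tour, parent, 0, 1)
    level, preorder = {root: 0}, {root: 1}
    postorder, descendants = {root: n}, {root: n}
    for v, p in parent.items():
        level[v] = lvl[(p, v)]                 # prefix at the edge entering v
        preorder[v] = 1 + pre[(p, v)]
        postorder[v] = post[(v, p)]            # prefix at the edge leaving v
        descendants[v] = post[(v, p)] - post[(p, v)]  # counts v itself
    return level, preorder, postorder, descendants

level, preorder, postorder, descendants = tree_numbers("a", CHILDREN)
print(level["h"], preorder["g"], postorder["b"], descendants["b"])  # 3 5 6 6
```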

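Computing Level of Each Node by Euler Tour Technique

[Figure: the example tree (root a; children of a: b, c, d; children of b: e, f, g; children of g: h, i) with a weight on every tour edge]

weight assignment: 1 on each edge going down from a parent to a child, -1 on each edge going back up

tour edge:                   a→b  b→e  e→b  b→f  f→b  b→g  g→h  h→g  g→i  i→g  g→b  b→a  a→c  c→a  a→d  d→a
initial weights w(<u,v>):     1    1   -1    1   -1    1    1   -1    1   -1   -1   -1    1   -1    1   -1
prefix pw(<u,v>):             1    2    1    2    1    2    3    2    3    2    1    0    1    0    1    0

level(v) = pw(<v,parent(v)>)

level(root) = 0

Here <v,parent(v)> is read as the tour edge that enters v from parent(v); for example, level(g) is the prefix at b→g, which is 2.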

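Computing Number of Descendants by Euler Tour Technique

[Figure: the example tree (root a; children of a: b, c, d; children of b: e, f, g; children of g: h, i) with a weight on every tour edge]

weight assignment: 0 on each edge going down from a parent to a child, 1 on each edge going back up

tour edge:                   a→b  b→e  e→b  b→f  f→b  b→g  g→h  h→g  g→i  i→g  g→b  b→a  a→c  c→a  a→d  d→a
initial weights w(<u,v>):     0    0    1    0    1    0    0    1    0    1    1    1    0    1    0    1
prefix pw(<u,v>):             0    0    1    1    2    2    2    3    3    4    5    6    6    7    7    8

# of descendants(v) = pw(<parent(v),v>) - pw(<v,parent(v)>)

# of descendants(root) = n

Here <parent(v),v> is read as the tour edge that leaves v toward parent(v), and <v,parent(v)> as the edge that enters v. For example, for b the count is 6 - 0 = 6, which includes b itself.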

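Preorder Numbering by Euler Tour Technique

[Figure: the example tree (root a; children of a: b, c, d; children of b: e, f, g; children of g: h, i) with a weight on every tour edge]

weight assignment: 1 on each edge going down from a parent to a child, 0 on each edge going back up

tour edge:                   a→b  b→e  e→b  b→f  f→b  b→g  g→h  h→g  g→i  i→g  g→b  b→a  a→c  c→a  a→d  d→a
initial weights w(<u,v>):     1    1    0    1    0    1    1    0    1    0    0    0    1    0    1    0
prefix pw(<u,v>):             1    2    2    3    3    4    5    5    6    6    6    6    7    7    8    8

preorder(v) = 1 + pw(<v,parent(v)>)

preorder(root) = 1

Here <v,parent(v)> is read as the tour edge that enters v from parent(v).

Resulting preorder numbers: a=1, b=2, e=3, f=4, g=5, h=6, i=7, c=8, d=9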

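Postorder Numbering by Euler Tour Technique

[Figure: the example tree (root a; children of a: b, c, d; children of b: e, f, g; children of g: h, i) with a weight on every tour edge]

weight assignment: 0 on each edge going down from a parent to a child, 1 on each edge going back up

tour edge:                   a→b  b→e  e→b  b→f  f→b  b→g  g→h  h→g  g→i  i→g  g→b  b→a  a→c  c→a  a→d  d→a
initial weights w(<u,v>):     0    0    1    0    1    0    0    1    0    1    1    1    0    1    0    1
prefix pw(<u,v>):             0    0    1    1    2    2    2    3    3    4    5    6    6    7    7    8

postorder(v) = pw(<parent(v),v>)

postorder(root) = n

Here <parent(v),v> is read as the tour edge that leaves v toward parent(v).

Resulting postorder numbers: e=1, f=2, h=3, i=4, g=5, b=6, c=7, d=8, a=9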

Binary Tree Traversal

• Preorder

• Inorder

• Postorder


Brent’s Theorem

• Given a parallel algorithm with computation time D that performs W operations in total, P processors can execute the algorithm in time D + (W-D)/P

For proof: consider DAG representation of computation
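The bound is easy to evaluate numerically. The sketch below applies it to parallel summation, taking W = N - 1 additions and D = log2(N) steps as an illustrative assumption; the function name `brent_time` is hypothetical.

```python
import math

def brent_time(work, depth, processors):
    """Brent's bound: time on P processors is at most D + (W - D) / P."""
    if processors < 1:
        raise ValueError("need at least one processor")
    return depth + (work - depth) / processors

n = 1024
work, depth = n - 1, math.log2(n)   # assumed figures for summing n numbers
for p in (1, 8, n // 2):
    print(p, brent_time(work, depth, p))
```

With one processor the bound equals W, and as P grows it approaches D.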

Work Efficiency

• Parallel Summation

• Parallel Prefix Summation