
Page 1:

ECE 1747H : Parallel Programming

Lecture 1: Overview

Page 2:

ECE 1747H

• Meeting time: Mon 4-6 PM

• Meeting place: BA 4164

• Instructor: Cristiana Amza
  – http://www.eecg.toronto.edu/~amza
  – [email protected], office: Pratt 484E

Page 3:

Material

• Course notes

• Web material (e.g., published papers)

• No required textbook, some recommended

Page 4:

Prerequisites

• Programming in C or C++

• Data structures

• Basics of machine architecture

• Basics of network programming

• Please send e-mail to eugenia@eecg to get an eecg account!! (include your name, student ID, class, and instructor)

Page 5:

Other than that

• No written homework, no exams

• 10% for each small programming assignment (expect 1-2)

• 10% class participation

• The rest comes from the major course project

Page 6:

Programming Project

• Parallelizing a sequential program, or improving the performance or the functionality of a parallel program

• Project proposal and final report

• In-class project proposal and final report presentation

• “Sample” project presentation posted

Page 7:

Parallelism (1 of 2)

• Ability to execute different parts of a single program concurrently on different machines

• Goal: shorter running time

• Grain of parallelism: how big are the parts?

• Can be instruction, statement, procedure, …

• Will mainly focus on relatively coarse-grain parallelism

Page 8:

Parallelism (2 of 2)

• Coarse-grain parallelism mainly applicable to long-running, scientific programs

• Examples: weather prediction, prime factorization, simulations, …

Page 9:

Lecture material (1 of 4)

• Parallelism
  – What is parallelism?
  – What can be parallelized?
  – Inhibitors of parallelism: dependences

Page 10:

Lecture material (2 of 4)

• Standard models of parallelism
  – shared memory (Pthreads)
  – message passing (MPI)
  – shared memory + data parallelism (OpenMP)

• Classes of applications
  – scientific
  – servers

Page 11:

Lecture material (3 of 4)

• Transaction processing
  – classic programming model for databases
  – now being proposed for scientific programs

Page 12:

Lecture material (4 of 4)

• Performance of parallel & distributed programs
  – architecture-independent optimization
  – architecture-dependent optimization

Page 13:

Course Organization

• First month of semester:
  – lectures on parallelism, patterns, models
  – small programming assignments, done individually

• Rest of the semester:
  – major programming project, done individually or in a small group
  – research paper discussions

Page 14:

Parallel vs. Distributed Programming

Parallel programming has matured:

• A few standard programming models

• A few common machine architectures

• Portability between models and architectures

Page 15:

Bottom Line

• Programmers can now focus on the program and use a suitable programming model

• Reasonable hope of portability

• Problem: much performance optimization is still platform-dependent
  – Performance portability is a problem

Page 16:

ECE 1747H: Parallel Programming

Lecture 1-2: Parallelism, Dependences

Page 17:

Parallelism

• Ability to execute different parts of a program concurrently on different machines

• Goal: shorten execution time

Page 18:

Measures of Performance

• To computer scientists: speedup, execution time.

• To applications people: size of problem, accuracy of solution, etc.

Page 19:

Speedup of Algorithm

• Speedup of algorithm = sequential execution time / execution time on p processors (with the same data set).

[Figure: speedup as a function of the number of processors p]

Page 20:

Speedup on Problem

• Speedup on problem = sequential execution time of best known sequential algorithm / execution time on p processors.

• A more honest measure of performance.

• Avoids rewarding the choice of an easily parallelizable algorithm with poor sequential execution time.
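Written as formulas (notation mine, summarizing the two definitions above):

$$S_{\text{alg}}(p) = \frac{T_{\text{seq}}}{T_p} \qquad S_{\text{prob}}(p) = \frac{T_{\text{best seq}}}{T_p}$$

where $T_p$ is the execution time on p processors with the same data set, $T_{\text{seq}}$ is the sequential execution time of the algorithm being parallelized, and $T_{\text{best seq}}$ is that of the best known sequential algorithm.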

Page 21:

What Speedups Can You Get?

• Linear speedup
  – Confusing term: implicitly means a 1-to-1 speedup per processor.
  – (Almost always) as good as you can do.

• Sub-linear speedup: more common, due to the overheads of startup, synchronization, communication, etc.

Page 22:

Speedup

[Figure: speedup vs. number of processors p, showing the linear ideal and a typical actual (sub-linear) curve]

Page 23:

Scalability

• No really precise definition.

• Roughly speaking, a program is said to scale to a certain number of processors p, if going from p-1 to p processors results in some acceptable improvement in speedup (for instance, an increase of 0.5).
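Stated as a condition (notation mine): the program scales to p processors if

$$S(p) - S(p-1) \ge \delta$$

for some acceptable threshold $\delta$, e.g. the slide's example $\delta = 0.5$.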

Page 24:

Super-linear Speedup?

• Due to cache/memory effects:
  – Subparts fit into the cache/memory of each node.
  – The whole problem does not fit in the cache/memory of a single node.

• Nondeterminism in search problems:
  – One thread finds a near-optimal solution very quickly, leading to drastic pruning of the search space.

Page 25:

Cardinal Performance Rule

• Don’t leave (too) much of your code sequential!

Page 26:

Amdahl’s Law

• If 1/s of the program is sequential, then you can never get a speedup better than s.
  – (Normalized) sequential execution time = 1/s + (1 - 1/s) = 1
  – Best parallel execution time on p processors = 1/s + (1 - 1/s)/p
  – When p goes to infinity, parallel execution time = 1/s
  – Speedup = s
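A quick numeric instance (numbers mine): if 10% of the program is sequential, i.e. 1/s = 0.1 and s = 10, then

$$T_p = 0.1 + \frac{0.9}{p} \qquad S(p) = \frac{1}{0.1 + 0.9/p}$$

so S(9) = 1/(0.1 + 0.1) = 5, and even with infinitely many processors the speedup never exceeds s = 10.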

Page 27:

Why keep something sequential?

• Some parts of the program are not parallelizable (because of dependences)

• Some parts may be parallelizable, but the parallelization overhead dwarfs the speedup gained.

Page 28:

When can two statements execute in parallel?

• On one processor:

    statement1;
    statement2;

• On two processors:

    processor1:    processor2:
    statement1;    statement2;

Page 29:

Fundamental Assumption

• Processors execute independently: no control over order of execution between processors

Page 30:

When can 2 statements execute in parallel?

• Possibility 1

    Processor1:    Processor2:
    statement1;
                   statement2;

• Possibility 2

    Processor1:    Processor2:
                   statement2;
    statement1;

Page 31:

When can 2 statements execute in parallel?

• Their order of execution must not matter!

• In other words,

    statement1; statement2;

  must be equivalent to

    statement2; statement1;

Page 32:

Example 1

a = 1;
b = 2;

• Statements can be executed in parallel.

Page 33:

Example 2

a = 1;
b = a;

• Statements cannot be executed in parallel

• Program modifications may make it possible.
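One such modification (sketch mine): since a is known to hold 1 when the second statement runs, the read of a can be replaced by the constant, removing the true dependence:

a = 1;
b = 1;   /* was b = a; the two statements are now independent */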

Page 34:

Example 3

a = f(x);
b = a;

• May not be wise to change the program here (e.g., rewriting b = a as b = f(x) removes the dependence but computes f(x) twice, so sequential execution would take longer).

Page 35:

Example 5

a = 1;
a = 2;

• Statements cannot be executed in parallel.

Page 36:

True dependence

Statements S1, S2

S2 has a true dependence on S1

iff

S2 reads a value written by S1

Page 37:

Anti-dependence

Statements S1, S2.

S2 has an anti-dependence on S1

iff

S2 writes a value read by S1.

Page 38:

Output Dependence

Statements S1, S2.

S2 has an output dependence on S1

iff

S2 writes a variable written by S1.
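A minimal C illustration of all three kinds (example mine):

a = 1;   /* S1 writes a                          */
b = a;   /* S2 reads a: true dependence on S1    */

b = a;   /* S1 reads a                           */
a = 2;   /* S2 writes a: anti-dependence on S1   */

a = 1;   /* S1 writes a                          */
a = 3;   /* S2 writes a: output dependence on S1 */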

Page 39:

When can 2 statements execute in parallel?

S1 and S2 can execute in parallel

iff

there are no dependences between S1 and S2:
  – true dependences
  – anti-dependences
  – output dependences

Some dependences can be removed.

Page 40:

Example 6

• Most parallelism occurs in loops.

for (i=0; i<100; i++)
    a[i] = i;

• No dependences.

• Iterations can be executed in parallel.
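As a sketch, this loop could be run in parallel with OpenMP (one of the models the course covers); the pragma and the function wrapper are mine, not part of the slide:

#include <omp.h>

void fill(int a[100])
{
    /* Each iteration writes a distinct element a[i], so the
       iterations are independent and can run on different threads. */
    #pragma omp parallel for
    for (int i = 0; i < 100; i++)
        a[i] = i;
}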

Page 41:

Example 7

for (i=0; i<100; i++) {
    a[i] = i;
    b[i] = 2*i;
}

Iterations and statements can be executed in parallel.

Page 42:

Example 8

for (i=0; i<100; i++)
    a[i] = i;

for (i=0; i<100; i++)
    b[i] = 2*i;

Iterations and loops can be executed in parallel.

Page 43:

Example 9

for (i=0; i<100; i++)
    a[i] = a[i] + 100;

• There is a dependence … of the statement on itself! (each iteration reads and writes only its own a[i])

• Loop is still parallelizable.

Page 44:

Example 10

for (i=1; i<100; i++)
    a[i] = f(a[i-1]);

• Dependence between a[i] and a[i-1].

• Loop iterations are not parallelizable.

Page 45:

Loop-carried dependence

• A loop-carried dependence is a dependence that is present only if the statements are part of the execution of a loop.

• Otherwise, we call it a loop-independent dependence.

• Loop-carried dependences prevent loop iteration parallelization.

Page 46:

Example 11

for (i=0; i<100; i++)
    for (j=1; j<100; j++)
        a[i][j] = f(a[i][j-1]);

• Loop-independent dependence on i.

• Loop-carried dependence on j.

• Outer loop can be parallelized, inner loop cannot.

Page 47:

Example 12

for (j=1; j<100; j++)
    for (i=0; i<100; i++)
        a[i][j] = f(a[i][j-1]);

• Inner loop can be parallelized, outer loop cannot.

• Less desirable situation.

• Loop interchange is sometimes possible.
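A sketch of that interchange (code mine; assumes f has no side effects, and j starts at 1 to keep a[i][j-1] in bounds):

extern int f(int);

void sweep(int a[100][100])
{
    /* After interchange, i is outermost and carries no dependence,
       so the i loop can be parallelized; each row's j loop still
       runs sequentially because of the loop-carried dependence. */
    #pragma omp parallel for
    for (int i = 0; i < 100; i++)
        for (int j = 1; j < 100; j++)
            a[i][j] = f(a[i][j-1]);
}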

Page 48:

Level of loop-carried dependence

• Is the nesting depth of the loop that carries the dependence.

• Indicates which loops can be parallelized.

Page 49:

Be careful … Example 13

printf("a");
printf("b");

Statements have a hidden output dependence due to the output stream.

Page 50:

Be careful … Example 14

a = f(x);
b = g(x);

Statements could have a hidden dependence if f and g update the same variable.

Also depends on what f and g can do to x.

Page 51:

Be careful … Example 15

for (i=0; i<100; i++)
    a[i+10] = f(a[i]);

• Dependence between a[10], a[20], …

• Dependence between a[11], a[21], …

• …

• Some parallel execution is possible.
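One way to get that partial parallelism (sketch mine; assumes a has at least 110 elements and f has no side effects): the dependence distance is 10, so the indices split into 10 independent chains (equal i mod 10) that can run concurrently while each chain stays sequential:

extern int f(int);

void update(int a[110])
{
    /* Chain c touches a[c], a[c+10], a[c+20], ...; distinct chains
       never touch the same element, so they can run in parallel.
       Within a chain, iterations run in order, which respects the
       dependence a[i] -> a[i+10]. */
    #pragma omp parallel for
    for (int c = 0; c < 10; c++)
        for (int i = c; i < 100; i += 10)
            a[i + 10] = f(a[i]);
}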

Page 52:

Be careful … Example 16

for (i=1; i<100; i++) {
    a[i] = ...;
    ... = a[i-1];
}

• Dependence between a[i] and a[i-1]

• Complete parallel execution impossible

• Pipelined parallel execution possible
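The slide points to pipelining; a related transformation, sketched here under an assumption of mine (the "..." on the right-hand side of a[i] = ... does not itself read a), is to distribute the loop into two fully parallel loops. g() and use() are hypothetical stand-ins for the elided code:

extern int g(int);
extern void use(int);

void run(int a[100])
{
    /* Stage 1: produce every a[i]; independent under the assumption. */
    #pragma omp parallel for
    for (int i = 1; i < 100; i++)
        a[i] = g(i);

    /* Stage 2: consume each a[i-1]; a is no longer being written,
       so these iterations are independent too. */
    #pragma omp parallel for
    for (int i = 1; i < 100; i++)
        use(a[i - 1]);
}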

Page 53:

Be careful … Example 17

for (i=0; i<100; i++)
    a[i] = f(a[indexa[i]]);

• Cannot tell for sure.

• Parallelization depends on user knowledge of values in indexa[].

• User can tell, compiler cannot.
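A hedged sketch of turning that user knowledge into a run-time check (code mine): the loop is safe to parallelize if no iteration reads an element that a different iteration writes, i.e. every indexa[i] either equals i or lies outside the written range 0..99:

/* Inspector: returns 1 if the loop above may run in parallel. */
int safe_to_parallelize(const int *indexa, int n)
{
    for (int i = 0; i < n; i++)
        if (indexa[i] != i && indexa[i] >= 0 && indexa[i] < n)
            return 0;   /* iteration i reads a[indexa[i]], which
                           another iteration writes: dependence */
    return 1;
}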

Page 54:

An aside

• Parallelizing compilers analyze program dependences to decide parallelization.

• In parallelization by hand, user does the same analysis.

• The compiler is more convenient and less error-prone.

• The user is more powerful and can analyze more patterns.

Page 55:

To remember

• Statement order must not matter.

• Statements must not have dependences.

• Some dependences can be removed.

• Some dependences may not be obvious.