
High Performance Computing

Moreno Marzolla
Dip. di Informatica - Scienza e Ingegneria (DISI)
Università di Bologna

http://www.moreno.marzolla.name/

Pacheco, Chapter 1


Credits

● prof. Salvatore Orlando, Univ. Ca' Foscari di Venezia http://www.dsi.unive.it/~orlando/

● prof. Mary Hall, University of Utah https://www.cs.utah.edu/~mhall/

● Tim Mattson, Intel


Who I am

● Moreno Marzolla
    – Associate professor @ DISI
    – http://www.moreno.marzolla.name/
● Current and past teaching activity
    – High Performance Computing @ ISI
    – Fondamenti di Informatica A @ Ing. Biomedica/Elettronica
    – Past: Algoritmi e Strutture Dati; Sistemi Complessi; Ingegneria del Software
● Research activity
    – Parallel programming
    – Modeling and simulation


High Performance Computing

● Web page
    – http://www.moreno.marzolla.name/teaching/HPC/
● Schedule
    – Monday 9:00-12:00, Room 3.7
    – Wednesday 14:00-17:00, Lab 2.2
    – Please check the course Web page and the official course timetable for variations
● Office hours
    – At any time; please send e-mail


References

● Peter S. Pacheco, An Introduction to Parallel Programming, Morgan Kaufmann 2011, ISBN 9780123742605
    – Theory + OpenMP/MPI programming
● CUDA C programming guide http://docs.nvidia.com/cuda/cuda-c-programming-guide/
    – CUDA/C programming
● See the Web page for slides and links to online material https://www.moreno.marzolla.name/


On lecture slides

[Figure: "How you see lecture slides" vs. "How I see lecture slides". Image source: https://biblioklept.org/2010/06/11/the-british-library-acquires-j-g-ballard-archive/]


Prerequisites

[Diagram: High Performance Computing builds on Programming, Algorithms and Data Structures, Computer Architectures, Operating Systems, ...]


Syllabus

● 6 CFU (~ 60 hours of lectures/lab)
    – Lectures ~40 hours
    – Lab sessions ~20 hours
● Theory (first ~3 weeks)
    – Parallel architectures
    – Parallel programming patterns
    – Performance evaluation of parallel programs
● Parallel programming (rest of the course)
    – Shared-memory programming with C/OpenMP
    – Distributed-memory programming with C/MPI
    – GPU programming with CUDA/C
    – SIMD programming (if there is enough time)


Lab sessions

● Hands-on programming exercises
● We will work under Linux only
● Why?

[Figure from the TOP500 list, updated Sep 2019. Source: http://www.top500.org]


Hardware resources

● isi-raptor03.csr.unibo.it
    – Dual socket Xeon, 12 cores, 64 GB RAM, Ubuntu 16.04
    – 3x NVidia GeForce GTX 1070
● A 16-core VM with 16 GB RAM running Debian/Jessie is available as a backup solution for OpenMP and MPI programming


Exam

● Written exam (weight: 40%)
    – Questions/simple exercises on all topics addressed during the course (sample exams are available on the Web page)
    – 6 dates: 2 in the winter term (Jan/Feb 2020); 3 in the summer term (Jun/Jul 2020); 1 in the fall term (Sep 2020)
    – You can refuse the grade and redo the written exam
● Individual programming project + written report (weight: 60%)
    – Project specification defined by the instructor
    – There is no discussion, unless I need explanations
    – If you refuse the grade, you must hand in a NEW project on NEW specifications
● Final grade rounded to the nearest integer
● The written exam and programming project are independent, and can be completed in any order
● Grades remain valid until Sep 30, 2020
    – After that, a new academic year starts
    – There is no guarantee that the instructor and/or type of exam remain the same...


Grading the programming project

● Correctness
● Clarity
● Efficiency
● Quality of the written report
    – Proper grammar, syntax, ...
    – Technical correctness
    – Performance evaluation


Questions?


Intro to parallel programming


High Performance Computing

● Many applications need considerable computing power
    – Weather forecast, climate modeling, physics simulation, product engineering, 3D animation, finance, ...
● Why?
    – To solve more complex problems
    – To solve the same problem in less time
    – To solve the same problem more accurately
    – To make better use of available computing resources


Parallel programming

● “Traditional” scientific paradigm
    – Make a theory, then experiment
● “Traditional” engineering paradigm
    – Design, then build
● Enter numeric experimentation and prototyping
    – Some phenomena are too complex to be modeled accurately (e.g., weather forecast)
    – Some experiments are too complex, or costly, or dangerous, or impossible to do in the lab (e.g., wind tunnels, seismic simulations, stellar dynamics...)
● Computational science
    – Numerical simulations are becoming a new way to “do science”

Slide credits: S. Orlando


Applications: Numerical Wind Tunnel

Source: http://ecomodder.com/forum/showthread.php/random-wind-tunnel-smoke-pictures-thread-26678-12.html


Applications: Molecular dynamics


Applications: Cosmological Simulation

Bolshoi simulation https://vimeo.com/29769051

The Bolshoi Simulation recreates the large-scale structure of the universe; it required 6 million CPU hours on NASA's Pleiades Supercomputer

Source: https://www.nas.nasa.gov/hecc/resources/pleiades.html


Moore's Law

"The number of transistors on an IC doubles every 24 months"

● That used to mean that every new generation of processors was based on smaller transistors

Moore, G.E., Cramming more components onto integrated circuits. Electronics, 38(8), April 1965

[Photo: Gordon E. Moore (1929– )]

[Figure: Moore's Law chart (log scale). By Wgsimon - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=15193542]


Physics lesson

● Smaller transistors → Faster processor
● Faster processor → Higher power consumption
● Higher power consumption → More heat produced
● More heat produced → Unreliable processor


Power

● The power required by an IC (e.g., a processor) can be expressed as

$\text{Power} = C \times V^2 \times f$

where:
    – $C$ is the capacitance (ability of a circuit to store energy)
    – $V$ is the voltage
    – $f$ is the frequency at which the processor operates


Power

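[Figure: one processor running at frequency f, compared with two processors each running at f / 2 between the same Input and Output]

One processor: Capacitance $C$, Voltage $V$, Frequency $f$, $\text{Power} = C V^2 f$

Two processors: Capacitance $2.2\,C$, Voltage $0.6\,V$, Frequency $0.5\,f$, $\text{Power} = 0.396\, C V^2 f$

Credits: Tim Mattson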
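The arithmetic behind the two configurations can be checked with a few lines of C. This is only a sketch: the function name relative_power and its scale-factor parameters are illustrative and do not come from the original slides. It evaluates Power = C × V² × f relative to the single-processor baseline.

```c
#include <assert.h>

/* Power of a configuration relative to a baseline with capacitance C,
 * voltage V and frequency f. The arguments are scale factors applied
 * to C, V and f (hypothetical helper, not from the slides):
 *   (c_scale*C) * (v_scale*V)^2 * (f_scale*f) / (C * V^2 * f)        */
double relative_power(double c_scale, double v_scale, double f_scale)
{
    assert(c_scale > 0.0 && v_scale > 0.0 && f_scale > 0.0);
    return c_scale * v_scale * v_scale * f_scale;
}
```

With the values on the slide, relative_power(2.2, 0.6, 0.5) is 2.2 × 0.36 × 0.5 ≈ 0.396: the two slower processors together use about 40% of the power of the single fast one.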

[Figure: microprocessor trend data. Source: https://www.karlrupp.net/2018/02/42-years-of-microprocessor-trend-data/]


Processor/Memory wall

Source: John L. Hennessy, David A. Patterson, Computer Architecture: A Quantitative Approach, Fifth Ed., Morgan Kaufmann 2012, ISBN: 978-0-12-383872-8


Limits

● There are limits to “automatic” improvement of scalar performance:
    – The Power Wall: Clock frequency cannot be increased without exceeding the limits of air cooling
    – The Memory Wall: Access to data is a limiting factor
    – The ILP Wall: All the existing instruction-level parallelism (ILP) is already being used
● Conclusion:
    – Explicit parallel mechanisms and explicit parallel programming are required for performance scaling

Slide credits: Hebenstreit, Reinders, Robison, McCool, SC13 tutorial


What happens today?

● HW designers create processors with more cores
● Result:
    – parallel hardware is ubiquitous
    – parallel software is rare
● The challenge
    – Make parallel software as common as parallel hardware

[Figure: NVidia Tegra 4 SoC]


Parallel programming in brief

● Decompose the problem into sub-problems
● Distribute sub-problems to the available execution units
● Solve sub-problems independently
    – Cooperate to solve sub-problems
● Goals
    – Reduce the wall-clock time
    – Balance the workload across execution units
    – Reduce communication and synchronization overhead

Slide credits: S. Orlando


Concurrency vs Parallelism

[Figure: Task 1, Task 2 and Task 3 executed under "Concurrency without parallelism" and under "Parallelism"]

Slide credits: Tim Mattson, Intel


The “Holy Grail”

● Write serial code and have a “smart” compiler capable of parallelizing programs automatically
● It has been done in some very specific cases
    – In practice, no compiler proved to be “smart” enough
● Writing efficient parallel code requires that the programmer understands, and makes explicit use of, the underlying hardware

Here we come


Parallel programming is difficult

Serial version (~49 lines of C++ code)

Parallel version (~1000 lines of C/C++ code)

http://www.moreno.marzolla.name/software/svmcell/


Issues of parallel programming

● Writing parallel programs is in general much harder than writing sequential code
● There is limited portability across different types of architectures
    – E.g., a distributed-memory parallel program must be rewritten from scratch to run on a GPU
    – However, there are standards (OpenMP, MPI, OpenCL) that allow portability across the same type of parallel architecture
● Tuning for best performance is time-consuming


Example: Sum-reduction of an array

("Hello, world!" of parallel programming)


Sum-Reduction

● We start assuming a shared-memory architecture
    – All execution units share a common memory space
● We begin with a sequential solution and parallelize it
    – This is not always a good idea; some parallel algorithms have nothing in common with their sequential counterparts!
    – However, it is sometimes a reasonable starting point

Credits: Mary Hall


Sequential algorithm

● Compute the sum of the content of an array A of length n

    float seq_sum(float* A, int n)
    {
        int i;
        float sum = 0.0;
        for (i=0; i<n; i++) {
            sum += A[i];
        }
        return sum;
    }

Credits: Mary Hall


Version 1 (Wrong!)

● Assuming P execution units (e.g., processors), each one computes a partial sum of n / P adjacent elements
● Example: n = 15, P = 3 (Proc 0, Proc 1 and Proc 2 each sum 5 adjacent elements)

    my_block_len = n/P;
    my_start = my_id * my_block_len;
    my_end = my_start + my_block_len;
    sum = 0.0;
    for (my_i=my_start; my_i<my_end; my_i++) {
        my_x = get_value(my_i);
        sum += my_x;
    }

WRONG: race condition! The shared variable sum is updated by all processors without synchronization.

Variables whose names start with my_ are assumed to be local (private) to each processor; all other variables are assumed to be global (shared)


Version 1 (better, but still wrong)

● Assuming P processors, each one computes a partial sum of n / P adjacent elements
● Example: n = 15, P = 3 (Proc 0, Proc 1 and Proc 2 each sum 5 adjacent elements)

    my_block_len = n/P;
    my_start = my_id * my_block_len;
    my_end = my_start + my_block_len;
    sum = 0.0; mutex m;
    for (my_i=my_start; my_i<my_end; my_i++) {
        my_x = get_value(my_i);
        mutex_lock(&m);
        sum += my_x;
        mutex_unlock(&m);
    }

WRONG


Version 1 (better, but still wrong)

● Assuming P processors, each one computes a partial sum of n / P adjacent elements
● Example: n = 17, P = 3. Since n / P = 5 (integer division), Proc 0, Proc 1 and Proc 2 cover only 15 elements; the last two elements (marked ?? in the figure) are not assigned to any processor

    my_block_len = n/P;
    my_start = my_id * my_block_len;
    my_end = my_start + my_block_len;
    sum = 0.0; mutex m;
    for (my_i=my_start; my_i<my_end; my_i++) {
        my_x = get_value(my_i);
        mutex_lock(&m);
        sum += my_x;
        mutex_unlock(&m);
    }

WRONG


Version 1 (correct, but not efficient)

● Assuming P processors, each one computes a partial sum of about n / P adjacent elements
● Example: n = 17, P = 3 (Proc 0, Proc 1 and Proc 2 get 5, 6 and 6 elements, respectively)

    my_start = n * my_id / P;
    my_end = n * (my_id + 1) / P;
    sum = 0.0; mutex m;
    for (my_i=my_start; my_i<my_end; my_i++) {
        my_x = get_value(my_i);
        mutex_lock(&m);
        sum += my_x;
        mutex_unlock(&m);
    }
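The block boundaries used in this version can be isolated in a small C helper. This is a minimal sketch: the names block_range and block_of are made up for illustration, and the cast to long long (not in the slides) only guards against overflow of n * my_id for large n.

```c
#include <assert.h>

/* Half-open index range [start, end) assigned to one processor.
 * Hypothetical helper type, not part of the original slides. */
typedef struct {
    int start;
    int end;
} block_range;

/* Same formulas as on the slide:
 *   my_start = n * my_id / P;  my_end = n * (my_id + 1) / P;
 * The blocks are adjacent, cover all n elements, and their lengths
 * differ by at most one. */
block_range block_of(int n, int P, int my_id)
{
    block_range r;
    assert(n >= 0 && P > 0 && my_id >= 0 && my_id < P);
    r.start = (int)((long long)n * my_id / P);
    r.end   = (int)((long long)n * (my_id + 1) / P);
    return r;
}
```

For n = 17 and P = 3 the three ranges are [0, 5), [5, 11) and [11, 17), so no element is left out, unlike the n / P block length of the previous attempt.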


Version 2

● Too much contention on mutex m
    – Each processor acquires and releases the mutex for each element of the array!
● Solution: increase the mutex granularity
    – Each processor accumulates the partial sum on a local (private) variable
    – The mutex is used at the end to update the global sum

    my_start = n * my_id / P;
    my_end = n * (my_id + 1) / P;
    sum = 0.0; my_sum = 0.0; mutex m;
    for (my_i=my_start; my_i<my_end; my_i++) {
        my_x = get_value(my_i);
        my_sum += my_x;
    }
    mutex_lock(&m);
    sum += my_sum;
    mutex_unlock(&m);
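One possible way to turn this pseudocode into real code is POSIX threads. The sketch below is an assumption about the mapping, not the course's reference implementation: the names sum_task, sum_worker and mutex_sum are invented here, get_value(i) is replaced by a direct read of A[i], and one thread plays the role of each processor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Per-thread arguments (hypothetical names, not from the slides). */
typedef struct {
    const float *A;       /* shared input array           */
    int n, P, my_id;      /* problem size, threads, rank  */
    float *sum;           /* shared global sum            */
    pthread_mutex_t *m;   /* protects *sum                */
} sum_task;

static void *sum_worker(void *arg)
{
    sum_task *t = arg;
    int my_start = (int)((long long)t->n * t->my_id / t->P);
    int my_end   = (int)((long long)t->n * (t->my_id + 1) / t->P);
    float my_sum = 0.0f;              /* private partial sum */
    for (int i = my_start; i < my_end; i++)
        my_sum += t->A[i];
    pthread_mutex_lock(t->m);         /* one lock per thread, not per element */
    *t->sum += my_sum;
    pthread_mutex_unlock(t->m);
    return NULL;
}

/* Sum of A[0..n-1] computed by P threads. */
float mutex_sum(const float *A, int n, int P)
{
    assert(n >= 0 && P > 0);
    float sum = 0.0f;
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    pthread_t *tid = malloc((size_t)P * sizeof *tid);
    sum_task *task = malloc((size_t)P * sizeof *task);
    assert(tid != NULL && task != NULL);
    for (int id = 0; id < P; id++) {
        task[id] = (sum_task){ A, n, P, id, &sum, &m };
        int rc = pthread_create(&tid[id], NULL, sum_worker, &task[id]);
        assert(rc == 0);
        (void)rc;
    }
    for (int id = 0; id < P; id++)
        pthread_join(tid[id], NULL);  /* all updates of sum are done after this */
    pthread_mutex_destroy(&m);
    free(tid);
    free(task);
    return sum;
}
```

Here the shared sum is initialized once by the calling thread before the workers start, which avoids having every worker reset it. On Linux the code is typically built with the -pthread compiler flag.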


Version 3: Remove the mutex (wrong, in a subtle way)

● We use a shared array psum[] where each processor can store its local sum
● At the end, one processor computes the global sum

    my_start = n * my_id / P;
    my_end = n * (my_id + 1) / P;
    psum[0..P-1] = 0.0; /* all elements set to 0.0 */
    for (my_i=my_start; my_i<my_end; my_i++) {
        my_x = get_value(my_i);
        psum[my_id] += my_x;
    }
    if ( 0 == my_id ) { /* only the master executes this */
        sum = 0.0;
        for (my_i=0; my_i<P; my_i++)
            sum += psum[my_i];
    }

WRONG


The problem with version 3

● Processor 0 could start the computation of the global sum before all other processors have computed the local sums!

[Figure: timeline of P0, P1, P2 and P3; P0 reaches "Compute global sum" while other processors are still in "Compute local sums"]


Version 4 (correct)

● Use a barrier synchronization

    my_start = n * my_id / P;
    my_end = n * (my_id + 1) / P;
    psum[0..P-1] = 0.0;
    for (my_i=my_start; my_i<my_end; my_i++) {
        my_x = get_value(my_i);
        psum[my_id] += my_x;
    }
    barrier();
    if ( 0 == my_id ) {
        sum = 0.0;
        for (my_i=0; my_i<P; my_i++)
            sum += psum[my_i];
    }

[Figure: timeline of P0, P1, P2 and P3; barrier() separates "Compute local sums" from "Compute global sum"]
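A barrier of this kind is available in POSIX threads as pthread_barrier_t. The following sketch shows one way the pseudocode might be written with it; the names barrier_task, barrier_worker and barrier_sum are invented for the example, A[i] stands in for get_value(i), and each thread zeroes only its own psum slot (an assumption made here so that no thread can overwrite another thread's partial sum).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Per-thread arguments (hypothetical names, not from the slides). */
typedef struct {
    const float *A;           /* shared input array                  */
    int n, P, my_id;
    float *psum;              /* shared: one slot per thread         */
    float *sum;               /* shared: written by thread 0 only    */
    pthread_barrier_t *bar;
} barrier_task;

static void *barrier_worker(void *arg)
{
    barrier_task *t = arg;
    int my_start = (int)((long long)t->n * t->my_id / t->P);
    int my_end   = (int)((long long)t->n * (t->my_id + 1) / t->P);
    t->psum[t->my_id] = 0.0f;
    for (int i = my_start; i < my_end; i++)
        t->psum[t->my_id] += t->A[i];
    /* No thread proceeds until all P threads have stored their local sum. */
    pthread_barrier_wait(t->bar);
    if (t->my_id == 0) {      /* only the master computes the global sum */
        *t->sum = 0.0f;
        for (int i = 0; i < t->P; i++)
            *t->sum += t->psum[i];
    }
    return NULL;
}

/* Sum of A[0..n-1] computed by P threads without a mutex. */
float barrier_sum(const float *A, int n, int P)
{
    assert(n >= 0 && P > 0);
    float sum = 0.0f;
    pthread_barrier_t bar;
    pthread_t *tid = malloc((size_t)P * sizeof *tid);
    barrier_task *task = malloc((size_t)P * sizeof *task);
    float *psum = malloc((size_t)P * sizeof *psum);
    assert(tid != NULL && task != NULL && psum != NULL);
    int rc = pthread_barrier_init(&bar, NULL, (unsigned)P);
    assert(rc == 0);
    for (int id = 0; id < P; id++) {
        task[id] = (barrier_task){ A, n, P, id, psum, &sum, &bar };
        rc = pthread_create(&tid[id], NULL, barrier_worker, &task[id]);
        assert(rc == 0);
    }
    (void)rc;
    for (int id = 0; id < P; id++)
        pthread_join(tid[id], NULL);
    pthread_barrier_destroy(&bar);
    free(tid);
    free(task);
    free(psum);
    return sum;
}
```

Removing the pthread_barrier_wait call would reproduce the bug of Version 3: thread 0 could read psum[] slots that other threads have not finished updating.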


Version 5: Distributed-memory version

● P << n processors
● Each processor computes a local sum
● Each processor sends the local sum to processor 0 (the master)

    ...
    my_sum = 0.0;
    my_start = …, my_end = …;
    for ( i = my_start; i < my_end; i++ ) {
        my_sum += get_value(i);
    }
    if ( 0 == my_id ) {
        for ( i=1; i<P; i++ ) {
            tmp = receive from proc i;
            my_sum += tmp;
        }
        printf("The sum is %f\n", my_sum);
    } else {
        send my_sum to proc 0;
    }


Version 5

    Proc      0    1    2    3    4    5    6    7
    A[ ]      1    3   -2    7   -6    5    3    4

my_sum on Proc 0 after each received value: 4, 2, 9, 3, 8, 11, 15

Bottleneck: Proc 0 receives and adds all the other values, one after the other


Parallel reduction

    Proc      0    1    2    3    4    5    6    7
    A[ ]      1    3   -2    7   -6    5    3    4
    my_sum    4         5        -1         7        (after step 1)
              9                   6                  (after step 2)
             15                                      (after step 3)

● (P – 1) sums are still performed; however, processor 0 receives ~ $\log_2 P$ messages and performs ~ $\log_2 P$ sums
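The communication pattern of the tree can be simulated sequentially in C. This is a sketch under simplifying assumptions: the function name tree_reduce is invented here, val[i] stands for the partial sum held by processor i, and the additions of one round, which real processors would perform at the same time, are executed by an ordinary inner loop.

```c
#include <assert.h>
#include <stddef.h>

/* Tree-structured sum of the P values val[0..P-1].
 * In the round with a given stride, every "processor" i that is a
 * multiple of 2*stride adds the value held by processor i + stride
 * (if it exists). The total ends up in val[0].
 * Returns the number of rounds, i.e., the number of messages received
 * and additions performed by processor 0. */
int tree_reduce(float *val, int P)
{
    assert(val != NULL && P > 0);
    int rounds = 0;
    for (int stride = 1; stride < P; stride *= 2) {
        /* The iterations of this loop are independent of each other:
         * on a parallel machine they would run concurrently. */
        for (int i = 0; i + stride < P; i += 2 * stride)
            val[i] += val[i + stride];
        rounds++;
    }
    return rounds;
}
```

Starting from the values 1, 3, -2, 7, -6, 5, 3, 4 of the figure, the first round leaves 4, 5, -1, 7 on processors 0, 2, 4, 6, the second leaves 9 and 6 on processors 0 and 4, and the third leaves 15 on processor 0: three rounds for P = 8, against the seven receives of the previous version.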


Task parallelism vs Data parallelism

● Task Parallelism
    – Distribute (possibly different) tasks to processors
● Data Parallelism
    – Distribute data to processors
    – Each processor executes the same task on different data


Example

● We have a table containing hourly temperatures at some location
    – 24 columns, 365 rows
● Compute the minimum, maximum and average temperature for each day
● Assume we have 3 independent processors


Example

[Figure: table with one row per day (0 to 364) and one column per hour (0 to 23), followed by three result columns: max, min, ave]


Data parallel approach

[Figure: the same table (days 0 to 364 by hours 0 to 23, plus the max, min and ave columns), with the rows split into three blocks of days assigned to Proc 0, Proc 1 and Proc 2]


Task parallel approach

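[Figure: the same table (days 0 to 364 by hours 0 to 23, plus the max, min and ave columns), with each of the three result columns assigned to a different processor: Proc 0, Proc 1, Proc 2]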
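The two decompositions of the temperature example can be contrasted in C. The sketch below only shows how the work is divided, written as plain functions that each processor would call; the names stats_for_days, stat_for_all_days and stat_task, and the row-block formula, are assumptions made for this illustration.

```c
#include <assert.h>

#define HOURS 24

/* Data-parallel approach: processor my_id (0 <= my_id < P) handles a
 * block of days and computes ALL three statistics for each of them. */
void stats_for_days(float temp[][HOURS], int ndays, int P, int my_id,
                    float *tmax, float *tmin, float *tave)
{
    assert(ndays >= 0 && P > 0 && my_id >= 0 && my_id < P);
    int first = (int)((long long)ndays * my_id / P);
    int last  = (int)((long long)ndays * (my_id + 1) / P);
    for (int d = first; d < last; d++) {
        float mx = temp[d][0], mn = temp[d][0], s = 0.0f;
        for (int h = 0; h < HOURS; h++) {
            if (temp[d][h] > mx) mx = temp[d][h];
            if (temp[d][h] < mn) mn = temp[d][h];
            s += temp[d][h];
        }
        tmax[d] = mx;
        tmin[d] = mn;
        tave[d] = s / HOURS;
    }
}

/* Task-parallel approach: each processor computes ONE statistic
 * (its task) for all days. */
typedef enum { TASK_MAX, TASK_MIN, TASK_AVE } stat_task;

void stat_for_all_days(float temp[][HOURS], int ndays, stat_task task,
                       float *out)
{
    assert(ndays >= 0);
    for (int d = 0; d < ndays; d++) {
        float r = (task == TASK_AVE) ? 0.0f : temp[d][0];
        for (int h = 0; h < HOURS; h++) {
            float v = temp[d][h];
            if (task == TASK_MAX && v > r)      r = v;
            else if (task == TASK_MIN && v < r) r = v;
            else if (task == TASK_AVE)          r += v;
        }
        out[d] = (task == TASK_AVE) ? r / HOURS : r;
    }
}
```

In the data-parallel version, processor id calls stats_for_days(temp, 365, 3, id, ...), and the approach keeps working if more processors become available. In the task-parallel version, the three processors call stat_for_all_days with TASK_MAX, TASK_MIN and TASK_AVE, so at most three processors can be kept busy.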


Key concepts

● Parallel architectures “naturally” derive from the laws of physics

● Parallel architectures require parallel programming paradigms

● Writing parallel programs is much harder than writing sequential programs