Introduction to Scientific Computing
9. Implementation
Miriam Mehl


Outline:

• Implementation: . . .
• RISC Technology
• Pipelining
• Superscalar Processors
• Cache Memory
• Memory Hierarchy
• Parallel Computers – . . .
• Flynn’s Classification . . .
• Memory Access . . .
• Parallelization
• The Programming . . .
• MPI Messages
• Programming with MPI
• Load Distribution
• Designing Load . . .
• Classification of . . .
• Examples of LD- . . .
• Performance Evaluation


1. Implementation: Target Architectures


• different target architectures for numerical simulations:

– monoprocessors

– supercomputers

• modern microprocessors:

– obvious trends:

* increasing clock rates (> 2 GHz almost standard)

* more MIPS, more FLOPS

* very-, ultra-, and ???-large-scale integration; hence, more transistors and more functionality on the chip

* longer words: 64-bit architectures are standard (workstations) or coming (PCs)

– important features:

* RISC (Reduced Instruction Set Computer) technology

* well-developed pipelining

* superscalar processor organization

* caching and multi-level memory hierarchy

* VLIW, multi-thread architectures, on-chip multiprocessors, ...
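How the clock-rate and FLOPS trends combine can be sketched with a back-of-the-envelope peak-performance estimate; the clock rate and pipeline count below are illustrative assumptions, not figures from the slides.

```python
def peak_flops(clock_hz, flops_per_cycle, n_units=1):
    # theoretical peak: every FP pipeline completes flops_per_cycle
    # results per cycle and nothing ever stalls (an idealization)
    return clock_hz * flops_per_cycle * n_units

# hypothetical 2 GHz processor with one FP-add and one FP-mul pipeline,
# each finishing one result per cycle:
print(peak_flops(2e9, 1, n_units=2) / 1e9)  # peak in GFLOPS
```

Real sustained performance falls far below such peaks once memory access and instruction dependences enter, which is what the pipelining and caching sections below address.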


2. RISC Technology

• counter-trend to CISC: more and more complex instructions entailing microprogramming

• now instead:

– relatively small number of instructions (tens)

– simple machine instructions, fixed format, few address modes

– load-and-store principle: only explicit LOAD/WRITE instructions have memory access

– no more need for microprogramming
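The load-and-store principle can be illustrated with a toy register machine in which only LOAD and STORE touch memory, while arithmetic runs register-to-register; the opcode names and instruction format here are assumptions for illustration, not any real ISA.

```python
def run(program, memory):
    # toy RISC interpreter: memory is accessed only via LOAD/STORE,
    # ADD works purely on registers (the load-and-store principle)
    regs = {}
    for op, dst, a, b in program:
        if op == "LOAD":        # LOAD r, addr : register <- memory
            regs[dst] = memory[a]
        elif op == "STORE":     # STORE r, addr: memory <- register
            memory[a] = regs[dst]
        elif op == "ADD":       # ADD r, r1, r2: register-only arithmetic
            regs[dst] = regs[a] + regs[b]
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory

# memory[2] = memory[0] + memory[1], spelled out in explicit steps
mem = run([("LOAD",  "r1", 0, None),
           ("LOAD",  "r2", 1, None),
           ("ADD",   "r3", "r1", "r2"),
           ("STORE", "r3", 2, None)], {0: 5, 1: 7, 2: 0})
print(mem[2])  # 12
```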


3. Pipelining

• decompose instructions into simple steps involving different parts of the CPU:

– load,

– decode,

– reserve registers,

– execute,

– write results

• further improvement: reorder the steps of an instruction (LOAD as early as possible, WRITE as late as possible; this avoids the risk of idle waiting time)

• best case: identical instructions to be pipelined/overlapped, as in vector processors

• pipelining needs different functional units in the CPU that can deal with the different steps in parallel; therefore:
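The gain from overlapping these steps follows from the standard idealized pipeline model, assuming one instruction enters per cycle and no hazards or stalls occur (a simplification):

```python
def sequential_cycles(n, k):
    # each of n instructions runs all k steps before the next one starts
    return n * k

def pipelined_cycles(n, k):
    # the first instruction needs k cycles; afterwards one finishes per cycle
    return k + (n - 1)

n, k = 1000, 5  # e.g. the 5 steps above: load, decode, reserve, execute, write
speedup = sequential_cycles(n, k) / pipelined_cycles(n, k)
print(round(speedup, 2))  # approaches k for large n
```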


4. Superscalar Processors

• several parts of the CPU exist in more than one copy

• example: the MIPS R10000 has five execution pipelines

– one for FP multiplication, one for FP addition

– two integer ALUs (arithmetic-logic units)

– one address pipeline


5. Cache Memory

• CPU performance has increased faster than memory access speed

• thus: reduce memory access time / latency

• cache memory: small and fast on-chip memory that keeps a part of the main memory

• optimum: the needed data is always available in the cache

• look for strategies that ensure a hit probability p close to 1:

– choice of section: what should be kept in the cache?

– ensure locality of data (instructions in the cache need data in the cache)

– strategies for fetching, replacement, and updating

– association: how to check whether data are available in the cache?

– consistency: no diverging versions in cache and main memory


6. Memory Hierarchy

• today: several cache levels → memory hierarchy:

– registers,

– (level-1/2/3) cache,

– main memory,

– hard disk,

– remote memory

the faster, the smaller

• knowledge of the target computer's memory hierarchy is important for the efficiency of numerical algorithms:

– example: matrix-vector product Ax with A too large for the cache

– standard algorithm:

* outer loop over the rows of A,

* inner loop for the scalar product of one row of A with x

– if the current contents of the cache are some rows of A: fine

– if the current contents of the cache are some columns of A: slow!

– tuning is crucial: peak performance can be up to 4 orders of magnitude higher than the performance observed in practice (without tuning)


7. Parallel Computers – Topologies

• parallel computers vs. distributed systems: where is the frontier?

• different possibilities of arrangement:

– static network topologies:

* bus, ring, grid, or torus

* binary tree or fat tree

* hypercube

– dynamic network topologies:

* crossbar switch

* shuffle exchange network

• crucial quantities:

– diameter (longest shortest path between two processors)

– number of network connections (ports) per processor

– are parallel communications possible?

– are there bottlenecks?
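The two crucial quantities can be made concrete for a ring and a d-dimensional hypercube; a minimal sketch (function names hypothetical, not from the lecture):

```python
# Diameter and ports per processor for two static topologies,
# assuming p processors in the ring and 2**d in the hypercube.

def ring_metrics(p):
    # each node has 2 links; the longest shortest path is half the ring
    return {"diameter": p // 2, "ports": 2}

def hypercube_metrics(d):
    # 2**d processors; neighbours differ in exactly one address bit,
    # so diameter and ports both grow only logarithmically with p
    return {"diameter": d, "ports": d, "processors": 2 ** d}

print(ring_metrics(8))       # {'diameter': 4, 'ports': 2}
print(hypercube_metrics(3))  # {'diameter': 3, 'ports': 3, 'processors': 8}
```

The hypercube trades more ports per processor for a much smaller diameter, which is exactly the tension the two quantities capture.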

8. Flynn’s Classification (1972)
8. Flynn’s Classification (1972)

• SISD: Single Instruction, Single Data

– the classical von Neumann monoprocessor

• SIMD: Single Instruction, Multiple Data

– vector computers: extreme pipelining, one instruction applied to a sequence (vector) of data (CRAY 1, 2, X, Y, J/C/T90, ...)

– array computers: array of processors, concurrency (Thinking Machines CM-2, MasPar MP-1, MP-2)

• MIMD: Multiple Instruction, Multiple Data

– multiprocessors:

* distributed memory (loose coupling, explicit communication; Intel Paragon, IBM SP-2) or

* shared memory (tight coupling, global address space, implicit communication; most workstation servers) or

* nets/clusters

• MISD: Multiple Instruction, Single Data: rare
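The difference between the SIMD and MIMD classes can be sketched in a few lines; `simd` and `mimd` here are illustrative helpers, not real hardware:

```python
# Conceptual sketch of Flynn's classes: SIMD applies ONE instruction
# to many data items, MIMD lets every "processor" run its own
# instruction on its own data item.

def simd(instruction, data):
    # single instruction, multiple data
    return [instruction(x) for x in data]

def mimd(instructions, data):
    # multiple instructions, multiple data (one pair per processor)
    return [f(x) for f, x in zip(instructions, data)]

doubled = simd(lambda x: 2 * x, [1, 2, 3])    # [2, 4, 6]
mixed = mimd([abs, lambda x: x * x], [-3, 4])  # [3, 16]
```

A vector computer corresponds to the `simd` pattern applied to a whole vector per instruction; a cluster of independent processors corresponds to `mimd`.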

9. Memory Access Classification
9. Memory Access Classification

• other criteria for classification:

scalability (S), programming model (PM), portability (P), and load distribution (L)

• UMA: Uniform Memory Access

– shared memory systems: SMP (symmetric multiprocessors, parallel vector processors); PC and workstation servers, CRAY YMP

– advantages: P, PM, L; drawback: S

• NORMA: No Remote Memory Access

– distributed memory systems; clusters, IBM SP-2, iPSC/860

– advantage: S; drawbacks: P, PM, L

• NUMA: Non-Uniform Memory Access

– systems with virtually shared memory; KSR-1, CRAY T3D/T3E, CONVEX SPP

– advantages: PM, S, P; drawbacks: cache coherence, communication
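A toy model (the latency numbers are invented) of the access behaviour that gives the three classes their names:

```python
# UMA: every access costs the same. NUMA: remote accesses cost more.
# NORMA: remote memory cannot be accessed at all - data must be
# exchanged via explicit messages instead.

def access_time(kind, local):
    if kind == "UMA":
        return 100                    # uniform cost for every address
    if kind == "NUMA":
        return 100 if local else 400  # non-uniform: remote is slower
    if kind == "NORMA":
        if not local:
            raise RuntimeError("no remote memory access - send a message")
        return 100
    raise ValueError(kind)

assert access_time("UMA", local=False) == access_time("UMA", local=True)
assert access_time("NUMA", local=False) > access_time("NUMA", local=True)
```

The trade-offs in the list follow directly: UMA keeps the programming model simple but the shared medium limits scalability, while NORMA scales but pushes communication into the program.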

10. Parallelization
10. Parallelization

• the classical programming paradigms are, in principle, all well suited for explicit or implicit parallelization:

– imperative: FORTRAN, C (still dominant, recently with some OO touch as in C++)

– logical/relational: PROLOG

– object-oriented: SMALLTALK

– functional/applicative: LISP

• implicit parallelization typically via special compilers

• explicit parallelization typically via linked communication libraries

• the traditional way in Scientific Computing: write FORTRAN code, use a vectorizing compiler, run on a CRAY, wait for results

• explicit parallelization is often difficult (cf. Gauß-Seidel), which makes non-conventional approaches attractive
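Why Gauß-Seidel is hard to parallelize can be seen on a 1D model problem u_i = (u_{i-1} + u_{i+1})/2: a Jacobi sweep reads only old values, while Gauß-Seidel reads values already updated in the same sweep (illustrative sketch, not from the slides):

```python
# One smoothing sweep on a 1D grid with fixed boundary values.

def jacobi_sweep(u):
    # every new value depends only on the OLD vector, so all interior
    # updates are independent and could run in parallel
    return [u[0]] + [(u[i - 1] + u[i + 1]) / 2
                     for i in range(1, len(u) - 1)] + [u[-1]]

def gauss_seidel_sweep(u):
    # each update already uses the NEW left neighbour, creating a
    # sequential dependency chain along i
    u = u[:]
    for i in range(1, len(u) - 1):
        u[i] = (u[i - 1] + u[i + 1]) / 2
    return u

u0 = [0.0, 1.0, 0.0, 1.0]
print(jacobi_sweep(u0))        # [0.0, 0.0, 1.0, 1.0]
print(gauss_seidel_sweep(u0))  # [0.0, 0.0, 0.5, 1.0]
```

The two sweeps give different results on the same input because Gauß-Seidel's second update sees the first one; that data dependency is exactly what an explicit parallelization has to break up (e.g. by red-black ordering).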

11. The Programming Model MPI
11. The Programming Model MPI

• How to write parallel programs?

– UMA systems: simple answer – just as sequential ones

– distributed memory systems: the MPI model or standard

* Message Passing Interface

* originally for clusters, today also used on massively parallel computers

* MPI-1 developed 1992–1994

* explicit exchange of messages: more programming work, but more possibilities for tuning and optimization

• MPI features:

– parallel program: n processes, separate address spaces, no remote access

– message exchange via the system calls send and receive

– MPI kernel: a library of communication routines that allows MPI commands to be integrated into standard languages
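The process model described above (n independent processes, separate address spaces, explicit send and receive) can be sketched with plain Python threads and per-rank mailboxes. This is an illustrative stand-in for the model, not actual MPI; the helper names send/recv and the worker computation are invented for the sketch:

```python
import threading
import queue

# One mailbox (receive buffer) per rank; no other shared state,
# mimicking MPI's separate address spaces.
NPROCS = 2
mailbox = [queue.Queue() for _ in range(NPROCS)]
results = [None] * NPROCS

def send(dest, data):
    mailbox[dest].put(data)          # explicit message to another rank

def recv(rank):
    return mailbox[rank].get()       # blocks until a message arrives

def worker(rank):
    local = (rank + 1) * 10          # "compute something" in private memory
    send((rank + 1) % NPROCS, local) # exchange result with the partner rank
    results[rank] = recv(rank)

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NPROCS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # each rank ends up holding its partner's value
```

The point of the sketch is the programming model: there is no remote memory access, so every piece of data a process needs from another process must be sent and received explicitly.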


12. MPI Messages

• messages consist of a

– header (recipient, buffer, type, context of communication) and a

– body (contents)

• messages are buffered (send buffer, receive buffer)

• sending a message can be

– blocking (finished only after the message has left the node) or

– non-blocking (finished immediately; the message may actually be sent later)

• the same holds for receiving a message:

– blocking: wait for it;

– non-blocking: check for it from time to time

• cost of passing a message (length N, buffer capacity K):

t(N) = α · N/K + β · N

with initialization cost/time α and transportation cost β
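The cost model above can be turned into a small helper. The concrete α, β values below are invented for illustration, and rounding the packet count up to whole buffer-sized packets (⌈N/K⌉) is an assumption of this sketch:

```python
import math

def message_cost(N, K, alpha, beta):
    """t(N) = alpha * ceil(N/K) + beta * N
    alpha: initialization cost per packet, beta: transport cost per unit.
    Rounding up to whole packets is a modelling choice of this sketch."""
    packets = math.ceil(N / K)
    return alpha * packets + beta * N

# Startup cost dominates for many short messages; bundling them into
# one long message amortizes alpha:
ten_short = 10 * message_cost(N=10, K=100, alpha=50.0, beta=0.1)
one_long = message_cost(N=100, K=100, alpha=50.0, beta=0.1)
print(ten_short, one_long)  # 510.0 vs 60.0
```

This is why, in practice, parallel codes try to aggregate communication: sending the same total amount of data in fewer, larger messages pays the initialization cost α far less often.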


13. Programming with MPI

• a simple example:

  P1: compute something          P2: compute something
      store result in SBUF           store result in SBUF
      SendBlocking(P2,SBUF)          SendBlocking(P1,SBUF)
      RecBlocking(P2,RBUF)           RecBlocking(P1,RBUF)
      read data in RBUF              read data in RBUF
      compute again                  compute again

• without buffering: deadlocks possible

– nothing specified: buffering possible, but not mandatory

– never: no buffering (efficient, but risky)

– always: safe, but sometimes costly

• collective communication features available:

– broadcast, gather, gather-to-all, scatter, all-to-all, . . .
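The buffering issue in the symmetric send-then-receive example can be illustrated with a small simulation — a sketch with Python threads, not MPI itself. With buffered sends (here: queues that accept a message immediately and return), both processes get past their SendBlocking before either receives, so the exchange completes:

```python
import threading
import queue

# Buffered exchange: put() deposits the message in the partner's buffer
# and returns at once, so the symmetric pattern cannot deadlock.
inbox = {1: queue.Queue(), 2: queue.Queue()}
received = {}

def process(me, partner):
    sbuf = f"result of P{me}"        # compute something, store in SBUF
    inbox[partner].put(sbuf)         # buffered send: returns immediately
    received[me] = inbox[me].get()   # blocking receive: waits for message

t1 = threading.Thread(target=process, args=(1, 2))
t2 = threading.Thread(target=process, args=(2, 1))
t1.start(); t2.start()
t1.join(); t2.join()

print(received)
```

With an unbuffered (rendezvous) send, the send would only return once the partner had reached its receive — and since both processes send first, both would wait forever. Common cures besides buffering are breaking the symmetry (one process sends first, the other receives first) or using a combined send-receive operation.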


14. Load Distribution

• load: the amount of work on the processors

– optimum: minimize idle times; needs estimates and monitoring

– strategy: load balancing or load distribution or scheduling

– important: avoid overhead

• one distinguishes

– scheduling:

* global: where do which processes run?

* local: when does which processor run which process?

– load balancing:

* static: a priori

* dynamic: during runtime

• in Scientific Computing applications the load is often not predictable:

– adaptive refinement of a finite element mesh,

– convergence behaviour of iterations may differ

– thus: static load balancing is not sufficient
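The static/dynamic distinction can be made concrete with a small scheduling sketch. The task costs and processor count below are invented for illustration: static balancing fixes a block partition a priori, while a dynamic scheme assigns the next task to whichever processor is currently least loaded (i.e. becomes free first):

```python
# Tasks with unpredictable, unequal costs (e.g. different convergence
# behaviour per subdomain); values are made up for illustration.
costs = [9, 8, 1, 1, 1, 1, 1, 2]
nprocs = 2

def static_makespan(costs, nprocs):
    # static: a-priori block partition, fixed before the actual costs
    # are known (remainder handling omitted for brevity)
    chunk = len(costs) // nprocs
    loads = [sum(costs[p * chunk:(p + 1) * chunk]) for p in range(nprocs)]
    return max(loads)

def dynamic_makespan(costs, nprocs):
    # dynamic: greedy work queue — each task goes to the least-loaded
    # processor at the moment it is dispatched
    loads = [0] * nprocs
    for c in costs:
        loads[loads.index(min(loads))] += c
    return max(loads)

print(static_makespan(costs, nprocs), dynamic_makespan(costs, nprocs))
```

Here the static partition puts both expensive tasks on the same processor (makespan 19), while the dynamic queue spreads the work much more evenly (makespan 13) — at the price of the runtime bookkeeping overhead the slide warns about.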


15. Designing Load Distribution

• Which are the primary objectives?

– optimization of system loador application runtime?

– placementof new processes or migration of running pro-cesses?

• Which is the level of integration?

– Who initiates actions (measure load, chose strategy)?

* application program

* runtime system

* OS?

Page 134: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 15 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

15. Designing Load Distribution

• Which are the primary objectives?

– optimization of system loador application runtime?

– placementof new processes or migration of running pro-cesses?

• Which is the level of integration?

– Who initiates actions (measure load, chose strategy)?

* application program

* runtime system

* OS?

• Any special features of the application to be considered?

– restrictions in allocation process-to-processor frequent inS.C.

Page 135: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 15 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

15. Designing Load Distribution

• Which are the primary objectives?

– optimization of system loador application runtime?

– placementof new processes or migration of running pro-cesses?

• Which is the level of integration?

– Who initiates actions (measure load, chose strategy)?

* application program

* runtime system

* OS?

• Any special features of the application to be considered?

– restrictions in allocation process-to-processor frequent inS.C.

• Which units shall be distributed or displaced?

Page 136: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 15 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

15. Designing Load Distribution

• Which are the primary objectives?

– optimization of system loador application runtime?

– placementof new processes or migration of running pro-cesses?

• Which is the level of integration?

– Who initiates actions (measure load, chose strategy)?

* application program

* runtime system

* OS?

• Any special features of the application to be considered?

– restrictions in allocation process-to-processor frequent inS.C.

• Which units shall be distributed or displaced?

– whole processes (coarse grain)

Page 137: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 15 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

15. Designing Load Distribution

• Which are the primary objectives?

– optimization of system loador application runtime?

– placementof new processes or migration of running pro-cesses?

• Which is the level of integration?

– Who initiates actions (measure load, chose strategy)?

* application program

* runtime system

* OS?

• Any special features of the application to be considered?

– restrictions in allocation process-to-processor frequent inS.C.

• Which units shall be distributed or displaced?

– whole processes (coarse grain)

– threads (fine grain)

Page 138: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 15 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

15. Designing Load Distribution

• Which are the primary objectives?

– optimization of system loador application runtime?

– placementof new processes or migration of running pro-cesses?

• Which is the level of integration?

– Who initiates actions (measure load, chose strategy)?

* application program

* runtime system

* OS?

• Any special features of the application to be considered?

– restrictions in allocation process-to-processor frequent inS.C.

• Which units shall be distributed or displaced?

– whole processes (coarse grain)

– threads (fine grain)

– objects or data (typical for simulation applications)
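The last design question above — which units to distribute — can be made concrete with a minimal sketch. The function below (our own illustration, not from the lecture) distributes data units, here grid cells, as evenly as possible among p processes, which is the typical choice for simulation applications:

```python
# Hypothetical sketch: distributing data units (grid cells) rather than whole
# processes or threads. Cells are split as evenly as possible among p
# processes (a simple block distribution).

def block_distribution(n_cells, p):
    """Return a list of (start, end) index ranges, one per process."""
    base, rest = divmod(n_cells, p)
    ranges = []
    start = 0
    for rank in range(p):
        # the first `rest` ranks each take one extra cell
        size = base + (1 if rank < rest else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# Example: 10 grid cells on 3 processes -> cell counts 4, 3, 3
print(block_distribution(10, 3))
```

The per-process cell counts differ by at most one, so the static load imbalance of this scheme is minimal as long as all cells cost the same.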

Page 139: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 16 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

16. Classification of Strategies

Page 140: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 16 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

16. Classification of Strategies

• origin of the idea:

from physics (diffusion model), from combinatorics (graph the-ory), economics (bidding, brokerage)

Page 141: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 16 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

16. Classification of Strategies

• origin of the idea:

from physics (diffusion model), from combinatorics (graph the-ory), economics (bidding, brokerage)

• for networks, for bus topologies

Page 142: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 16 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

16. Classification of Strategies

• origin of the idea:

from physics (diffusion model), from combinatorics (graph the-ory), economics (bidding, brokerage)

• for networks, for bus topologies

• data represented as grids, trees, sets, or . . .

Page 143: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 16 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

16. Classification of Strategies

• origin of the idea:

from physics (diffusion model), from combinatorics (graph the-ory), economics (bidding, brokerage)

• for networks, for bus topologies

• data represented as grids, trees, sets, or . . .

• distribution mechanisms:

Page 144: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 16 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

16. Classification of Strategies

• origin of the idea:

from physics (diffusion model), from combinatorics (graph the-ory), economics (bidding, brokerage)

• for networks, for bus topologies

• data represented as grids, trees, sets, or . . .

• distribution mechanisms:

– load handed over to neighbouring nodes only?

Page 145: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 16 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

16. Classification of Strategies

• origin of the idea:

from physics (diffusion model), from combinatorics (graph the-ory), economics (bidding, brokerage)

• for networks, for bus topologies

• data represented as grids, trees, sets, or . . .

• distribution mechanisms:

– load handed over to neighbouring nodes only?

– just distribution of new units or migration of running ones(how?)?

Page 146: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 16 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

16. Classification of Strategies

• origin of the idea:

from physics (diffusion model), from combinatorics (graph the-ory), economics (bidding, brokerage)

• for networks, for bus topologies

• data represented as grids, trees, sets, or . . .

• distribution mechanisms:

– load handed over to neighbouring nodes only?

– just distribution of new units or migration of running ones(how?)?

• flow of information:

to whom is load communicated, from where comes information?

Page 147: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 16 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

16. Classification of Strategies

• origin of the idea:

from physics (diffusion model), from combinatorics (graph the-ory), economics (bidding, brokerage)

• for networks, for bus topologies

• data represented as grids, trees, sets, or . . .

• distribution mechanisms:

– load handed over to neighbouring nodes only?

– just distribution of new units or migration of running ones(how?)?

• flow of information:

to whom is load communicated, from where comes information?

• coordination:

who makes decisions? autonomous/cooperative/competitive?

Page 148: 1. Implementation: Target Architectures · Implementation:... RISC Technology ... • different target architectures for numerical simulations: ... * more MIPS, more FLOPS. Implementation:

Implementation: . . .

RISC Technology

Pipelining

Superscalar Processors

Cache Memory

Memory Hierarchy

Parallel Computers – . . .

Flynn’s Classification . . .

Memory Access . . .

Parallelization

The Programming . . .

MPI Messages

Programming with MPI

Load Distribution

Designing Load . . .

Classification of . . .

Examples of LD- . . .

Performance Evaluation

Page 16 of 18

Introduction to Scientific Computing

9. ImplementationMiriam Mehl

16. Classification of Strategies

• origin of the idea:

from physics (diffusion model), from combinatorics (graph the-ory), economics (bidding, brokerage)

• for networks, for bus topologies

• data represented as grids, trees, sets, or . . .

• distribution mechanisms:

– load handed over to neighbouring nodes only?

– just distribution of new units or migration of running ones(how?)?

• flow of information:

to whom is load communicated, from where comes information?

• coordination:

who makes decisions? autonomous/cooperative/competitive?

• algorithms:

who initiates measures? adaptivity? costs relevant? evalua-tion?

Page 17 of 18

17. Examples of LD-Strategies

• diffusion model:
permanent balancing process between neighbours

• bidding model:
supply and demand, establishment of some market

• broker model:

– esp. for heterogeneous hierarchical topologies, scalable

– broker with partial knowledge, budget-based decision whether to process locally or to look for better offers

– prices for the use of resources and for brokerage

• matching model:
construct a matching in the topology graph, balance along its edges

• balanced allocation, space-filling curves, . . .
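The diffusion model from the first bullet can be sketched in a few lines. The topology (a ring of nodes) and the diffusion factor alpha are assumptions of this illustration; the principle — each node repeatedly exchanges a fraction of the load difference with its neighbours, so load "diffuses" until it is balanced — is the one named on the slide:

```python
# Sketch of the diffusion model for load balancing on a ring topology.
# Each step, node i moves a fraction alpha of the load difference to/from
# each of its two neighbours. Total load is conserved in every step.

def diffusion_step(load, alpha=0.25):
    n = len(load)
    new = load[:]
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):  # ring neighbours of node i
            new[i] += alpha * (load[j] - load[i])
    return new

def balance(load, steps=200):
    """Iterate the diffusion step; load converges towards the average."""
    for _ in range(steps):
        load = diffusion_step(load)
    return load
```

For alpha = 0.25 on a ring, the iteration is stable and the load on every node converges to the global average, while only neighbour-to-neighbour communication is needed — which is exactly why this strategy suits network topologies.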

Page 18 of 18

18. Performance Evaluation

• performance evaluation of algorithms and computers

• average parallelism (for p processors):
A(p) = (sum of processor runtimes) / (parallel runtime)

• speedup S:
S = (sequential runtime) / (parallel runtime)

• efficiency E:
E = S / p

• Amdahl's Law:
assumption: each program has some part 0 < seq < 1 that can only be treated in a sequential way

S ≤ 1 / (seq + (1 − seq)/p) ≤ 1/seq

• another important quantity: CCR (Communication-to-Computation Ratio)

– CCR often increases with increasing p and constant problem size (example: iterative methods for Ax = b)

– therefore: do not compare speedups for different p at the same problem size
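The three measures and the Amdahl bound above translate directly into code. The following short sketch (function and variable names are ours) computes speedup, efficiency, and the Amdahl upper bound for a given sequential fraction seq:

```python
# The performance measures from the slide, written out as functions.

def speedup(t_seq, t_par):
    """S = sequential runtime / parallel runtime."""
    return t_seq / t_par

def efficiency(t_seq, t_par, p):
    """E = S / p for p processors."""
    return speedup(t_seq, t_par) / p

def amdahl_bound(seq, p):
    """Upper bound on speedup if a fraction 0 < seq < 1 of the work
    is strictly sequential: S <= 1 / (seq + (1 - seq)/p) <= 1/seq."""
    return 1.0 / (seq + (1.0 - seq) / p)

# Example: with 10% sequential work, even 1000 processors give S < 10.
print(amdahl_bound(0.1, 1000))
```

Note how the bound saturates at 1/seq: increasing p beyond a point buys almost nothing, which is the core message of Amdahl's Law.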