
Processing Elements Architecture

Parallel Computing (Intro-04): Rajeev Wankar

Flynn's taxonomy, a simple classification based on the dual concepts of instruction stream and data stream:

SISD - conventional

SIMD - data parallel, vector computing

MISD - systolic arrays

MIMD - very general, multiple approaches.

Most current development follows the MIMD model, using general-purpose processors.



SISD: A Conventional Computer

Speed is limited by the rate at which the computer can transfer information internally.

[Diagram: a single processor with one instruction stream, transforming a data input stream into a data output stream.]

Ex: PC, Macintosh, workstations

Source: Raj Kumar Buyya ppts.

The MISD Architecture

Largely an intellectual exercise: few machines were built, and none were commercially available.

[Diagram: processors A, B, and C, each with its own instruction stream (A, B, C), operating on a single data input stream to produce a single data output stream.]

Source: Raj Kumar Buyya ppts.


SIMD Architecture

Ex: Cray vector processing machines, Thinking Machines CM*

[Diagram: a single instruction stream broadcast to processors A, B, and C, each with its own data input stream (A, B, C) and data output stream (A, B, C).]

Source: Raj Kumar Buyya ppts.
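A data-parallel loop is the software analogue of SIMD execution: the same instruction is applied to every element of the data. A minimal sketch in C (the function and array names are illustrative, not from the slides):

#include <stddef.h>

/* One instruction stream, many data streams: conceptually, each
   iteration could execute on a separate SIMD processing element. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}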

MIMD Architecture

Unlike SISD and MISD, a MIMD computer works asynchronously.

Shared memory (tightly coupled) MIMD

Distributed memory (loosely coupled) MIMD

[Diagram: processors A, B, and C, each with its own instruction stream (A, B, C), data input stream (A, B, C), and data output stream (A, B, C).]

Source: Raj Kumar Buyya ppts.


Shared Memory MIMD machine

[Diagram: processors A, B, and C, each connected through its own memory bus to a single Global Memory System.]

Source: Raj Kumar Buyya ppts.

Communication: the source PE writes data to global memory and the destination PE retrieves it.

Easy to build; the conventional operating systems of SISD machines can easily be ported.

Limitations: reliability and expandability. The failure of a memory component or any processor affects the whole system, and increasing the number of processors leads to memory contention.

Ex.: Silicon Graphics supercomputers...
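A minimal sketch of this communication style using POSIX threads (names are illustrative): one thread writes into a variable in global memory, and after joining it, the main thread reads the result.

#include <pthread.h>
#include <stdio.h>

int shared_result;                  /* global memory visible to all threads */

void *producer(void *arg)
{
    shared_result = 42;             /* source PE writes data to global memory */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    pthread_join(t, NULL);          /* synchronize before reading */
    printf("%d\n", shared_result);  /* destination PE retrieves it */
    return 0;
}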


Distributed Memory MIMD

[Diagram: processors A, B, and C, each attached through its own memory bus to a private memory system (A, B, C) and connected to the others by IPC channels.]

Communication: IPC over a high-speed network.

The network can be configured as a tree, mesh, cube, etc.

Unlike shared-memory MIMD:

easily/readily expandable

highly reliable (a CPU failure does not affect the whole system)
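A minimal sketch of message passing between two such nodes, using MPI (which appears later in this lecture); the value and message tag are illustrative:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {                /* node A sends over the network */
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {         /* node B receives into local memory */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}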


Architectural Model of MIMD

[Diagram: taxonomy of MIMD memory architectures, including cache-coherent and cache-only models.]

Source: Barry Wilkinson's ppts.

Shared memory multiprocessor system

Any memory location is accessible by any of the processors.

A single address space exists, meaning that each memory location is given a unique address within a single range of addresses.

Generally, shared memory programming is more convenient, although it does require access to shared data to be controlled by the programmer (using critical sections, etc.).
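A minimal sketch of such programmer-controlled access, using a Pthreads mutex as the critical section (the counter and thread body are illustrative):

#include <pthread.h>

long counter = 0;                                  /* shared data */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);     /* enter critical section */
        counter++;                     /* one thread updates at a time */
        pthread_mutex_unlock(&lock);   /* leave critical section */
    }
    return NULL;
}

Without the mutex, concurrent increments could be lost; the critical section serializes access to the shared counter.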


Several alternatives for programming shared memory multiprocessors:

• Using heavyweight processes.

• Using threads. Example: Pthreads.

• Using a completely new programming language designed for parallel programming. Examples: Ada, or the newly designed Cilk++.

• Using library routines with an existing sequential programming language.

• Modifying the syntax of an existing sequential programming language to create a parallel programming language. Example: UPC.

• Using an existing sequential programming language supplemented with compiler directives for specifying parallelism. Example: OpenMP (see the sketch after this list).

• Using Threading Building Blocks (TBB) from Intel.
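A minimal sketch of the compiler-directive approach with OpenMP: the pragma is the only change to otherwise sequential C (the function and array names are illustrative). Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp.

/* The directive asks the compiler to divide the loop iterations
   among the available threads; the loop body is unchanged. */
void scale(int n, double a, double *x)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        x[i] *= a;
}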

Differences between a process and threads

Threads share the same memory space and global variables between routines.

A process is a completely separate program with its own variables, stack, and memory allocation.

A "lightweight" process is a kernel or OS-level thread.

[Diagram: a single-threaded process has one copy of the code, heap, interrupt routines, files, stack, and instruction pointer (IP); in a multithreaded process the code, heap, interrupt routines, and files are shared, while each thread has its own stack and IP.]
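A minimal sketch of this difference on a POSIX system (values are illustrative): after fork() the child process gets its own copy of the variable, whereas a thread created in the same process would see the one shared copy.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int x = 1;                       /* each process gets its own copy */

int main(void)
{
    if (fork() == 0) {           /* child: separate address space */
        x = 99;                  /* modifies only the child's copy */
        return 0;
    }
    wait(NULL);
    printf("x = %d\n", x);       /* parent still prints x = 1 */
    return 0;
}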


Shared Memory Systems and Programming

Topics:

• Regular shared memory systems and programming

• Distributed shared memory on a cluster


Pthreads

The IEEE Portable Operating System Interface (POSIX) 1003.1 standard.

In Pthreads, the main program is itself a thread; a separate thread is created and terminated with library routines:

pthread_t thread1;     /* handle: a special Pthreads data type */
void *status;
pthread_create(&thread1, NULL, proc1, (void *)&argv);
pthread_join(thread1, &status);

The thread ID (handle) is assigned by pthread_create and returned through &thread1.
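A complete, runnable version of this pattern (a minimal sketch; the thread body and argument are illustrative). Compile with cc prog.c -lpthread.

#include <pthread.h>
#include <stdio.h>

/* A start routine must take and return void *. */
void *proc1(void *arg)
{
    printf("hello from thread, arg = %s\n", (char *)arg);
    return NULL;
}

int main(void)
{
    pthread_t thread1;
    void *status;
    static char msg[] = "some argument";

    pthread_create(&thread1, NULL, proc1, msg);
    pthread_join(thread1, &status);   /* wait for the thread to finish */
    return 0;
}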


Symmetric Multiprocessors (SMPs)

Use commodity microprocessors with on-chip and off-chip caches.

Processors are connected to a shared memory through a high-speed snoopy bus.

[Diagram legend: P/C = microprocessor and cache; SM = shared memory.]

On some SMPs, a crossbar switch is used in addition to the bus.

Scalable up to 48-64 processors.

Equal priority for all processors (except for the master or boot CPU).

Memory coherency is maintained by hardware.

Multiple I/O buses for greater I/O throughput.

All processors see the same image of all system resources.


Bus-based architecture: inadequate beyond 8-16 processors.

Crossbar-based architecture: takes a multistage approach, given the amount of I/O hardware required.

The limitation is mainly caused by the centralized shared memory and the bus or crossbar interconnect, both of which are difficult to scale once built.

Message-Passing Multicomputer

MIMD: complete computers connected through an interconnection network.

[Diagram: nodes, each a processor with its own local memory, exchanging messages over the interconnection network.]


Commercial Parallel Computing Hardware Models

Single Instruction Multiple Data (SIMD)

Parallel Vector Processor (PVP)

Symmetric Multiprocessor (SMP)

Distributed Shared Memory multiprocessors (DSM)

Massively Parallel Processor (MPP)

Cluster of Workstations (COW)

Cluster of Computers – Features

A cluster is a type of parallel or distributed processing system that consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource. A "stand-alone" (whole) computer is one that can be used on its own (full hardware and OS).

A collection of nodes physically connected over a commodity or proprietary network.

A cluster computer is a collection of complete, independent workstations or Symmetric Multiprocessors.

The network is a decisive factor for scalability (especially for fine-grain applications).

High volumes driving high performance.

Networks using commodity components and proprietary architectures are becoming the trend.


Cluster system architecture

[Diagram: sequential and parallel applications running side by side across the cluster nodes.]

[Photo: Amber and Venus in the Grid Computing Lab. Source: Internet]


Common Cluster Modes

• High Performance (dedicated)

• High Throughput (idle cycle collection)

• High Availability


High Performance Cluster (dedicated mode)

Source: Raj Buyya Slides


High Throughput Cluster (Idle Resource Collection)

Shared pool of computing resources: processors, memory, disks, interconnect.

Guarantee at least one workstation to many individuals (when active).

Deliver a large percentage of collective resources to a few individuals at any one time.

Source: Raj Buyya Slides


High Availability Clusters

Source: Raj Buyya Slides


HA and HP in the Same Cluster

Best of both worlds (the world is heading towards this configuration).

Source: Raj Buyya Slides


Cluster Components


Prominent Components of Cluster Computers

• Multiple High Performance Computers

– PCs

– Workstations

– SMPs (CLUMPS)

– Distributed HPC Systems leading to Grid Computing

Source: Raj Buyya Slides

• State-of-the-art Operating Systems
– Linux (MOSIX, Beowulf, and many more)
– Microsoft NT (Illinois HPVM, Cornell Velocity)
– SUN Solaris (Berkeley NOW, C-DAC PARAM)
– IBM AIX (IBM SP2)
– HP UX (Illinois - PANDA)
– Mach, a microkernel-based OS (CMU)
– Cluster operating systems (Solaris MC, SCO UnixWare, MOSIX (academic project))
– OS gluing layers (Berkeley GLUnix)


Source: Raj Buyya Slides


• High Performance Networks/Switches
– Ethernet (10 Mbps)
– Fast Ethernet (100 Mbps)
– Gigabit Ethernet (1 Gbps)
– SCI (Scalable Coherent Interface; ~12 µs MPI latency)
– ATM (Asynchronous Transfer Mode)
– Myrinet (1.2 Gbps)
– QsNet (Quadrics; ~5 µs latency for MPI messages)
– Digital Memory Channel
– FDDI (Fiber Distributed Data Interface)
– InfiniBand


Source: Raj Buyya Slides


• Fast Communication Protocols and Services (User Level Communication):

– Active Messages (Berkeley)

– Fast Messages (Illinois)

– U-net (Cornell)

– XTP (Virginia)

– Virtual Interface Architecture (VIA)


Source: Raj Buyya Slides


• Parallel Programming Environments and Tools
– Threads (PCs, SMPs, NOW, ...): POSIX Threads, Java Threads
– MPI (Message Passing Interface): Linux, NT, and many supercomputers
– PVM (Parallel Virtual Machine)
– Software DSMs (Shmem)
– Compilers: C/C++/Java; parallel programming with C++ (MIT Press book)
– RAD (rapid application development) tools: GUI-based tools for parallel-program modeling
– Debuggers
– Performance analysis tools (PAPI, the Performance Application Programming Interface)
– Visualization tools


Source: Raj Buyya Slides


Classification of Clusters


User's View of a Cluster

Users view the entire cluster as a single system with multiple processors. A user might say: "Execute my application using five processors." This is different from a distributed system.
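On a message-passing cluster this request typically maps onto the job launcher; an illustrative MPI invocation (the program name is assumed) would be:

mpirun -np 5 ./myapp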

Single Entry

Single File Hierarchy

Single Networking

Single Input/Output

Single Point of Control

Single Memory Space

Single Job Management System

Single User Interface

Single Process Space

Single System

Symmetry

Location Transparent


Job Management

Global job management

Global system management and configuration

Group based scheduling and resource allocation

Idle resource detection

Co-scheduling of parallel programs

Load Sharing Facility (LSF)

High Availability

Fault tolerance

Check-pointing

Major Issues in Cluster Design

Enhanced Performance (performance at low cost)

Enhanced System Image (look-and-feel of one system)

Size Scalability (physical & application)

Fast Communication (networks & protocols)

Load Balancing (CPU, Net, Memory, Disk)

Security and Encryption (clusters of clusters)

Distributed Environment (social issues)

Manageability (administration and control)

Programmability (simple API if required)

Applicability (cluster-aware and non-aware applications)


Some Cluster Systems: Comparison

Project      | Platform                           | Communications                | OS                                                       | Other
Beowulf      | PCs                                | Multiple Ethernet with TCP/IP | Linux and Grendel                                        | MPI/PVM, Sockets and HPF
Berkeley NOW | Solaris-based PCs and workstations | Myrinet and Active Messages   | Solaris + GLUnix + xFS                                   | AM, PVM, MPI, HPF, Split-C
HPVM         | PCs                                | Myrinet with Fast Messages    | NT or Linux connection and global resource manager + LSF | Java-fronted, FM, Sockets, Global Arrays, SHMEM and MPI
Solaris MC   | Solaris-based PCs and workstations | Solaris-supported             | Solaris + Globalization layer                            | C++ and CORBA

Major References

Kai Hwang and Zhiwei Xu, Scalable Parallel Computing: Technology, Architecture, Programming, McGraw-Hill, New York, 1997.

David E. Culler and Jaswinder Pal Singh, with Anoop Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann Publishers, 1999 (reprinted 2004).

Barry Wilkinson and Michael Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Prentice Hall, Upper Saddle River, NJ, 1999.

Rajkumar Buyya, High Performance Cluster Computing: Programming and Applications, Prentice Hall, 1999.