Uploaded by rahul-nair on 15-Jul-2015

Introduction to Parallel Processor

Chinmay Terse

Rahul Agarwal

Vivek Ashokan

Rahul Nair

What is Parallelism?

• Parallel processing is a term used to denote simultaneous computation across multiple processing elements, for the purpose of increasing computation speed

• Parallel processing was introduced because executing instructions sequentially took a lot of time
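The idea can be sketched in Python with the standard multiprocessing module (the helper names here are illustrative, not from the slides): the data is split into chunks, the chunks are summed simultaneously by worker processes, and the partial results are combined.

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker processes its own slice of the data independently.
    return sum(chunk)

def parallel_sum(data, workers=4):
    # Split the input into roughly equal chunks, one per worker ...
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # ... and compute the partial sums simultaneously, then combine them.
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    # Gives the same answer as a sequential sum, but the work is divided.
    print(parallel_sum(list(range(1000))))
```

The speedup comes from the partial sums running at the same time; the final combine step is the only sequential part.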

Classification of Parallel Processor Architectures

SIMD vs. MIMD

SIMD:
• Also called an Array Processor
• A single stream of instructions is fetched by the control unit
• Each instruction is broadcast to multiple processing elements (PEs), which all execute it on different data

MIMD:
• Also called a Multiprocessor
• Multiple streams of instructions are fetched, typically from shared memory
• Each stream is decoded by its processor's own control unit, yielding multiple decoded instruction streams (IS)

Block Diagram of Tightly Coupled Multiprocessor

• Processors share memory

• Communicate via that shared memory
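A minimal sketch of this communication style, using Python's multiprocessing module (the lock stands in for the arbitration a real shared-memory interconnect provides; function names are illustrative):

```python
from multiprocessing import Process, Value

def increment(counter, n):
    # Each "processor" communicates purely by writing the shared word.
    for _ in range(n):
        with counter.get_lock():  # serialize concurrent updates
            counter.value += 1

def run_shared_counter(procs=2, n=1000):
    counter = Value("i", 0)  # one integer placed in shared memory
    workers = [Process(target=increment, args=(counter, n))
               for _ in range(procs)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Updates made by every process are visible through the shared word.
    return counter.value
```

No explicit messages are exchanged: the processes coordinate entirely through loads and stores to the shared value.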

Block Diagram of Loosely Coupled Multiprocessor

Cluster Computer Architecture

Cluster v. SMP

• Both provide multiprocessor support to high demand applications.

• Both available commercially

• SMP:
  – Easier to manage and control
  – Closer to single-processor systems
  – Scheduling is the main difference
  – Less physical space
  – Lower power consumption

• Clustering:
  – Superior incremental and absolute scalability
  – Lower cost
  – Superior availability (redundancy)

Loosely Coupled vs. Tightly Coupled Multiprocessor

Loosely Coupled Multiprocessor:
• Communication takes place through a Message Transfer System (MTS)
• The processors do not share memory; each processor has its own local memory
• Referred to as Distributed Systems
• Interaction between the processors and I/O modules is very low
• Each processor is directly connected to its I/O devices
• There is no unmapped local memory

Tightly Coupled Multiprocessor:
• Communication takes place through the PMIN (Processor-Memory Interconnection Network)
• The processors share memory; a processor does not have its own private local memory
• Referred to as Shared Memory Systems
• Interaction between the processors and I/O modules is very high
• The processors are connected to I/O devices through the IOPIN (I/O-Processor Interconnection Network)
• Every processor has unmapped local memory (typically used as a cache)

Communication Models

• Shared Memory
  – Processors communicate through a shared address space
  – Easy on small-scale machines
  – Advantages:
    • Model of choice for uniprocessors and small-scale MPs
    • Ease of programming
    • Lower latency
    • Easier to use hardware-controlled caching
• Message Passing
  – Processors have private memories and communicate via messages
  – Advantages:
    • Less hardware, easier to design
    • Focuses attention on costly non-local operations
• Either SW model can be supported on either HW base

Shared Address/Memory Model

• Communicate via load and store
  – Oldest and most popular model
• Based on timesharing: processes on multiple processors vs. sharing a single processor
• Single virtual and physical address space
  – Multiple processes can overlap (share), but ALL threads share a process address space
• Writes to the shared address space by one thread are visible to reads by other threads
  – Usual model: shared code, private stacks, some shared heap, some private heap
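The visibility rule above, that one thread's store is seen by another thread's ordinary load, can be sketched with Python threads (the names and the use of an Event for ordering are illustrative):

```python
import threading

shared_heap = {}            # shared heap: every thread in the process sees it
done = threading.Event()    # used only to order the write before the read

def writer():
    # One thread stores into the shared address space ...
    shared_heap["result"] = 42
    done.set()

def reader(out):
    # ... and another thread's plain read observes that write,
    # with no explicit message or copy.
    done.wait()
    out.append(shared_heap["result"])

def demo_shared_address():
    out = []
    t1 = threading.Thread(target=writer)
    t2 = threading.Thread(target=reader, args=(out,))
    t2.start()
    t1.start()
    t1.join()
    t2.join()
    return out[0]
```

Each thread has its own private stack (its local variables), while `shared_heap` plays the role of the shared heap from the slide.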

Advantages of the shared-memory communication model

• Compatibility with SMP hardware

• Ease of programming when communication patterns are complex or vary dynamically during execution

• Ability to develop apps using familiar SMP model, attention only on performance critical accesses

• Lower communication overhead and better use of bandwidth for small items, due to implicit communication and memory mapping to implement protection in hardware rather than through the I/O system

• Hardware-controlled caching reduces remote communication by caching all data, both shared and private

Shared Address Model Summary

• Each process can name all data it shares with other processes

• Data transfer via load and store

• Data size: byte, word, ... or cache blocks

• Uses virtual memory to map virtual addresses to local or remote physical addresses

• Memory hierarchy model applies: now communication moves data to local proc. cache (as load moves data from memory to cache)

– Open questions: latency, bandwidth (per cache block?), and scalability when communicating

Message Passing Model

• Whole computers (CPU, memory, I/O devices) communicate via explicit I/O operations

• Send specifies a local buffer and the receiving process on the remote computer

• Receive specifies the sending process on the remote computer and a local buffer in which to place the data

  – Usually the send includes a process tag, and the receive has a matching rule on the tag: match one, or match any

  – Synchronization: when the send completes, when the buffer is free, when the request is accepted, when the receive waits for a send

• Send + receive => memory-to-memory copy, where each side supplies a local address, AND pair-wise synchronization is performed!
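A sketch of send/receive with a tag-matching rule, using Python's multiprocessing queues in place of a real network (the tag value and function names are illustrative, not from the slides):

```python
from multiprocessing import Process, Queue

def sender(channel):
    # Send supplies the local data plus a process tag.
    channel.put(("tag-A", [1, 2, 3]))

def receiver(channel, inbox, wanted_tag):
    # Receive waits for the send (pair-wise synchronization),
    # then applies its tag rule: here, "match one" specific tag.
    tag, payload = channel.get()
    if tag == wanted_tag:
        inbox.put(payload)  # copy into the receiver's local buffer

def demo_send_receive():
    channel, inbox = Queue(), Queue()
    s = Process(target=sender, args=(channel,))
    r = Process(target=receiver, args=(channel, inbox, "tag-A"))
    s.start(); r.start()
    s.join(); r.join()
    return inbox.get()
```

The two processes share no memory: the data reaches the receiver only because an explicit send was paired with an explicit receive.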

Message Passing Model

• Send + receive => memory-to-memory copy, with synchronization through the OS, even on one processor

• History of message passing:

  – Network topology was important because a node could send only to its immediate neighbours

  – Typically synchronous, blocking send and receive

  – Later, DMA with non-blocking sends; DMA on the receive side buffers data until the processor executes a receive, at which point the data is transferred to local memory

  – Later, SW libraries to allow arbitrary communication

• Example: IBM SP-2, RS6000 workstations in racks

Advantages of the message-passing communication model

• The hardware can be simpler

• Communication is explicit => simpler to understand; with shared memory it can be hard to know when you are communicating, when you are not, and how costly it is

• Explicit communication focuses attention on costly aspect of parallel computation, sometimes leading to improved structure in multiprocessor program

• Synchronization is naturally associated with sending messages, reducing the possibility for errors introduced by incorrect synchronization

• Easier to use sender-initiated communication, which may have some advantages in performance

Applications of Parallel Processing

Summary

■ Serial computers / microprocessors will probably not get much faster; parallelization is unavoidable

❑ Pipelining, caching, and other optimization strategies for serial computers are reaching a plateau

❑ The heat wall has also been reached

■ Application examples

■ Data and functional parallelism

■ Flynn’s taxonomy: SIMD, MISD, MIMD/SPMD

■ Parallel Architectures Intro

❑ Distributed Memory message-passing

❑ Shared Memory

■ Uniform Memory Access

■ Non Uniform Memory Access (distributed shared memory)

■ Parallel program/algorithm examples