2016/1/5part i1 models of parallel processing. 2016/1/5part i2 parallel processors come in many...

112/04/21 Part I 1

Models of Parallel Processing


• Parallel processors come in many different varieties.

• Thus, we often deal with abstract models of real machines.


Development of Early Models (1)

• Associative processing (AP) was perhaps the earliest form of parallel processing.

– Associative or content-addressable memories (AMs, CAMs) allow memory cells to be accessed based on their contents rather than their physical locations within the memory array.

– AM/AP architectures are essentially based on incorporating simple processing logic into the memory array, so as to remove the need for transferring large volumes of data through the limited-bandwidth interface between the memory and the processor (the von Neumann bottleneck).
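
As a rough illustration of content-based access, the sketch below models a small associative memory in software. The function name `cam_search`, the mask convention, and the sample words are assumptions made for this example; in a real AM every cell would perform its comparison at the same time, which the list comprehension only imitates.

```python
# Minimal software model of a content-addressable memory (CAM).
# All names and sample values are illustrative, not taken from the slides.

def cam_search(cells, comparand, mask):
    """Return the indices of all cells that agree with `comparand`
    in the bit positions selected by `mask`.

    In hardware, every cell would do this comparison simultaneously;
    here the comprehension visits the cells one after another.
    """
    return [i for i, word in enumerate(cells)
            if ((word ^ comparand) & mask) == 0]


cells = [0b10100001, 0b10101111, 0b01100001, 0b10100001]

# Match on the upper four bits only (mask 0xF0), looking for 1010.
upper_matches = cam_search(cells, 0b10100000, 0xF0)

# Match on the full 8-bit word.
exact_matches = cam_search(cells, 0b10100001, 0xFF)

print(upper_matches)  # [0, 1, 3]
print(exact_matches)  # [0, 3]
```

Note that the search returns positions as a by-product of the contents, which is the reverse of ordinary address-based memory access.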



• The AM/AP model has evolved through the incorporation of additional capabilities, so that it is in essence converging with SIMD-type array processors.



• Neural networks

• Cellular automata


SIMD Vs. MIMD (1)

• Most early parallel machines had SIMD designs.

• Within the SIMD category, two fundamental design choices exist:

– Synchronous versus asynchronous SIMD

• A possible cure for the inefficiency of conditional computations in synchronous SIMD is to use the asynchronous version of SIMD, known as SPMD (single program, multiple data).

– Custom- versus commodity-chip SIMD
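
The difference between the synchronous and asynchronous variants can be sketched with a conditional computation. The model below is a deliberate simplification with hypothetical function names: in the synchronous version both branches are broadcast in turn and each processing element is masked off during the branch it does not take, while in the SPMD version each processor follows only its own branch. The step counts are a property of this toy model, not measurements of any machine.

```python
# Toy comparison of a conditional under synchronous SIMD and under SPMD.
# Function names and the step-counting convention are assumptions.

def simd_conditional(data, cond, then_op, else_op):
    """Synchronous SIMD: broadcast the 'then' branch, then the 'else'
    branch; a mask decides which elements obey each broadcast."""
    mask = [cond(x) for x in data]
    # Step 1: 'then' instruction; elements with a false mask sit idle.
    result = [then_op(x) if m else x for x, m in zip(data, mask)]
    # Step 2: 'else' instruction; elements with a true mask sit idle.
    result = [x if m else else_op(x) for x, m in zip(result, mask)]
    return result, 2


def spmd_conditional(data, cond, then_op, else_op):
    """SPMD: every processor runs the same program on its own data
    and executes only the branch that applies to it."""
    result = [then_op(x) if cond(x) else else_op(x) for x in data]
    return result, 1


def is_even(x):
    return x % 2 == 0


def halve(x):
    return x // 2


def triple_plus_one(x):
    return 3 * x + 1


data = [1, 2, 3, 4]
print(simd_conditional(data, is_even, halve, triple_plus_one))  # ([4, 1, 10, 2], 2)
print(spmd_conditional(data, is_even, halve, triple_plus_one))  # ([4, 1, 10, 2], 1)
```

Both versions compute the same values; the sketch only shows that the synchronous version spends time on a branch during which part of the machine is idle.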


SIMD Vs. MIMD (2)

• In the 1990s, the MIMD paradigm became more popular.

• MIMD machines are most effective for medium- to coarse-grain parallel applications, where the computation is divided into relatively large subcomputations or tasks whose executions are assigned to the various processors.


SIMD Vs. MIMD (3)

• Within the MIMD class, three fundamental issues or design choices are subjects of ongoing debate in the research community.

– MPP: massively or moderately parallel processor

• Is it more cost-effective to build a parallel processor out of a relatively small number of powerful processors or a massive number of very simple processors?

– Tightly versus loosely coupled MIMD

• Network of workstations (NOW), cluster computing, grid computing

– Explicit message passing versus virtual shared memory


Global Vs. Distributed Memory (1)

• Within the MIMD class of parallel processors, memory can be global or distributed.

• Global memory may be visualized as being in a central location where all processors can access it with equal ease.

• Memory latency-hiding techniques must be employed. An example of such methods is the use of multithreading.



• Examples for both the processor-to-memory and processor-to-processor networks include:

• An abstract model of global-memory computers, known as PRAM.

• One approach to reducing the amount of data that must pass through the processor-to-memory interconnection network is to use a private cache memory. (locality of data access, cache coherence problem)



• Distributed-memory architectures can be conceptually viewed as in Fig. 4.5.

• In addition to the types of interconnection networks enumerated for shared-memory parallel processors, distributed-memory MIMD architectures can also be interconnected by a variety of direct networks. (as nonuniform memory access (NUMA) architectures)


PRAM Shared-Memory Model (1)

• The theoretical model used for conventional or sequential computers (SISD class) is known as the random-access machine (RAM)

• The parallel version of RAM (PRAM) constitutes an abstract model of the class of global-memory parallel processors. The abstraction consists of ignoring the details of the processor-to-memory interconnection network and taking the view that each processor can access any memory location in each machine cycle, independent of what other processors are doing.



• In the formal PRAM model, a single processor is assumed to be active initially. In each computation step, each active processor can read from and write into the shared memory and can also activate another processor.

• Even though the global-memory architecture was introduced as a subclass of the MIMD class, the abstract PRAM model depicted in Fig. 4.6 can be SIMD or MIMD.
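
The activation rule in the formal PRAM model (one processor active at the start, each active processor able to activate one more per step) means the number of active processors can at most double in each step. The sketch below counts the steps needed to bring p processors to life under that rule. The function name `activation_steps` is made up for this example, and the model assumes activation is the only thing processors do in these steps.

```python
# Counting activation steps in a formal-PRAM-style start-up phase.
# `activation_steps` is a hypothetical helper for illustration.

def activation_steps(p):
    """Steps until p processors are active, starting from one active
    processor, when every active processor activates one more per step."""
    active, steps = 1, 0
    while active < p:
        active = min(p, 2 * active)  # each active processor wakes one more
        steps += 1
    return steps


for p in (1, 2, 8, 1000):
    print(p, activation_steps(p))
# 1 0
# 2 1
# 8 3
# 1000 10
```

The counts equal the ceiling of log2 p, so start-up alone costs a logarithmic number of steps in this model.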



• This implies that each instruction cycle would have to consume Ω(log p) real time.

• The above point is important when we try to compare PRAM algorithms with those for distributed-memory models. An O(log p)-step PRAM algorithm may not be faster than an O(log² p)-step algorithm for a hypercube architecture.
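
The comparison can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that all hidden constants equal 1, that a PRAM cycle costs exactly log2 p time units (the lower bound stated above), and that a hypercube step costs one time unit. None of these constants come from the slides.

```python
import math

# Back-of-the-envelope comparison under assumed unit constants.

def pram_real_time(p, steps_const=1.0, cycle_const=1.0):
    """log p algorithm steps, each taking about log p real time."""
    log_p = math.log2(p)
    return (steps_const * log_p) * (cycle_const * log_p)


def hypercube_real_time(p, steps_const=1.0, step_time=1.0):
    """log^2 p algorithm steps, each taking constant real time."""
    log_p = math.log2(p)
    return steps_const * log_p ** 2 * step_time


p = 1024
print(pram_real_time(p))       # 100.0
print(hypercube_real_time(p))  # 100.0
```

Under these assumptions the two running times coincide, so the step count alone does not decide which algorithm is faster; the actual constants do.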


Distributed-Memory or Graph Models (1)

• Given the internal processor and memory structures in each node, a distributed-memory architecture is characterized primarily by the network used to interconnect the nodes.

• This network is usually represented as a graph.

• Important parameters of an interconnection network include:

– Network diameter: the longest of the shortest paths between various pairs of nodes

– Bisection (band)width: the smallest number (total capacity) of links that need to be cut in order to divide the network into two subnetworks of half the size

– Vertex or node degree: the number of communication ports required of each node
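
The three parameters above can be computed directly for small networks. The sketch below is one possible way to do so, assuming an undirected, connected graph with an even number of nodes and unit-capacity links. The helper names are invented for this example, and the bisection width is found by brute force over all equal splits, which is only practical for a handful of nodes.

```python
from collections import deque
from itertools import combinations

# Network parameters for small graphs, stored as {node: set_of_neighbors}.
# All helper names are illustrative assumptions.

def ring(n):
    """n-node ring (n >= 3)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def hypercube(dim):
    """Hypercube with 2**dim nodes; neighbors differ in one address bit."""
    return {i: {i ^ (1 << b) for b in range(dim)} for i in range(1 << dim)}


def diameter(graph):
    """Longest of the shortest paths, via breadth-first search from each node."""
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())
    return max(eccentricity(s) for s in graph)


def max_degree(graph):
    """Largest number of links (ports) at any node."""
    return max(len(nbrs) for nbrs in graph.values())


def bisection_width(graph):
    """Fewest links cut by any split into two equal halves (brute force)."""
    nodes = sorted(graph)
    best = None
    for part in combinations(nodes, len(nodes) // 2):
        side = set(part)
        cut = sum(1 for u in side for v in graph[u] if v not in side)
        best = cut if best is None else min(best, cut)
    return best


for name, g in (("8-node ring", ring(8)), ("3-cube", hypercube(3))):
    print(name, diameter(g), max_degree(g), bisection_width(g))
# 8-node ring 4 2 2
# 3-cube 3 3 4
```

For the same eight nodes, the hypercube trades a higher node degree for a smaller diameter and a larger bisection width than the ring.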


Distributed-Memory or Graph Models (2)

• Even though the distributed-memory architecture was introduced as a subclass of the MIMD class, machines based on networks of the type shown in Fig. 4.8 can be SIMD- or MIMD-type.

• Hierarchical bus architectures such as the one in Fig. 4.9 are available for reducing bus traffic by taking advantage of the locality of communication within small clusters of processors.
