3/12/2013 Computer Engg, IIT(BHU) 1
CONCEPTS-1
Pipelining
•Pipelining is used to increase the speed of processing
•It uses temporal parallelism
•In pipelining, a computation is divided into a number of steps called stages
•Each stage works at full speed on a particular part of a computation
Pipelining (Contd.)
•Often the output of one stage becomes the input to the next stage
•When all stages work at the same speed and the pipe is full, the work rate of the pipeline equals the sum of the work rates of the stages
An Example
•There are 3 stages: A, B and C
•In time slot T3, all the stages can be performed simultaneously because they operate on different parts of the computation
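The time-slot picture can be reproduced with a small sketch. It assumes an ideal pipeline in which every stage takes exactly one time slot and a new item enters stage A in each slot; the function name `pipeline_schedule` and the item numbering are illustrative only.

```python
def pipeline_schedule(stages, num_items):
    """Return {time_slot: [(stage, item), ...]} for an ideal pipeline.

    Assumption: every stage takes one time slot, and item i enters the
    first stage in slot i (slots and items are numbered from 1).
    """
    schedule = {}
    for item in range(1, num_items + 1):
        for offset, stage in enumerate(stages):
            slot = item + offset  # item reaches this stage `offset` slots later
            schedule.setdefault(slot, []).append((stage, item))
    return schedule


table = pipeline_schedule(["A", "B", "C"], num_items=4)
for slot in sorted(table):
    busy = ", ".join(f"{stage} on item {item}" for stage, item in table[slot])
    print(f"T{slot}: {busy}")
# The T3 line reads: T3: C on item 1, B on item 2, A on item 3
```

From T3 onward the pipe is full, and 4 items finish in 3 + 4 - 1 = 6 slots instead of the 12 slots needed without overlap.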
Instruction Pipeline
•F: Fetch Instruction
•D: Decode Instruction
•Ex: Execute Instruction
•W: Write to the memory
Non-Ideal Situations
•It is not possible to break up instruction execution into stages that all take the same time
•Successive instructions are not always independent
•There may be resource constraints due to the limited size of the chip
•The above situations pose challenges for pipelining to work
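The effect of the first non-ideal situation (stages that do not take the same time) can be sketched numerically. The model below assumes the clock period is set by the slowest stage and that there are no stalls; the stage times are made-up numbers, not measurements.

```python
def sequential_time(stage_times, num_instructions):
    """Time without pipelining: each instruction passes through all stages alone."""
    return sum(stage_times) * num_instructions


def pipeline_time(stage_times, num_instructions):
    """Time with pipelining, assuming the slowest stage sets the clock period."""
    cycle = max(stage_times)
    num_stages = len(stage_times)
    # The first instruction needs num_stages cycles; each later one adds a cycle.
    return (num_stages + num_instructions - 1) * cycle


def speedup(stage_times, num_instructions):
    return (sequential_time(stage_times, num_instructions)
            / pipeline_time(stage_times, num_instructions))


balanced = [2, 2, 2, 2]    # hypothetical times for F, D, Ex, W
unbalanced = [1, 2, 4, 1]  # same total work, unevenly split

print(speedup(balanced, 100))    # about 3.88
print(speedup(unbalanced, 100))  # about 1.94
```

With the same total work per instruction, the unbalanced split roughly halves the speedup, because the fast stages sit idle while the slowest stage finishes.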
Making a Parallel Program
Important Concepts:
•Task: A piece of work in a parallel program that cannot be decomposed further
•Process: An abstract entity that performs the tasks assigned to it
-We first write a program in terms of processes and then map the processes to the processors
Making a Parallel Program (Contd.)
•Groups of tasks are mapped to different processors
An Example:
•Merge Sort using two Processors:
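A minimal sketch of the two-processor merge sort follows. It assumes the list is split into two halves, each half is sorted as a separate task, and the sorted halves are merged at the end. Two worker threads stand in for the two processors here; in standard CPython they show the decomposition and mapping rather than a real speedup. The function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge  # merges already-sorted iterables


def merge_sort(values):
    """Ordinary sequential merge sort."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    return list(merge(merge_sort(values[:mid]), merge_sort(values[mid:])))


def two_worker_merge_sort(values):
    """Sort each half as its own task on one of two workers, then merge."""
    mid = len(values) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(merge_sort, values[:mid])    # task for worker 1
        right = pool.submit(merge_sort, values[mid:])   # task for worker 2
        return list(merge(left.result(), right.result()))


print(two_worker_merge_sort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```

The final merge depends on both halves, so it cannot start until both tasks have finished.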
Mapping
•Decides which process will run on which processor
•Takes the processor architecture into account
•Also considers load balancing and fault tolerance issues
Dependency Analysis
•To decompose an application into tasks, we look for dependencies among the statements in the sequential program of the application
•There are mainly two types of dependency:
-Data dependency
-Control dependency
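A data dependency can be found mechanically once each statement's read and write sets are known. The sketch below is a simplified check under stated assumptions: it only reports the case where a later statement reads a variable that an earlier statement writes, and it ignores intervening overwrites. The statement encoding and the name `data_dependencies` are hypothetical.

```python
def data_dependencies(statements):
    """Return (earlier, later) pairs where `later` reads what `earlier` writes.

    statements: list of (label, writes, reads) tuples in program order.
    """
    deps = []
    for i, (label_i, writes_i, _) in enumerate(statements):
        for label_j, _, reads_j in statements[i + 1:]:
            if set(writes_i) & set(reads_j):
                deps.append((label_i, label_j))
    return deps


program = [
    ("S1", {"a"}, {"b", "c"}),  # a = b + c
    ("S2", {"d"}, {"a"}),       # d = a * 2   reads a, so it depends on S1
    ("S3", {"e"}, {"b"}),       # e = b - 1   independent of S1 and S2
]

print(data_dependencies(program))  # [('S1', 'S2')]
```

Here S3 could run as a separate task alongside S1 and S2. A control dependency is different in kind: in `if x > 0: y = 1`, whether the assignment runs at all depends on the outcome of the test.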
Super Pipelining
•In pipelining we assume that each stage takes one clock cycle, but this holds only in the ideal case
•In practice some pipeline stages take less than one clock cycle
•So we can divide each clock cycle into two phases and allocate an appropriate interval to each step
•Pipelining becomes faster if the two phases need different resources (to avoid resource conflicts)
-This gives the notion of superpipelining
Superscalar Processing
•In superscalar processing, data parallelism and temporal parallelism are combined to increase the speed of the processor
•This is achieved by issuing more than one instruction in each clock cycle
•Hardware should be able to fetch several instructions at a time
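The gain from issuing several instructions per cycle can be estimated with an idealized cycle count. This sketch assumes all instructions are independent, every stage takes one cycle, and there are no stalls; real superscalar processors fall short of this bound. The function name and parameters are illustrative.

```python
import math


def ideal_cycles(num_instructions, num_stages, issue_width=1):
    """Cycles to complete the instructions on an idealized pipeline.

    Assumption: `issue_width` independent instructions enter the pipeline
    together in each cycle and move through it in lockstep.
    """
    groups = math.ceil(num_instructions / issue_width)
    return num_stages + groups - 1


print(ideal_cycles(8, num_stages=4, issue_width=1))  # 11
print(ideal_cycles(8, num_stages=4, issue_width=2))  # 7
```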
Vector Processors
•Specially designed to perform vector operations
•A vector operation involves a large array of operands, i.e. the same operation is performed over different data items
•Excellent compilers for vector code written in a programming language are available
Register Based Vector Operation (An Example)
•Let * be a vector operator, and let V1 and V2 be the vector operands stored in registers
V3 ← V1 * V2
•V3 may be a vector or scalar
•The vector length should be equal in all operands
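The operation V3 ← V1 * V2 can be sketched in software as follows. Python lists stand in for vector registers, and `vector_op` and `dot` are illustrative names; a real vector processor does this in hardware. The first function gives a vector result, the second a scalar result.

```python
import operator


def vector_op(op, v1, v2):
    """Apply the same binary operation to each pair of elements."""
    if len(v1) != len(v2):
        raise ValueError("vector operands must have the same length")
    return [op(a, b) for a, b in zip(v1, v2)]


def dot(v1, v2):
    """Example of a vector operation whose result is a scalar."""
    return sum(vector_op(operator.mul, v1, v2))


V1 = [1, 2, 3]
V2 = [4, 5, 6]
print(vector_op(operator.mul, V1, V2))  # [4, 10, 18]  (V3 is a vector)
print(dot(V1, V2))                      # 32           (V3 is a scalar)
```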
Array Processors
•A vector processor works by streaming the vectors through a pipelined unit
•Another architecture to perform vector operation is to use an array of n Processing Elements (PE)
•Each PE stores a pair of operands
•The operation is broadcast to all PEs simultaneously
•Such an organization of PEs is called an array processor
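The array-processor organization can be mimicked with one object per PE. This is only a software sketch: the loop in `broadcast` runs the PEs one after another, whereas the hardware PEs all execute the broadcast operation at the same time. The class and function names are hypothetical.

```python
import operator


class ProcessingElement:
    """One PE holding its own pair of operands."""

    def __init__(self, a, b):
        self.a = a
        self.b = b
        self.result = None

    def execute(self, op):
        self.result = op(self.a, self.b)


def broadcast(pes, op):
    """Send the same operation to every PE and collect the results."""
    for pe in pes:  # sequential stand-in for simultaneous execution
        pe.execute(op)
    return [pe.result for pe in pes]


pes = [ProcessingElement(1, 4), ProcessingElement(2, 5), ProcessingElement(3, 6)]
print(broadcast(pes, operator.add))  # [5, 7, 9]
```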
VLIW Architecture
•In superscalar processing, it is quite difficult to duplicate the instruction register, the decoder and the arithmetic unit
•VLIW (Very Long Instruction Word) processor has instruction words hundreds of bits in length
•Multiple functional units are used simultaneously in a VLIW processor
•The units are Integer unit, FP unit, Branch unit, Load/Store unit etc.
•The objective is to keep all these units busy
A Typical VLIW Instruction Format
Major Challenges in Designing VLIW Processors
•Lack of sufficient instruction level parallelism
•Hardware complexity (needs high memory and register bandwidth)
•Inefficient use of bits in a very long instruction word
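The first and last challenges can be seen in a toy model of a VLIW word with one slot per functional unit. The slot layout, the mnemonics and the names `pack_word` and `slot_utilization` are assumptions made for illustration; they do not describe any particular VLIW instruction format.

```python
UNITS = ("integer", "fp", "branch", "load_store")  # one slot per unit


def pack_word(operations):
    """Build one long instruction word; units with no work get a 'nop'.

    operations: dict mapping a unit name to the operation issued on it.
    """
    unknown = set(operations) - set(UNITS)
    if unknown:
        raise ValueError(f"unknown functional units: {sorted(unknown)}")
    return tuple(operations.get(unit, "nop") for unit in UNITS)


def slot_utilization(words):
    """Fraction of slots that carry a real operation."""
    used = sum(op != "nop" for word in words for op in word)
    return used / (len(words) * len(UNITS))


words = [
    pack_word({"integer": "add r1,r2,r3", "load_store": "load r4,[r5]"}),
    pack_word({"fp": "fmul f1,f2,f3"}),
]
print(words[0])                 # ('add r1,r2,r3', 'nop', 'nop', 'load r4,[r5]')
print(slot_utilization(words))  # 0.375
```

When too few independent operations are available, most slots hold a nop, so the bits of the long word are spent on operations that do nothing.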