Parallel Programming Models, Languages and Compilers
DESCRIPTION
Parallel Programming Models, Languages and Compilers. Module 6. Points to be covered: Parallel Programming Models (Shared-Variable Model, Message-Passing Model, Data-Parallel Model, Object-Oriented Model, Functional and Logic Models); Parallel Languages and Role of Compilers.

TRANSCRIPT
Parallel Programming Models, Languages and Compilers
Module 6
Points to be covered
• Parallel Programming Models: Shared-Variable Model, Message-Passing Model, Data-Parallel Model, Object-Oriented Model, Functional and Logic Models
• Parallel Languages and Role of Compilers: Language Features for Parallelism, Parallel Language Constructs, Optimizing Compilers for Parallelism
• Code Optimization and Scheduling: Scalar Optimization within Basic Blocks, Local and Global Optimizations, Vectorization and Parallelization Methods, Code Generation and Scheduling, Trace Scheduling Compilation
Parallel Programming Model
• A programming model is a simplified and transparent view of the computer hardware/software system.
• Parallel programming models are designed specifically for multiprocessors, multicomputers, or vector/SIMD computers.
Classification
• There are five parallel programming models:
1) Shared-Variable Model
2) Message-Passing Model
3) Data-Parallel Model
4) Object-Oriented Model
5) Functional and Logic Models
Shared-Variable Model
• In all programming systems, processors are active resources, while memory and I/O devices are passive resources.
• A program is a collection of processes.
• Parallelism depends on how interprocess communication (IPC) is implemented.
• The process address space is shared.
• To ensure orderly IPC, a mutual exclusion property requires that a shared object be accessed by only one process at a time.
Shared-Variable Model (Subpoints)
• Shared-variable communication
• Critical section
• Protected access
• Partitioning and replication
• Scheduling and synchronization
• Cache coherence problem
Shared-Variable Communication
• Used in multiprocessor programming.
• Shared-variable IPC demands the use of shared memory and mutual exclusion among multiple processes accessing the same set of variables.
[Figure: Processes A, B, and C communicating through a shared variable in common memory]
Critical Section
• A critical section (CS) is a code segment accessing shared variables, which must be executed by only one process at a time and which, once started, must be completed without interruption.
Critical Section Requirements
• It should satisfy the following requirements:
Mutual exclusion: at most one process executes the CS at a time.
No deadlock in waiting: no circular wait by two or more processes.
No preemption: no interruption until completion.
Eventual entry: a process requesting entry must eventually be allowed into the CS.
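A minimal C sketch of a critical section follows (illustrative only, not from the slides; the shared counter and thread count are invented for the example). The mutex enforces mutual exclusion: at most one thread executes the increment at a time.

```c
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                 /* shared variable */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);       /* enter critical section */
        counter++;                       /* only one thread at a time */
        pthread_mutex_unlock(&lock);     /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);  /* always 400000 with the lock */
    return 0;
}
```

Without the lock, the increments from different threads can interleave and lose updates, which is exactly the disorderly IPC the mutual exclusion property rules out.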
Protected Access
• The granularity of the CS affects performance.
• If the CS is too large, it may limit parallelism due to excessive waiting by processes.
• If the CS is too small, it may add unnecessary code complexity/software overhead.
Four Operational Modes
• Multiprogramming
• Multiprocessing
• Multitasking
• Multithreading
Multiprogramming
• Multiple independent programs run on a single processor or multiprocessor by time-shared use of system resources.
• When a program enters I/O mode, the processor switches to another program.
Multiprocessing
• When multiprogramming is implemented at the process level on a multiprocessor, it is called multiprocessing.
• Two types of multiprocessing:
If interprocessor communications are handled at the instruction level, the multiprocessor operates in MIMD mode.
If interprocessor communications are handled at the program, subroutine, or procedure level, the multiprocessor operates in MPMD mode.
Multitasking
• A single program can be partitioned into multiple interrelated tasks concurrently executed on a multiprocessor.
• Thus multitasking provides for the parallel execution of two or more parts of a single program.
Multithreading
• The traditional UNIX/OS has a single-threaded kernel, in which only one process can receive OS kernel service at a time.
• In a multiprocessor, we extend the single kernel to be multithreaded.
• The purpose is to allow multiple threads of lightweight processes to share the same address space.
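As a hedged illustration (not from the slides), the C sketch below runs several lightweight POSIX threads in one shared address space: every thread reads the same global array, and each writes its partial sum to its own slot, so no mutual exclusion is needed.

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define T 4

static double a[N];        /* shared by all threads: one address space */
static double part[T];     /* one result slot per thread */

static void *sum_range(void *arg) {
    long id = (long)arg;
    double s = 0.0;
    /* each thread scans its own contiguous slice of the shared array */
    for (long i = id * (N / T); i < (id + 1) * (N / T); i++)
        s += a[i];
    part[id] = s;           /* private slot: no locking required */
    return NULL;
}

int main(void) {
    for (long i = 0; i < N; i++) a[i] = 1.0;
    pthread_t t[T];
    for (long id = 0; id < T; id++)
        pthread_create(&t[id], NULL, sum_range, (void *)id);
    double total = 0.0;
    for (long id = 0; id < T; id++) {
        pthread_join(t[id], NULL);
        total += part[id];
    }
    printf("total = %f\n", total);
    return 0;
}
```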
Partitioning and Replication
• The goal of parallel processing is to exploit as much parallelism as possible with the lowest overhead.
• Program partitioning is a technique for decomposing a large program and data set into many small pieces for parallel execution by multiple processors.
• Program partitioning involves both programmers and compilers.
Partitioning and Replication
• Program replication refers to duplication of the same program code for parallel execution on multiple processors over different data sets.
Scheduling and Synchronization
• Scheduling is further classified as:
Static scheduling
• Conducted at post-compile time.
• Its advantage is low overhead, but a shortcoming is a possible mismatch with the run-time profile of each task.
Dynamic scheduling
• Catches the run-time conditions.
• Requires fast context switching, preemption, and much more OS support.
• Advantages include better resource utilization, at the expense of higher scheduling overhead.
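OpenMP's loop-schedule clauses give a concrete, hedged illustration of this trade-off (the pragmas are standard OpenMP; the loop bodies are invented for the example): schedule(static) fixes the iteration-to-thread mapping up front at low overhead, while schedule(dynamic) hands out chunks at run time, catching load imbalance at the cost of extra scheduling work.

```c
#include <math.h>

/* Compile with: cc -fopenmp sched.c -lm */

/* Irregular per-iteration cost: dynamic scheduling balances the load. */
void irregular(double *x, int n) {
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < n; i++)
        for (int k = 0; k < i % 1000; k++)   /* cost varies with i */
            x[i] = sin(x[i]) + 1.0;
}

/* Uniform cost: static scheduling has the lowest overhead. */
void uniform(double *x, int n) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        x[i] *= 2.0;
}
```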
Cache Coherence & Protection
• The multicache coherence problem demands an invalidation or update after each write operation.
Message-Passing Model
• Two processes D and E residing at different processor nodes may communicate with each other by passing messages through a direct network.
• The messages may be instructions, data, synchronization, or interrupt signals.
• Multicomputers are considered loosely coupled multiprocessors.
IPC Using Message Passing
[Figure: Process D and Process E exchanging messages (send/receive)]
Synchronous Message Passing
• No shared memory.
• No mutual exclusion.
• Synchronization of sender and receiver processes, just like a telephone call.
• No buffer is used.
• If one process is ready to communicate and the other is not, the one that is ready must be blocked.
Asynchronous Message Passing
• Does not require that message sending and receiving be synchronized in time or space.
• Arbitrary communication delay may be experienced, because the sender may not know if and when the message has been received until an acknowledgement arrives from the receiver.
• This scheme is like a postal service using a mailbox, with no synchronization between senders and receivers.
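As a minimal sketch (not from the slides), the C program below passes one message between two processes over a UNIX pipe. The blocking read() makes the receiver wait for the sender, much like the telephone-call analogy for the synchronous case; with buffering and no acknowledgement, the sender proceeds without knowing when the message is consumed, which is the mailbox-like asynchronous behavior.

```c
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];
    pipe(fd);                         /* fd[0]: read end, fd[1]: write end */
    if (fork() == 0) {                /* child = receiver (process E) */
        char buf[64];
        ssize_t n = read(fd[0], buf, sizeof buf - 1);  /* blocks until data */
        buf[n > 0 ? n : 0] = '\0';
        printf("E received: %s\n", buf);
        return 0;
    }
    const char *msg = "hello from D"; /* parent = sender (process D) */
    write(fd[1], msg, strlen(msg));   /* buffered: sender does not wait */
    wait(NULL);                       /* reap the receiver */
    return 0;
}
```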
Data-Parallel Model
• Used in SIMD computers.
• Parallelism is handled by hardware synchronization and flow control.
• Fortran 90 is a data-parallel language.
• Requires predistributed data sets.
Data Parallelism
• This technique is used in array processors (SIMD).
• Issue: match the problem size with the machine size.
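In Fortran 90 this pattern is a single array statement, e.g. A(1:N) = B(1:N) + C(1:N). As a hedged C sketch (illustrative, not from the slides), the same element-wise operation can be written with OpenMP so that one operation is applied across the whole array, each processing element taking its share of the data:

```c
/* Compile with: cc -fopenmp vecadd.c
   One operation applied to every element: the data-parallel pattern. */
void vec_add(const float *b, const float *c, float *a, int n) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```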
Array Language Extensions
• Various data-parallel languages are used, represented by high-level data types:
CFD for the Illiac IV, DAP FORTRAN for the Distributed Array Processor, and C* for the Connection Machine.
• The target is to match the number of PEs to the problem size.
Object-Oriented Model
• Objects are dynamically created and manipulated.
• Processing is performed by sending and receiving messages among objects.
Concurrent OOP
• OOP is needed because of its abstraction and reusability concepts.
• Objects are program entities that encapsulate data and operations in a single unit.
• Concurrent OOP supports concurrent manipulation of objects.
Actor Model
• A framework for concurrent OOP.
• Actors are independent components.
• They communicate via asynchronous message passing.
• Three primitives: create, send-to, and become.
Parallelism in COOP
• Three common patterns for parallelism:
1) Pipeline concurrency
2) Divide-and-conquer concurrency
3) Cooperative problem solving
Functional and Logic Models
• Functional programming languages: Lisp, SISAL, and Strand 88.
• Logic programming languages: Concurrent Prolog and Parlog.
Functional Programming Model
• Should not produce any side effects.
• No concept of storage, assignment, or branching.
• Single-assignment and dataflow languages are functional in nature.
Logic Programming Models
• Used for knowledge processing from large databases.
• Support an implicit search strategy.
• AND-parallel execution and OR-parallel reduction techniques are used.
• Used in artificial intelligence.
Parallel Languages and Compilers
• A programming environment is a collection of software tools and system support.
• A parallel software programming environment is needed.
• Users are still forced to focus on hardware details rather than expressing parallelism through high-level abstractions.
Language Features for Parallelism
• Optimization features
• Availability features
• Synchronization/communication features
• Control of parallelism
• Data parallelism features
• Process management features
Optimization Features
• Theme: conversion of sequential programs to parallel programs.
• The purpose is to match software parallelism with hardware parallelism.
• Software in practice:
1) Automated parallelizers
Examples: the Express C automated parallelizer and the Alliant FX Fortran compiler.
2) Semiautomated parallelizers
Need compiler directives or programmer interaction.
Availability Features
• Theme: enhance user-friendliness, make the language portable across a large number of parallel computers, and expand the applicability of software libraries.
1) Scalability
The language should be scalable to the number of processors and independent of the hardware topology.
2) Compatibility
Compatible with sequential languages.
3) Portability
The language should be portable to shared-memory multiprocessors, message-passing multicomputers, or both.
Synchronization/Communication Features
• Shared variables (locks) for IPC
• Remote procedure calls
• Dataflow languages
• Mailboxes, semaphores, monitors
Control of Parallelism
• Coarse, medium, and fine grain
• Explicit vs. implicit parallelism
• Global parallelism
• Loop parallelism
• Task parallelism
• Divide-and-conquer parallelism
Data Parallelism Features
• Theme: how data are accessed and distributed in either SIMD or MIMD computers.
1) Run-time automatic decomposition
Data are automatically distributed with no user interaction.
2) Mapping specification
The user specifies patterns, and input data are mapped to hardware.
• Virtual processor support
The compiler creates virtual processors statically and maps them onto physical processors.
• Direct access to shared data
Shared data are directly accessed by the operating system.
Process Management Features
• Theme: support efficient creation of parallel processes, multithreading/multitasking, program partitioning and replication, and dynamic load balancing at run time.
1) Dynamic process creation at run time
2) Creation of lightweight processes
3) Replication techniques
4) Partitioned networks
5) Automatic load balancing
Optimizing Compilers for Parallelism
• The compiler's role is to remove the burden of optimization and code generation from the programmer.
• Three phases:
1) Flow analysis
2) Optimization
3) Code generation
Flow Analysis
• Reveals program flow patterns to determine data and control dependencies.
• Flow analysis is carried out at various execution levels:
1) Instruction level: VLIW or superscalar processors.
2) Loop level: SIMD and systolic computers.
3) Task level: multiprocessors/multicomputers.
Program Optimization
• Transformation of the user program to exploit the hardware's capabilities.
• Explores better performance.
• Goals: maximize the speed of code execution and minimize code length.
• Local and global optimizations.
• Machine-dependent transformations.
Parallel Code Generation
• Compiler directives can be used to generate parallel code.
• Two families of optimizing compilers:
1) Parafrase and Parafrase 2
2) PFC and ParaScope
Parafrase and Parafrase 2
• Transforms sequential Fortran 77 programs into parallel programs.
• Parafrase consists of about 100 program transformations that are encoded as passes.
• A pass list identifies dependencies and converts the program to concurrent form.
• Parafrase 2 handles C and Pascal in addition to Fortran.
PFC and ParaScope
• PFC translates Fortran 77 into Fortran 90 code.
• The PFC package was extended to PFC+ for parallel code generation on shared-memory multiprocessors.
• PFC performs its analysis in the following steps:
1) Interprocedural flow analysis
2) Transformation
3) Dependence analysis
4) Vector code generation
Code Optimization and Scheduling
• A compiler is software that translates source code into object code.
• Optimization demands effort from both the programmer and the compiler.
Code Optimization & Scheduling (Subpoints)
1) Scalar optimization within basic blocks
• Basic block scheduling
2) Local & global optimization
Local:
• Common subexpression elimination
• Constant propagation/folding
• Algebraic optimization to simplify expressions
• Instruction reordering
• Dead code elimination
Global:
• Global versions of the local optimizations
• Loop optimization
• Control flow optimization
Scalar Optimization within Basic Blocks
• Two types of scheduling:
Static scheduling: the compiler performs the dependency analysis.
Dynamic scheduling: hardware or the operating system performs the dependency analysis at run time.
• Code scheduling methods ensure that control dependency, data dependency, and resource dependency are properly handled during concurrent execution.
Basic Blocks
• A basic block is a sequence of statements with the following properties:
(a) The flow of control can only enter the basic block through the first instruction in the block; that is, there are no jumps into the middle of the block.
(b) Control will leave the block without halting or branching, except possibly at the last instruction in the block.
Partitioning Instructions into Basic Blocks
• INPUT: a sequence of three-address instructions.
• OUTPUT: a list of the basic blocks for that sequence, in which each instruction is assigned to exactly one basic block.
• METHOD: first, we determine those instructions in the intermediate code that are leaders, that is, the first instructions of some basic block.
• The rules for finding leaders are:
1. The first three-address instruction in the intermediate code is a leader.
2. Any instruction that is the target of a conditional or unconditional jump is a leader.
3. Any instruction that immediately follows a conditional or unconditional jump is a leader.
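A minimal C sketch of these three rules follows; the Instr type is a hypothetical three-address-instruction record invented for the example, carrying only the fields the rules need.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-address instruction: just what the rules need. */
typedef struct {
    bool is_jump;   /* conditional or unconditional jump? */
    int  target;    /* index of the jump target, or -1 if none */
} Instr;

/* Mark the leaders of code[0..n-1] according to the three rules. */
void find_leaders(const Instr *code, size_t n, bool *leader) {
    for (size_t i = 0; i < n; i++) leader[i] = false;
    if (n > 0) leader[0] = true;                  /* rule 1: first instr */
    for (size_t i = 0; i < n; i++) {
        if (code[i].is_jump) {
            if (code[i].target >= 0)
                leader[code[i].target] = true;    /* rule 2: jump target */
            if (i + 1 < n)
                leader[i + 1] = true;             /* rule 3: after a jump */
        }
    }
}
```

Each basic block then runs from a leader up to, but not including, the next leader (or the end of the code).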
Example
      sum = 0
      do 10 i = 1, n
 10   sum = sum + a[i]*a[i]
Three-Address Code
1. sum = 0            initialize sum
2. i = 1              initialize loop counter
3. if i > n goto 15   loop test, check for limit
4. t1 = addr(a) - 4
5. t2 = i * 4         a[i]
6. t3 = t1[t2]
7. t4 = addr(a) - 4
8. t5 = i * 4         a[i]
9. t6 = t4[t5]
10. t7 = t3 * t6      a[i]*a[i]
11. t8 = sum + t7
12. sum = t8          increment sum
13. i = i + 1         increment loop counter
14. goto 3
15. …
Control Flow Graph (CFG)
1. sum = 0
2. i = 1
3. if i > n goto 15    (T: jump to 15; F: fall through to 4)
4. t1 = addr(a) - 4
5. t2 = i*4
6. t3 = t1[t2]
7. t4 = addr(a) - 4
8. t5 = i*4
9. t6 = t4[t5]
10. t7 = t3*t6
11. t8 = sum + t7
12. sum = t8
13. i = i + 1
14. goto 3
15. …
Common Subexpression Elimination
Before:
1. sum = 0
2. i = 1
3. if i > n goto 15
4. t1 = addr(a) - 4
5. t2 = i*4
6. t3 = t1[t2]
7. t4 = addr(a) - 4
8. t5 = i*4
9. t6 = t4[t5]
10. t7 = t3*t6
11. t8 = sum + t7
12. sum = t8
13. i = i + 1
14. goto 3
15. …
After (instructions 7–9 recompute t1, t2, and t3, so t6 is replaced by t3 and they are eliminated):
1. sum = 0
2. i = 1
3. if i > n goto 15
4. t1 = addr(a) - 4
5. t2 = i*4
6. t3 = t1[t2]
10a. t7 = t3*t3
11. t8 = sum + t7
12. sum = t8
13. i = i + 1
14. goto 3
15. …
Copy Propagation
Before:
1. sum = 0
2. i = 1
3. if i > n goto 15
4. t1 = addr(a) - 4
5. t2 = i * 4
6. t3 = t1[t2]
10a. t7 = t3 * t3
11. t8 = sum + t7
12. sum = t8
13. i = i + 1
14. goto 3
15. …
After (the copy through t8 is propagated, giving 11a, and instructions 11–12 become dead):
1. sum = 0
2. i = 1
3. if i > n goto 15
4. t1 = addr(a) - 4
5. t2 = i * 4
6. t3 = t1[t2]
10a. t7 = t3 * t3
11a. sum = sum + t7
13. i = i + 1
14. goto 3
15. …
Invariant Code Motion
Before:
1. sum = 0
2. i = 1
3. if i > n goto 15
4. t1 = addr(a) - 4
5. t2 = i * 4
6. t3 = t1[t2]
10a. t7 = t3 * t3
11a. sum = sum + t7
13. i = i + 1
14. goto 3
15. …
After (the loop-invariant computation of t1 is hoisted out of the loop as 2a):
1. sum = 0
2. i = 1
2a. t1 = addr(a) - 4
3. if i > n goto 15
5. t2 = i * 4
6. t3 = t1[t2]
10a. t7 = t3 * t3
11a. sum = sum + t7
13. i = i + 1
14. goto 3
15. …
Strength Reduction
Before:
1. sum = 0
2. i = 1
2a. t1 = addr(a) - 4
3. if i > n goto 15
5. t2 = i * 4
6. t3 = t1[t2]
10a. t7 = t3 * t3
11a. sum = sum + t7
13. i = i + 1
14. goto 3
15. …
After (the multiplication i * 4 is reduced to an addition: t2 is initialized once at 2b and incremented by 4 at 11b):
1. sum = 0
2. i = 1
2a. t1 = addr(a) - 4
2b. t2 = i * 4
3. if i > n goto 15
6. t3 = t1[t2]
10a. t7 = t3 * t3
11a. sum = sum + t7
11b. t2 = t2 + 4
13. i = i + 1
14. goto 3
15. …
Constant Propagation and Dead Code Elimination
Before (the loop test on i has been replaced by an equivalent test on t2, using 2c and 3a):
1. sum = 0
2. i = 1
2a. t1 = addr(a) - 4
2b. t2 = i * 4
2c. t9 = n * 4
3a. if t2 > t9 goto 15
6. t3 = t1[t2]
10a. t7 = t3 * t3
11a. sum = sum + t7
11b. t2 = t2 + 4
14. goto 3a
15. …
After (the constant i = 1 is propagated and folded, so 2b becomes 2d; the now-dead instructions 2 and 2b disappear in the final CFG):
1. sum = 0
2a. t1 = addr(a) - 4
2c. t9 = n * 4
2d. t2 = 4
3a. if t2 > t9 goto 15
6. t3 = t1[t2]
10a. t7 = t3 * t3
11a. sum = sum + t7
11b. t2 = t2 + 4
14. goto 3a
15. …
New Control Flow Graph
1. sum = 0
2. t1 = addr(a) - 4
3. t9 = n * 4
4. t2 = 4
5. if t2 > t9 goto 11   (T: jump to 11; F: fall through to 6)
6. t3 = t1[t2]
7. t7 = t3 * t3
8. sum = sum + t7
9. t2 = t2 + 4
10. goto 5
11. …
Analysis and Optimizing Transformations
• Local optimizations are performed by local analysis of a basic block.
• Global optimizations require analysis of statements outside a basic block.
• Local optimizations are performed first, followed by global optimizations.
Local Optimizations: Optimizing Transformations of a Basic Block
• Local common subexpression elimination
• Dead code elimination
• Copy propagation
• Renaming of compiler-generated temporaries to share storage
Vectorization and Parallelization
• Vectorization: the process of converting scalar looping operations into equivalent vector instructions.
• Parallelization: converting sequential code into parallel code.
• Vectorizing compiler: a compiler that performs vectorization automatically.
• Vector hardware speeds up vector operations.
• Multiprocessors are used to speed up parallel code.
Vectorization Methods
• Use of temporary storage
• Loop interchange
• Loop distribution
• Vector reduction
• Node splitting
Vectorization Example
Consider a scalar loop for addition:
      DO 20 I = 8, 120, 2
 20   A(I) = B(I+3) + C(I+1)
After vectorization:
      A(8:120:2) = B(11:123:2) + C(9:121:2)
Use of Temporary Storage
• Theme: to allow pipelined execution, use a temporary array to produce vector code.
Original loop:
      DO 20 I = 1, N
      A(I) = B(I) + C(I)
 20   B(I) = 2*A(I+1)
Sequential execution order:
      A(1) = B(1) + C(1)
      B(1) = 2*A(2)
      A(2) = B(2) + C(2)
      B(2) = 2*A(3)
      …
Using the temporary-storage method we have:
      TEMP(1:N) = A(2:N+1)
      A(1:N) = B(1:N) + C(1:N)
      B(1:N) = 2*TEMP(1:N)
Loop Interchange
• Theme: exchange the inner loop with the outer loop to increase profitability (improvement in execution time).
Example:
      DO 20 I = 2, N
      DO 10 J = 2, N
S1:   A(I,J) = (A(I,J-1) + A(I,J+1))/2
 10   CONTINUE
 20   CONTINUE
After Loop Interchange
      DO 20 J = 2, N
      DO 20 I = 2, N
S1:   A(I,J) = (A(I,J-1) + A(I,J+1))/2
 20   CONTINUE
Now the inner loop can be vectorized as follows:
      DO 20 J = 2, N
      A(2:N,J) = (A(2:N,J-1) + A(2:N,J+1))/2
 20   CONTINUE
Loop Distribution
• Theme: distribute the loop and vectorize it.
Example:
      DO I = 1, N
S1:   A(I+1) = B(I) + C
S2:   D(I) = A(I) + E
      ENDDO
Transformed to:
      DO I = 1, N
S1:   A(I+1) = B(I) + C
      ENDDO
      DO I = 1, N
S2:   D(I) = A(I) + E
      ENDDO
This leads to:
S1:   A(2:N+1) = B(1:N) + C
S2:   D(1:N) = A(1:N) + E
Vector Reduction
• Theme: produces a scalar value from one or more data arrays.
• Examples: sum, product, maximum, or minimum of all elements in a single array.
Example:
      DO 40 I = 1, N
      A(I) = B(I) + C(I)
      S = S + A(I)
      AMAX = MAX(AMAX, A(I))
 40   CONTINUE
After Vector Reduction
      A(1:N) = B(1:N) + C(1:N)
      S = S + SUM(A(1:N))
      AMAX = MAX(AMAX, MAXVAL(A(1:N)))
where SUM, MAX, and MAXVAL are all vector operations.
Node Splitting
• The data dependence cycle can sometimes be broken by node splitting.
Example, before:
      DO 50 I = 2, N
S1:   T(I) = A(I-1) + A(I+1)
S2:   A(I) = B(I) + C(I)
 50   CONTINUE
After node splitting:
      DO 50 I = 2, N
S1a:  X(I) = A(I+1)
S2:   A(I) = B(I) + C(I)
S1b:  T(I) = A(I-1) + X(I)
 50   CONTINUE
Thus the new loop structure can be vectorized as follows:
S1a:  X(2:N) = A(3:N+1)
S2:   A(2:N) = B(2:N) + C(2:N)
S1b:  T(2:N) = A(1:N-1) + X(2:N)
Code Generation and Scheduling
• Directed acyclic graph (DAG)
• Register allocation
• List scheduling
• Cycle scheduling
• Trace scheduling compilation
DAG
• Constructed from a basic block.
• A basic block has no backtracking, so we can construct a DAG for it.
Example of Constructing the DAG
(1) t1 := 4 * i
Step (1): create leaf nodes 4 and i0.
Step (2): create a * node with children 4 and i0.
Step (3): attach identifier t1 to the * node.
(2) t2 := a[t1]
Step (1): create nodes labeled [] and a0.
Step (2): find the previously created node (t1).
Step (3): attach the label t2 to the [] node.
(3) t3 := 4 * i
Here we determine that node (4) was already created, node (i) was already created, and node (*) was already created, so we just attach t3 to the existing * node, which is now labeled t1, t3.
[Figure: the resulting DAG, with a * node labeled t1, t3 over leaves 4 and i0, and a [] node labeled t2 over a0 and the * node]
Compiler Design
[Figure: the source program enters the front end, which produces an intermediate language; the back end then performs scheduling and register allocation]
Why Register Allocation?
• Storing and accessing variables in registers is much faster than accessing data in memory; this is the way operations are performed in load/store (RISC) processors.
• Therefore, in the interests of performance, if not by necessity, variables ought to be stored in registers.
• For performance reasons, it is useful to keep variables in registers as long as possible once they are loaded.
The Goal
• Primarily, to assign registers to variables.
• However, the allocator runs out of registers quite often.
• It must then decide which variables to “flush” out of registers to free them up, so that other variables can be brought in.
• This important indirect consequence of allocation is referred to as spilling.
Scheduling
• List scheduling
• Cycle scheduling
• Trace scheduling
Trace Scheduling
• Compaction code
• Compensation code
Thank You
All the best for the exams!