OpenMP
Introduction
• What is parallel processing? It is the ability to process more than one job simultaneously.
• Why go parallel?
  • A great deal of data must be processed
  • Engineering calculations take a long time to compute
  • Jobs need to be done faster
Isfahan University of Technology, Dep. Electronic and Computer Engineering 2
Technologies
• What technologies are used for parallel processing?
  • Network-based parallel processing
    • Uses idle CPU time and power
    • Most CPU time and power otherwise goes to waste
    • Breaks jobs down and runs the pieces on available resources
  • Local parallelism on multicore/multiprocessor systems
    • Uses the concept of multithreading
    • Uses the concept of shared memory
    • Can run on either a GPU or a CPU
Tools and Techniques
• What tools are used for parallel processing?
  • Network-based parallel processing
    • Grid-based parallel computing
    • Cloud-based parallelism and cloud computing
  • Local parallelism on multicore/multiprocessor systems
    • NVIDIA® CUDA™
    • MPI
    • POSIX Threads
    • OpenMP
What is OpenMP
• OpenMP, in simple words, runs a user program in parallel.
• It relies on two main concepts for parallelism:
  • Multithreading
  • Shared memory
• It takes a user application, breaks it down into a group of threads, and runs them on a shared-memory foundation.
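A minimal C++ sketch of that idea, assuming a compiler with OpenMP enabled: the body of a #pragma omp parallel region is executed once by every thread in the team, and a variable declared before the region lives in shared memory, so all threads update the same copy. The function name count_threads_in_region is hypothetical.

```cpp
// Hypothetical helper: opens one parallel region and counts how many
// threads executed its body.
int count_threads_in_region() {
    int executed = 0;          // declared outside the region, so it is shared
    #pragma omp parallel
    {
        // Every thread in the team runs this block once.
        #pragma omp atomic
        executed++;            // atomic update avoids a data race on the shared counter
    }
    return executed;           // equals the team size; 1 if built without OpenMP
}
```

Built without OpenMP support, the pragmas are ignored and the same function simply returns 1.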
Why use OpenMP
• It is simple to use
• Most of the time there is no need to change the program code
• It uses compiler directives to mark the parallel region
• It is cross-platform
• It is supported in Fortran and C/C++
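To illustrate the "no need to change the program code" point, here is a sketch (with a hypothetical function scale) in which a single directive line is the only OpenMP-specific addition to an ordinary loop. A compiler without OpenMP enabled ignores the unknown pragma (at most with a warning) and builds the same loop sequentially; the standard _OPENMP macro tells the two builds apart.

```cpp
#include <vector>

// Multiplies every element by k. The loop itself is plain C++.
void scale(std::vector<double>& v, double k) {
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for          // the only OpenMP-specific line
    for (int i = 0; i < n; ++i) {
        v[i] *= k;                    // iterations are independent of each other
    }
}

// _OPENMP is defined by the compiler only when OpenMP is enabled;
// its value is the yyyymm date of the supported specification.
bool openmp_enabled() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}
```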
Programming Model
• Shared memory
• Parallelism by threading
• Fork-join model
• Explicit parallelism
• Nested parallelism
• Dynamic threads
• Input/output
• Memory model
Shared Memory
• What is shared memory?
• Why use shared memory?
• Shared memory in OpenMP
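A short sketch of how shared memory shows up in OpenMP code: data-sharing clauses state which variables all threads see (shared), which each thread gets its own copy of (private), and which are combined at the end (reduction). The function sum_of_squares is a hypothetical example; default(none) forces every variable used in the region to be listed explicitly.

```cpp
// Sum of i*i for i in [1, n], computed by a team of threads.
long long sum_of_squares(int n) {
    long long total = 0;
    // n: shared, read by all threads.
    // total: each thread accumulates into a private copy; the copies are
    //        added together when the loop ends.
    #pragma omp parallel for default(none) shared(n) reduction(+ : total)
    for (int i = 1; i <= n; ++i) {
        long long sq = static_cast<long long>(i) * i;  // declared inside: private per thread
        total += sq;
    }
    return total;
}
```

Without the reduction clause, every thread would update the single shared total at the same time, which is a data race.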
Shared Memory (Cont.)
• The following systems can be used for shared-memory access:
  • a single-core chip (older PCs, sequential execution)
  • a multicore chip (such as a typical laptop)
  • multiple single-core chips in a NUMA system
  • multiple multicore chips in a NUMA system (VT SGI system)
UMA Vs. NUMA
• Uniform Memory Access (UMA)
UMA Vs. NUMA (Cont.)
• Non-Uniform Memory Access (NUMA)
Multithreading
• What is multithreading?
• What is Intel Hyper-Threading?
• Why use multithreading?
• Multithreading in OpenMP
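Hyper-Threading makes one physical core appear as more than one logical processor, and that logical count is what a multithreaded program typically sees. A sketch, assuming the standard omp_get_num_procs routine when OpenMP is enabled and std::thread::hardware_concurrency otherwise; logical_processors is a hypothetical name, and affinity settings can make the reported number smaller than the machine's total.

```cpp
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of processors the program may use; on a Hyper-Threading CPU this
// typically counts logical processors, not physical cores.
int logical_processors() {
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    unsigned n = std::thread::hardware_concurrency();  // 0 means "unknown"
    return n == 0 ? 1 : static_cast<int>(n);
#endif
}
```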
Fork – Join Model
• What is fork?
• What is join?
• How multithreading works in OpenMP
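The fork-join sequence can be observed with the runtime routines omp_get_num_threads and omp_get_thread_num: outside a region only the master thread (number 0) runs, the region forks a team, and the closing brace joins it again. ForkJoinTrace and trace_fork_join are hypothetical names; the #else branch supplies stand-in versions so the sketch also builds as plain sequential C++.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Stand-ins used only when OpenMP is disabled (single-threaded behaviour).
inline int omp_get_num_threads() { return 1; }
inline int omp_get_thread_num() { return 0; }
#endif

struct ForkJoinTrace {
    int threads_before;  // team size in the sequential part
    int threads_inside;  // team size inside the parallel region
    int threads_after;   // team size after the join
};

ForkJoinTrace trace_fork_join() {
    ForkJoinTrace t{};
    t.threads_before = omp_get_num_threads();       // 1: only the master thread
    #pragma omp parallel                            // fork: a team is created
    {
        if (omp_get_thread_num() == 0)              // the master is thread 0
            t.threads_inside = omp_get_num_threads();
    }                                               // join: implicit barrier
    t.threads_after = omp_get_num_threads();        // 1 again: master continues alone
    return t;
}
```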
Fork – Join Model (Cont.)
[Figure: master thread forking (F) into a team of threads and joining (J) back]
OpenMP Elements
• Compiler directives
• Runtime library routines
• Environment variables
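A sketch touching all three elements: a directive opens the region, the runtime routines omp_get_max_threads and omp_set_num_threads query or override the thread count, and the OMP_NUM_THREADS environment variable sets the initial default. default_team_size and run_region_with are hypothetical helper names; the #else branch provides stand-ins for builds without OpenMP.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
inline int omp_get_max_threads() { return 1; }
inline void omp_set_num_threads(int) {}
#endif

// Team size the next parallel region will request. Its initial value comes
// from OMP_NUM_THREADS if set, otherwise from an implementation default
// (commonly the number of logical processors).
int default_team_size() {
    return omp_get_max_threads();
}

// Request n threads via the runtime library, then count how many ran.
int run_region_with(int n) {
    omp_set_num_threads(n);     // overrides OMP_NUM_THREADS for later regions
    int ran = 0;
    #pragma omp parallel
    {
        #pragma omp atomic
        ran++;
    }
    return ran;                 // at most n; may be fewer if the runtime adjusts it
}
```

For example, launching the program on Linux as OMP_NUM_THREADS=4 ./app makes default_team_size() return 4.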
How to use OpenMP
• OpenMP is implemented for C/C++ and Fortran
• In C/C++ we use compiler directives
• We only need to specify the parallel region
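A sketch of marking a parallel region in C++: only the braced block after #pragma omp parallel runs in parallel, the code before and after it stays sequential, and #pragma omp for splits the loop iterations among the team (the pair is equivalent to the combined #pragma omp parallel for). The function squares is a hypothetical example.

```cpp
#include <vector>

std::vector<long long> squares(int n) {
    std::vector<long long> out(n > 0 ? n : 0);   // sequential: master thread only
    #pragma omp parallel                          // start of the parallel region
    {
        #pragma omp for                           // divide iterations among threads
        for (int i = 0; i < n; ++i)
            out[i] = static_cast<long long>(i) * i;
    }                                             // end of region (implicit barrier)
    return out;                                   // sequential again
}
```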
How to use OpenMP
• With non-Microsoft compilers:
How to use OpenMP (Cont.)
• In Visual Studio:
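As a hedged reference for both cases: GCC and Clang enable OpenMP with the -fopenmp flag, and MSVC uses /openmp (in Visual Studio, the OpenMP Support property under Configuration Properties > C/C++ > Language). The file name hello_omp.cpp and the functions below are hypothetical; openmp_version reports whether the flag actually took effect.

```cpp
// GCC / Clang:       g++ -fopenmp hello_omp.cpp -o hello_omp
// MSVC command line: cl /EHsc /openmp hello_omp.cpp
#include <cstdio>

// Returns the _OPENMP value (yyyymm of the supported specification),
// or 0 if the file was compiled without OpenMP enabled.
long openmp_version() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Each thread of the team prints one line.
void hello() {
    #pragma omp parallel
    std::printf("Hello from a thread\n");
}
```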
Real Experiment
#include <windows.h>   // LARGE_INTEGER, QueryPerformanceFrequency/Counter
#include <omp.h>       // omp_set_num_threads
#include <iostream>
using namespace std;

int main()
{
    omp_set_num_threads(6);             // request a team of 6 threads
    LARGE_INTEGER frequency;            // ticks per second
    LARGE_INTEGER t1, t2;               // ticks
    double elapsedTime;
    // get ticks per second
    QueryPerformanceFrequency(&frequency);
    // start timer
    QueryPerformanceCounter(&t1);
    #pragma omp parallel for
    for (int i = 0; i < 999999; i++)
        for (int j = 0; j < 1000; j++); // empty busy loop; an optimizing build may remove it
    // stop timer
    QueryPerformanceCounter(&t2);
    elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart;
    cout << elapsedTime << " ms.\n";
    return 0;
}
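The experiment above relies on Windows-only timer calls. A portable sketch of the same measurement, assuming std::chrono::steady_clock as the timer; time_busy_loop_ms and its sink parameter are hypothetical, and the loop now accumulates a result through a reduction so an optimizing compiler cannot discard it outright (it may still simplify the inner loop, so compare builds with identical flags).

```cpp
#include <chrono>

// Times a nested busy loop and returns the elapsed milliseconds.
// The loop result is written to sink so the work is not dead code.
double time_busy_loop_ms(int outer, int inner, long long& sink) {
    auto t1 = std::chrono::steady_clock::now();
    long long total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < outer; ++i)
        for (int j = 0; j < inner; ++j)
            total += j & 1;                 // trivial work per iteration
    auto t2 = std::chrono::steady_clock::now();
    sink = total;
    return std::chrono::duration<double, std::milli>(t2 - t1).count();
}
```

Running it once with OpenMP enabled and once without gives the parallel and sequential timings from a single source file.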
Experiment Result - Sequential
• It took 3347.68 milliseconds to run
Experiment Result - Parallel
• It took 983.576 milliseconds to run
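Assuming the sequential run is the same program without the directive and the parallel run used the 6 threads requested by omp_set_num_threads(6), the two measurements give a speedup S and a parallel efficiency E of:

$$
S = \frac{T_{\text{seq}}}{T_{\text{par}}} = \frac{3347.68\ \text{ms}}{983.576\ \text{ms}} \approx 3.40,
\qquad
E = \frac{S}{p} = \frac{3.40}{6} \approx 0.57
$$

An efficiency below 1 is expected: thread start-up, scheduling, and the number of physical cores all limit how much of the ideal 6x speedup is realized.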