OpenMP Booth – Sorting Things Out With Tasks, Ruud van der Pas

Sorting Things Out (with tasks)

Ruud van der Pas
Distinguished Engineer, SPARC Microelectronics
Santa Clara, CA, USA

SC'16 Talk at OpenMP Booth
Tuesday, November 15, 2016

The Dark Ages Of OpenMP

Big Brother Had To Know Everything
And in advance, (right) before execution
For example, the loop length, number of parallel sections, etc.
Gets hard with more dynamic problems like processing linked lists, divide and conquer, recursion
A solution was ugly. At best

Tasking Comes To The Rescue!

And we will show you how it all works

A task is a chunk of independent work
You guarantee different tasks can be executed simultaneously

    #pragma omp task
    { "this is my task" }

The runtime system decides on the scheduling of the tasks
At certain points (implicit and explicit), tasks are guaranteed to be completed

For those who love to study the fine print, the following advice: RTFM!
And this is what it looks like:

The Tasking Concept In OpenMP

[Figure: a thread generates tasks; the threads execute the tasks]

Who Does What And When?

You
• Use a pragma to specify where the tasks are (the assumption is that all tasks can be executed independently)

The OpenMP runtime system
• When a thread encounters a task construct, a new task is generated
• The moment of execution of the task is up to the runtime system
• Execution can either be immediate or delayed
• Completion of a task can be enforced through task synchronization

A Simple Plan

Your Task for Today:
Write a program that prints either "A race car" or "A car race" and maximize the parallelism

Tasking Example/1

    #include <stdlib.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {

        printf("A ");
        printf("race ");
        printf("car ");

        printf("\n");
        return(0);
    }

What will this program print?

    $ cc -fast hello.c
    $ ./a.out
    A race car
    $

Tasking Example/2

    #include <stdlib.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {

        #pragma omp parallel
        {
            printf("A ");
            printf("race ");
            printf("car ");
        } // End of parallel region

        printf("\n");
        return(0);
    }

What will this program print using 2 threads?

Tasking Example/3

    $ cc -xopenmp -fast hello.c
    $ export OMP_NUM_THREADS=2
    $ ./a.out
    A race car A race car

Note that this program could (for example) also print "A A race race car car" or "A race A car race car", or "A race A race car car", or .....
But I have not observed this (yet)

Tasking Example/4

    #include <stdlib.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {

        #pragma omp parallel
        {
            #pragma omp single
            {
                printf("A ");
                printf("race ");
                printf("car ");
            }
        } // End of parallel region

        printf("\n");
        return(0);
    }

What will this program print using 2 threads?

Tasking Example/5

    $ cc -xopenmp -fast hello.c
    $ export OMP_NUM_THREADS=2
    $ ./a.out
    A race car

But of course now only 1 thread executes .......

Tasking Example/6

    #include <stdlib.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {

        #pragma omp parallel
        {
            #pragma omp single
            {
                printf("A ");
                #pragma omp task
                {printf("race ");}
                #pragma omp task
                {printf("car ");}
            }
        } // End of parallel region

        printf("\n");
        return(0);
    }

What will this program print using 2 threads?

Tasking Example/7

    $ cc -xopenmp -fast hello.c
    $ export OMP_NUM_THREADS=2
    $ ./a.out
    A race car
    $ ./a.out
    A race car
    $ ./a.out
    A car race
    $

Tasks can be executed in arbitrary order

Another Simple Plan

You did well and quickly, so here is a final task to do
Have the sentence end with "is fun to watch" (hint: use a print statement)

Tasking Example/8

    #include <stdlib.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {

        #pragma omp parallel
        {
            #pragma omp single
            {
                printf("A ");
                #pragma omp task
                {printf("race ");}
                #pragma omp task
                {printf("car ");}
                printf("is fun to watch ");
            }
        } // End of parallel region

        printf("\n");
        return(0);
    }

What will this program print using 2 threads?

Tasking Example/9

    $ cc -xopenmp -fast hello.c
    $ export OMP_NUM_THREADS=2
    $ ./a.out
    A is fun to watch race car
    $ ./a.out
    A is fun to watch race car
    $ ./a.out
    A is fun to watch car race
    $

Tasks are executed at a task scheduling point

Tasking Example/10

    #include <stdlib.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {

        #pragma omp parallel
        {
            #pragma omp single
            {
                printf("A ");
                #pragma omp task
                {printf("car ");}
                #pragma omp task
                {printf("race ");}
                #pragma omp taskwait
                printf("is fun to watch ");
            }
        } // End of parallel region

        printf("\n");
        return(0);
    }

What will this program print using 2 threads?

Tasking Example/11

    $ cc -xopenmp -fast hello.c
    $ export OMP_NUM_THREADS=2
    $ ./a.out
    A car race is fun to watch
    $ ./a.out
    A car race is fun to watch
    $ ./a.out
    A race car is fun to watch
    $

Tasks are executed first now

The Quicksort Algorithm

A Commonly Used Algorithm Used For Sorting
Uses a divide and conquer strategy
Main steps:
• Split the array through a pivot, such that
    • All elements to the left are smaller
    • All elements to the right are equal, or greater
• Repeat for left and right part until done

A Simple Example/1

    8 5 7 3 9    initial values
    8 5 7 3 9    choose pivot, keep index
    8 5 7 3 9    swap pivot and last element
    8 5 9 3 7    scan array, swap if smaller
    8 5 9 3 7    5 < 7 => move to position 0
    5 8 9 3 7    3 < 7 => move to position 1
    5 3 9 8 7    continue, but nothing found

A Simple Example/2

    5 3 9 8 7    restore pivot
    5 3 7 8 9    pivot is in final position

    [5 3]   7   [8 9]

    5 3: repeat for left branch (OpenMP task)
    8 9: repeat for right branch (OpenMP task)

The Recursive Sequential Code

    void Quicksort(int64_t *a, int64_t lo, int64_t hi)
    {
        if ( lo < hi ) {
            int64_t p = partitionArray(a, lo, hi);

            (void) Quicksort(a, lo, p - 1);   // Left branch

            (void) Quicksort(a, p + 1, hi);   // Right branch
        }
    }

And Now With Tasks

    void Quicksort(int64_t *a, int64_t lo, int64_t hi)
    {
        if ( lo < hi ) {
            int64_t p = partitionArray(a, lo, hi);

            #pragma omp task shared(a) firstprivate(lo,p)
            {(void) Quicksort(a, lo, p - 1);}   // Left branch

            #pragma omp task shared(a) firstprivate(hi,p)
            {(void) Quicksort(a, p + 1, hi);}   // Right branch

            #pragma omp taskwait
        }
    }

Including The Driver Part

    #pragma omp parallel default(none) shared(a,nelements)
    {
        #pragma omp single nowait
        { (void) Quicksort(a, 0, nelements-1); }
    } // End of parallel region

    void Quicksort(int64_t *a, int64_t lo, int64_t hi)
    {
        if ( lo < hi ) {
            int64_t p = partitionArray(a, lo, hi);

            #pragma omp task default(none) firstprivate(a,lo,p)
            {(void) Quicksort(a, lo, p - 1);}   // Left branch

            #pragma omp task default(none) firstprivate(a,hi,p)
            {(void) Quicksort(a, p + 1, hi);}   // Right branch
        }
    }

Fine Tuning The Algorithm

When the array section gets too small, it is better to switch to the sequential algorithm
May also consider the use of the if-clause plus the mergeable and final clauses
Some experimentation is recommended ;-)

A Performance Example*

Performance of the OpenMP quicksort algorithm (40M elements)

    Number of OpenMP threads    Elapsed time (seconds)
                           1                      30.4
                           2                      15.0
                           4                       7.6
                           8                       4.1
                          16                       2.1
                          32                       1.4
                          64                       0.9
                         128                       0.7

[Chart: elapsed time (s) and speedup over single thread versus the number of OpenMP threads]

*) SPARC M7-8 server @ 4.1 GHz

Summary

For certain types of algorithms
Tasking is ideally suitable
Optimal performance may require some fine tuning
But ....... Remember:
Big Brother Does Not Need To Know Everything

Thank You And ..... Stay Tuned