MPI & OpenMP Mixed - uhemwiki.uhem.itu.edu.tr/w/images/c/ca/openmp-mixed_apps.pdf


22 June 2012
– MPI & OpenMP Advantages/Disadvantages
– MPI vs. OpenMP
– Why Mixed/Hybrid Programming?
Shared & Distributed Programming
22.06.2012 3/36  
• Shared Memory (thread based)
 – Multiple threads sharing data in the same address space and explicitly synchronizing when needed
• Distributed Memory (process based)
 – Distinct processes, explicitly taking part in pairwise and collective exchanges of control and data messages
 – No way to directly access the variables in the memory of another process
Hybrid Programming: OpenMP + MPI & Apps.
MPI Case
• Disadvantages
 – Difficult to develop and debug
 – High latency, low bandwidth
 – Explicit communication
 – Large granularity
 – Difficult load balancing (dynamic/static)
OpenMP Case
• Advantages
 – Easy to implement parallelism
 – Low latency, high bandwidth
 – Implicit communication
 – Coarse and fine granularity
 – Dynamic load balancing
• Disadvantages
 – Only on shared memory machines
 – Scales only within one node
 – Possible data placement problems
 – No specific thread order
MPI vs. OpenMP Case
• MPI and OpenMP are both suitable for coarse-grain parallelism (multiple asynchronous processors)
• OpenMP can also be used effectively for fine-grain parallelism (vectorization)
• Both MPI and OpenMP can be used to parallelize applications in a data-parallel or task-parallel fashion
• Even though OpenMP is based upon sharing out work, it is possible to assign data to individual threads
• With some care when assigning work, a data-parallel mode can be closely approximated in OpenMP
Why Hybrid Programming?
• The hybrid model is an excellent match for the dominant trend in parallel architectures: clusters of multi-core shared-memory or SMP (Symmetric Multi-Processor) nodes, such as dual- or quad-core Xeon nodes
• Avoids the extra MPI communication overhead within a node
• Can achieve better scalability than both pure MPI and pure OpenMP
Why Hybrid Programming?
• ANADOLU, KARADENIZ, EGE, …
 – Intel Xeon EM64T arch.
 – dual- & quad-core nodes (as SMP)
 – Front Side Bus!
Hybrid Programming
• Multiple OpenMP threads under each MPI process
 – OpenMP threads can be used within each shared-memory node, and MPI can be used to communicate across nodes
 – Eliminates message passing within a single shared-memory node
 – Nested parallelism is possible in a hybrid model
 – The right approach for DSM architectures comprising a large number of shared-memory SMP nodes
Hybrid Programming Models
[Figure: hybrid programming model diagrams; source URL garbled in extraction]
MPI Routines for Threads
• MPI_INIT_THREAD
 – Allows the application to request a level of thread support
 – The REQUIRED argument can be one of MPI_THREAD_[SINGLE | FUNNELED | SERIALIZED | MULTIPLE]
 – The returned PROVIDED argument may be less than what the application REQUIRED
int MPI_Init_thread(int *argc, char ***argv, int required, int *provided);
C  
Fortran

MPI_INIT_THREAD(INTEGER REQUIRED, INTEGER PROVIDED, INTEGER IERROR)
MPI Routines for Threads
MPI_THREAD_SINGLE in MPP systems and MPI_THREAD_FUNNELED in hybrid systems (NEC SX-5, SX-6, SX-8)
MPI Routines for Threads
• MPI_INIT_THREAD
 – MPI_THREAD_SINGLE: There is only one thread in the application; no OpenMP multithreading is used in the program.
 – MPI_THREAD_FUNNELED: Only one thread makes MPI calls. All MPI calls are made by the master thread. This is the case if all MPI calls are outside OpenMP parallel regions or inside master regions.
MPI Routines for Threads
• MPI_INIT_THREAD
 – MPI_THREAD_SERIALIZED: Multiple threads make MPI calls, but only one at a time. This can be enforced in OpenMP with the OpenMP SINGLE directive.
 – MPI_THREAD_MULTIPLE: Any thread may make MPI calls at any time.
 – All MPI implementations of course support MPI_THREAD_SINGLE.

See for detailed information: http://www.mcs.anl.gov/~lusk/hybridpaper.pdf
MPI Routines for Threads
• MPI calls inside OMP MASTER
 – An OMP BARRIER is needed for synchronization with OMP_MASTER, since the MASTER construct implies no barrier of its own
 – It implies all other threads are sleeping!
!$OMP BARRIER
!$OMP MASTER
  call MPI_xxx(…)
!$OMP END MASTER
!$OMP BARRIER
(Strategy: in a threaded barrier, the master thread synchronizes the processes)
MPI Routines for Threads
• MPI calls inside OMP SINGLE | MASTER
 – MPI_THREAD_SERIALIZED required
 – An OMP BARRIER is needed beforehand, since OMP SINGLE only guarantees synchronization at the end
 – It also implies all other threads are sleeping!
!$OMP BARRIER
!$OMP SINGLE
  call MPI_xxx(…)
!$OMP END SINGLE
MPI Routines for Threads
• A single thread makes MPI calls while the other threads are computing
!$OMP PARALLEL
if (my_thread_rank < 1) then
  call MPI_xxx(…)
else
  do some computation
endif
!$OMP END PARALLEL
Advantages & Disadvantages
• THREAD_SINGLE
 – Easy to program
 – Other threads are sleeping while the master thread calls MPI routines
• THREAD_FUNNELED
 – Load balancing is necessary
 – Useful for dynamic task distribution, and problematic for domain-decomposition programming
Advantages & Disadvantages
• Depends on your application's needs for communication
• Capability for SMP parallelization
• Your available working hours for hybrid programming

• Is the MPI library thread safe?
• From which code lines can I call MPI routines?
• In which code blocks can I use parallel regions?
Thread Info for MPI Process
• MPI_QUERY_THREAD
 – Returns the provided level of thread support
 – The thread level may be set via an environment variable!
• MPI_IS_THREAD_MAIN
 – Returns true if this is the thread that invoked MPI_INIT or MPI_INIT_THREAD
 – Indicates which thread is the main thread
MPI_IS_THREAD_MAIN(LOGICAL FLAG, INTEGER IERROR)
Limitations & Problems
• OpenMP has less scalability due to implicit parallelism, while MPI allows multi-dimensional blocking.
• All threads except one are idle during MPI communication
 – Need to overlap computation and communication for better performance.
Limitations & Problems
• Cache coherence, data placement
• Problems with only natural one-level parallelism.
• Pure OpenMP code performs worse than pure MPI within a node
• Not necessarily faster than pure MPI or pure OpenMP
Limitations & Problems
• Partly depends on the code and architecture, partly on how the programming models interact
• How many times is a new team of threads created?
• Does the machine/OS put new threads where they best reuse data in cache?
• How much of the MPI code is multithreaded?
• How well does the code use the communication hardware?
Lab: Example 1
• PI Calculation
 – To demonstrate the bottom-to-top approach in MPI/OpenMP program development, we provide a simple program for the calculation of the number π using an easily parallelizable trapezoidal-rule integral evaluation
 – MPI (mpi_pi.c) / OpenMP (omp_pi.c) and Hybrid (hybrid_pi.c) versions
π = ∫₀¹ 4/(1+x²) dx
Lab: Example 1
Run code
Lab: Example 1
Compile code

Run code
Warning: Intel MPI 3.0 has some bugs and gives error messages when submitting 'mixed-programmed' codes:
Error: system error(22): __kmp_runtime_destroy: pthread_key_delete: Invalid argument
OMP abort: fatal system error detected.
(Use Intel MPI 3.1 instead.)
Lab: Example 2
• 2D Heat Equation Calculation
 – Write a hybrid MPI + OpenMP program using the given Heat2D MPI program
 – Write the MPI thread strategy with MPI_THREAD_FUNNELED
 – Write hybrid code with at least one of the MPI_INIT_THREAD levels: MULTIPLE, SERIALIZED