Exploring Parallelism with Joseph Pantoga Jon Simington

Upload: leo-merritt

Post on 19-Jan-2018



Issues Between Python and C

- Python is inherently slower than C

  - Especially when using libraries that take advantage of Python’s relationship with C / C++ code

  - Thanks to the interpreter and the dynamic typing scheme

  - Python 3 can be comparable to C in some respects, but is still slower in the average case (we use Python 2.7.10)

- Python too popular?

  - So many devs with so many ideas leads to many incomplete projects, but plenty of room for contribution

- Python’s Global Interpreter Lock (GIL)

  - Prevents more than one thread from executing Python bytecode at a time


The Global Interpreter Lock

- A lock enforced by the Python interpreter so that only one thread runs Python code at a time, because the interpreter’s internals are not thread-safe

- Limits the parallelism that multiple threads can achieve

- Very little, if any, speedup for CPU-bound code on a multiprocessor machine


The Global Interpreter Lock

    from threading import Thread

    def countdown(n):
        # Pure-Python busy loop: CPU-bound, so it holds the GIL while running
        while n > 0:
            n -= 1

    count = 100000000

    # Sequential: one call does all the work
    countdown(count)

    # 2 threads, each doing half the work
    t1 = Thread(target=countdown, args=(count//2,))
    t2 = Thread(target=countdown, args=(count//2,))
    t1.start(); t2.start()
    t1.join(); t2.join()

    # 4 threads, each doing a quarter of the work
    t1 = Thread(target=countdown, args=(count//4,))
    t2 = Thread(target=countdown, args=(count//4,))
    t3 = Thread(target=countdown, args=(count//4,))
    t4 = Thread(target=countdown, args=(count//4,))
    t1.start(); t2.start(); t3.start(); t4.start()
    t1.join(); t2.join(); t3.join(); t4.join()

Sequential: 7.8s

2 Threads: 15.4s

4 Threads: 15.7s

- The GIL ruins everything!

- Thread-based Parallelism is often not worth it with Python

*test completed on 3.1GHz x4 machine with Python 2.7.10
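The slide shows the workload but not how the times were taken. A minimal sketch of a timing harness follows, assuming Python 3 (the talk used Python 2.7.10) and a smaller count so it finishes quickly. The helper name `time_run` is hypothetical and not from the talk, and the numbers it prints will differ from the slide's.

```python
import time
from threading import Thread


def countdown(n):
    """Burn CPU by counting down to zero in pure Python."""
    while n > 0:
        n -= 1


def time_run(total, num_threads):
    """Return wall-clock seconds needed to count down `total` steps.

    num_threads == 0 runs sequentially in the calling thread; otherwise
    the work is split evenly across `num_threads` threads.
    """
    start = time.perf_counter()
    if num_threads == 0:
        countdown(total)
    else:
        threads = [Thread(target=countdown, args=(total // num_threads,))
                   for _ in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: 10 million steps instead of the slide's 100 million
    count = 10000000
    for label, n in (("Sequential", 0), ("2 Threads", 2), ("4 Threads", 4)):
        print("%-10s %.2fs" % (label, time_run(count, n)))
```

On an interpreter with a GIL, the threaded rows are not expected to beat the sequential row, since only one thread can run the loop at any moment.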


Getting around the GIL

- Make calls to outside libraries and circumvent the interpreter’s rules entirely

- Python modules that call external C libraries have inherent latency

- BUT! In certain cases, Python + C MPI performance can be comparable to the native C libraries
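As a small illustration of handing work to C code outside the interpreter, the sketch below hashes several large buffers in separate threads. It relies on CPython's documented behavior that `hashlib` releases the GIL while hashing data larger than 2047 bytes. This is not the MPI approach discussed in the talk, just a standard-library stand-in. The helper names `digest_into` and `hash_in_threads` are hypothetical, and any speedup depends on the machine.

```python
import hashlib
from threading import Thread


def digest_into(data, results, index):
    """Hash `data` with SHA-256 and store the hex digest at results[index]."""
    results[index] = hashlib.sha256(data).hexdigest()


def hash_in_threads(chunks):
    """Hash every chunk in its own thread and return the digests in order."""
    results = [None] * len(chunks)
    threads = [Thread(target=digest_into, args=(chunk, results, i))
               for i, chunk in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Four 8 MiB buffers; the hashing itself runs in C with the GIL released
    chunks = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]
    for digest in hash_in_threads(chunks):
        print(digest)
```

Each thread writes to its own slot in `results`, so no extra locking is needed to collect the digests.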


How does Python + C compare to C?

- The following was tested on the Beowulf-class cluster `Geronimo` at CIMEC with ten Intel P4 2.4GHz processors, each equipped with 1GB DDR 333MHz RAM, connected together on a 100Mbps ethernet switch. The mpi4py library was compiled with MPICH 1.2.6.

    from mpi4py import mpi
    import numarray as na

    # Send buffer of 2**20 double-precision values
    sbuff = na.array(shape=2**20, type=na.Float64)

    wt = mpi.Wtime()
    if mpi.even:
        # Even ranks send first, then receive from the next rank
        mpi.WORLD.Send(sbuff, mpi.rank + 1)
        rbuff = mpi.WORLD.Recv(mpi.rank + 1)
    else:
        # Odd ranks receive first, then send back to the previous rank
        rbuff = mpi.WORLD.Recv(mpi.rank - 1)
        mpi.WORLD.Send(sbuff, mpi.rank - 1)
    wt = mpi.Wtime() - wt

    # Collect every rank's elapsed time on rank 0
    tp = mpi.WORLD.Gather(wt, root=0)
    if mpi.zero:
        print tp

http://www.cimec.org.ar/ojs/index.php/cmm/article/viewFile/8/11


How does Python + C compare to C?

The rest of the graphs display time analysis from similar programs, with only the MPI instruction differing.

http://www.cimec.org.ar/ojs/index.php/cmm/article/viewFile/8/11



How does Python + C compare to C?

- For large data sets, Python performs very similarly to C

- Python achieves less bandwidth because mpi4py goes through a C MPI library to perform its networking calls, which adds overhead

- In general, though, Python is slower than C

http://www.cimec.org.ar/ojs/index.php/cmm/article/viewFile/8/11


Python’s Parallel Programming Libraries

- Message Passing Interface (MPI)

  - pyMPI

  - mpi4py - uses the C MPI library directly

  - Pypar

  - Scientific Python (MPIlib)

  - MYMPI

- Bulk Synchronous Parallel (BSP)

  - Scientific Python (BSPlib)


pyMPI

- Almost-full MPI instruction set

- Requires a modified Python interpreter, which allows for ‘interactive’ parallelism

- Not maintained since 2013

- The modified interpreter is the parallel application -> the interpreter has to be recompiled whenever you want to do a different task


Pydusa (formerly MYMPI)

- 33KB Python module -- no custom Python interpreter to maintain

- While the MPI Standard contains 120+ routines, MYMPI contains 35 “important” MPI routines

- Syntax is very similar to that of the Fortran and C MPI libraries

- Your Python code is the parallel application


pypar

- No modified interpreter needed!

- Still maintained on GitHub

- Few MPI interfaces are implemented

- Can’t handle topologies well and prefers simple data structures in parallel calculations


mpi4py

- Still being maintained on Bitbucket (updated 11/23/2015)

- Makes calls to external C MPI functions to avoid the GIL

- Attempts to borrow ideas from other popular modules and integrate them together


Scientific Python

- GREAT documentation -> Easy to use with their examples

- Supports both MPI and BSP

- Requires installation of both an MPI and a BSP library


Is Parallelism Fully Implemented?

- From our research so far, we have not found a publicly available Python package that implements the full MPI instruction set

- Not all popular languages have complete and extensive libraries for every task or use case!


Conclusion

- You CAN create parallel programs and applications with Python

- Doing so efficiently can require the compilation of a large custom Python interpreter

- Should they try to keep it in future versions or even maintain the current implementations?

- From our research, it seems the community has done just about all it could to bring parallelism to Python, but some sacrifices have to be made, mainly a restriction on which data types can be supported


Conclusion Cont.

- Maybe Python isn’t the best language to implement parallel algorithms in, but there are many other languages besides C and Fortran which have interesting approaches to solving parallel problems


Julia

- Really good documentation for parallel tasks with examples

- Able to send a task to n connected computers and asynchronously receive the results back, both upon request, and automatically when the task completes

- Has pre-defined topology configurations for networks like all-to-all and master-slave

- Allows for custom worker configurations to fit your specific topology


Go

- Fairly good documentation, along with an interactive interpreter on the site to learn the basics without installing anything

- The initial installation comes with all required libraries for parallel coding, so there are no extra libraries to search for or install

- Lightweight and easy to learn

- Can write several parallel programs using simple functions in Go


Questions?


Sources

http://www.researchgate.net/profile/Mario_Storti/publication/220380647_MPI_for_Python/links/00b495242ba3b30eb3000000.pdf

http://www.researchgate.net/profile/Leesa_Brieger/publication/221134069_MYMPI_-_MPI_programming_in_Python/links/0c960521cd051bc649000000.pdf

http://uni.getrik.com/wp-content/uploads/2010/04/pyMPI.pdf

http://www.researchgate.net/profile/Konrad_Hinsen/publication/220439974_High-Level_Parallel_Software_Development_with_Python_and_BSP/links/09e4150c048e4e7cd8000000.pdf

http://www.researchgate.net/profile/Ola_Skavhaug/publication/222545480_Using_B_SP_and_Python_to_simplify_parallel_programming/links/0fcfd507e6cac3eb63000000.pdf

http://downloads.hindawi.com/journals/sp/2005/619804.pdf


http://geco.mines.edu/workshop/aug2010/slides/fri/mympi.pdf

http://sourceforge.net/projects/pydusa/

http://docs.julialang.org/en/latest/manual/parallel-computing/

http://dirac.cnrs-orleans.fr/plone/software/scientificpython

http://dirac.cnrs-orleans.fr/ScientificPython/ScientificPythonManual/