python on hpcc

Dr. Yongjun Choi Python on HPCC ICER Workshop, Oct/12/2020


Page 1: Python on HPCC

Dr. Yongjun Choi

Python on HPCC

ICER Workshop, Oct/12/2020

Page 2: Python on HPCC

Scope of this workshop

• What we want to do:

• Explain what MSU HPCC is doing to support Python users.

• Provide guidance to help users improve Python performance on the HPCC.

• Point out tools that support developers of Python in HPC

• What we assume:

• You know and use Python, or

• You know and use HPC and are curious about using Python in your own HPC work.

Page 3: Python on HPCC

Getting Started with Python Resources

• https://www.python.org/about/gettingstarted/

• https://wiki.python.org/moin/BeginnersGuide/

• https://www.codecademy.com/learn/python/

• https://www.coursera.org/specializations/python/

• https://software-carpentry.org/lessons/

• https://pymotw.com/

• https://wiki.hpcc.msu.edu/display/ITH/Python

• https://www.youtube.com/watch?v=_uQrJ0TkZlc

• py4e.com

• ……

Page 4: Python on HPCC

Python is a very popular language

Most popular coding Languages of 2020: www.tiobe.com/tiobe-index

https://insights.stackoverflow.com/survey/2020#technology-most-loved-dreaded-and-wanted-languages-loved

Page 5: Python on HPCC

Why is Python so popular?

• Easy

• Clean, clear syntax

• Multi-paradigm, integrated

• No manual memory management (automatic garbage collection)

• Flexible, full-featured data structures

• Extensive standard libraries

• Open-source packages

Page 6: Python on HPCC

The Scientific Python Stacks

• Primary Uses:

• Script workflows for both data analysis and simulations

• Perform exploratory, interactive data analysis and visualization

Page 7: Python on HPCC

Python at MSU HPCC

• HPCC supports Python

• Maximizing Python performance can be challenging:

• Interpreted languages are difficult to optimize.

• Designed so that only one native thread can execute Python bytecode at a time (the global interpreter lock, GIL).

• Designed and implemented without considering the realities of HPC.

Page 8: Python on HPCC

Basic Guidelines for Python in HPC

• Identify and exploit parallelism at the core, node, and cluster levels

• Understand and apply NumPy array syntax and its broadcasting rules:

• https://numpy.org/doc/stable

• https://numpy.org/doc/stable/user/basics.broadcasting.html

• Measure your codes’ performance using profiling tools

• https://stackify.com/how-to-use-python-profilers-learn-the-basics/
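As a small illustration of the broadcasting rules linked above, the sketch below centers each column of an array without writing a loop. The array values and variable names are made up for illustration and are not taken from the workshop material.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) x 2 features (columns).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Shape (2,): one mean per column.
col_mean = data.mean(axis=0)

# Broadcasting stretches the (2,) vector across all 3 rows of the (3, 2) array,
# so no explicit loop over rows is needed.
centered = data - col_mean

# A (3, 1) column combined with a (2,) row broadcasts to a (3, 2) result.
outer_sum = np.arange(3).reshape(3, 1) + np.arange(2)

print(centered)
print(outer_sum)
```

Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.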

Page 9: Python on HPCC

Python at HPCC

• HPC Module

• module avail python

• module spider python

• module load python

• module load Python/2.7.9

• Or install your own Python (many options, but we suggest Anaconda)

• System python (/usr/bin/python): risky, not recommended.

Page 10: Python on HPCC

Using Python In HPC

• Limited packages: only a few widely used packages, such as NumPy and Matplotlib, are installed.

• Why? Python has a large number of packages, modules, and libraries that researchers may want to use, and it is difficult for HPCC to keep up with all of them and to avoid conflicts between different versions of packages and libraries.

• https://wiki.hpcc.msu.edu/display/ITH/Python

• Virtual environment (virtualenv)

• Anaconda

Page 11: Python on HPCC

Virtualenv

• Based on HPC Python: users control the packages; HPC controls Python.

• https://wiki.hpcc.msu.edu/x/xIEVAg

Page 12: Python on HPCC

Anaconda (recommended)

• Easy to install.

• Install in your home or research space

• Fully controlled by users

• https://www.anaconda.com

• Download https://www.anaconda.com/products/individual

• https://wiki.hpcc.msu.edu/display/ITH/Using+conda

• Both pip and conda can be used to install packages, but it is better to stick to one of them.

• pip cannot uninstall packages that were installed with conda, and conda cannot uninstall packages that were installed with pip.

Page 13: Python on HPCC

Jupyter notebook

• https://ondemand.hpcc.msu.edu/pun/sys/dashboard

Page 14: Python on HPCC

Can my Python code be faster?

• Vectorization

• Avoid explicit loops where possible; use NumPy instead.

• eg: ex01.py

• Parallelization (MPI, OpenMP, OpenACC, Thread)

• Workflows, e.g., launching many independent runs simultaneously with a job array (e.g., ex02.py and ex02.sb)

• Numba: has some restrictions, but it makes your code very fast!

• eg: https://murillogroupmsu.com/numba-versus-c/

• ex03.py
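To make the vectorization advice above concrete, here is a minimal sketch that computes the same quantity with a pure-Python loop and with a single NumPy call. It is not the workshop's ex01.py; the function names and input data are hypothetical.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: one interpreted iteration per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Vectorized version: the loop runs inside NumPy's compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


# Hypothetical input: 1001 evenly spaced points between 0 and 1.
x = np.linspace(0.0, 1.0, 1001)

loop_result = sum_of_squares_loop(x)
numpy_result = sum_of_squares_numpy(x)

print(loop_result, numpy_result)
```

Both functions return the same value up to floating-point rounding. For large arrays the vectorized version is typically much faster, although the actual speedup depends on array size and hardware.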

Page 15: Python on HPCC

Use Threaded Libraries

• Packages like NumPy and SciPy are already built with thread support via BLAS/LAPACK libraries such as MKL

• Don’t reimplement solvers in pure Python

• Many of your favorite threaded libraries and packages already have bindings:

• PyTrilinos

• Petsc4py

• Elemental

• SLEPc

• Do not reinvent the wheel. If a method is not new, it has probably already been implemented well.
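As a sketch of the advice above, the example below solves a dense linear system with np.linalg.solve, which delegates to LAPACK, instead of a hand-written elimination routine. The matrix is randomly generated for illustration. Whether the call actually runs multithreaded depends on the BLAS/LAPACK library your NumPy build links against, which is an assumption about your installation.

```python
import numpy as np

# Hypothetical test problem: a random, well-conditioned 200 x 200 system.
rng = np.random.default_rng(seed=0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)  # large diagonal keeps A well conditioned
b = rng.standard_normal(n)

# One library call replaces a hand-written Gaussian elimination.
x = np.linalg.solve(A, b)

# Check the solution by computing the residual norm ||Ax - b||.
residual = np.linalg.norm(A @ x - b)
print(f"residual norm: {residual:.2e}")
```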

Page 16: Python on HPCC

Using Compiled Modules

• Methods of using pre-compiled, threaded, GIL-free code for speed include:

• Cython

• F2py

• PyBind11

• Swig

• Boost

• Ctypes

• Writing bindings in C/C++ (https://docs.python.org/3/extending/extending.html/)
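Of the options listed above, ctypes needs no build step, so it makes for the shortest sketch. The example below calls the C library's strlen from Python. It assumes a POSIX system such as Linux, where ctypes.CDLL(None) exposes the symbols already loaded into the process. The choice of strlen is only illustrative.

```python
import ctypes

# Assumption: a POSIX system (e.g. Linux), where CDLL(None) gives access to
# symbols already loaded into the Python process, including the C library.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
c_strlen = libc.strlen
c_strlen.argtypes = [ctypes.c_char_p]
c_strlen.restype = ctypes.c_size_t

# Python bytes are passed to C as a NUL-terminated char pointer.
length = c_strlen(b"Python on HPCC")
print(length)
```

Functions loaded through ctypes.CDLL release the GIL for the duration of the C call, which is what lets compiled code run alongside other Python threads.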

Page 17: Python on HPCC

Profiling: cProfile, SnakeViz, VTune (Intel), etc.

• cProfile: https://docs.python.org/3/library/profile.html

• SnakeViz: https://jiffyclub.github.io/snakeviz/

• VTune: https://software.intel.com/content/www/us/en/develop/tools/vtune-profiler.html

• module load Vtune

• Check speed (time), calls (frequency), memory
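A minimal sketch of the cProfile workflow mentioned above, using only the standard library. The functions slow_sum and workload are hypothetical stand-ins for your own code.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately slow pure-Python loop, used only as something to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    """Hypothetical workload: call slow_sum repeatedly."""
    return [slow_sum(20_000) for _ in range(20)]


# Collect timing and call-count data only while the profiler is enabled.
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Format the five most expensive entries, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and times per function. A whole script can also be profiled with `python -m cProfile -o out.prof script.py`, and the resulting file opened with `snakeviz out.prof`; the file names here are placeholders. cProfile does not measure memory, so that needs a separate tool.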

Page 18: Python on HPCC

Parallelization: numba - parallel

• Automatic parallelization with numba

• Very easy to use: you need only a one-line decorator, @njit(parallel=True)

• More information:

• https://numba.pydata.org/numba-doc/latest/user/parallel.html
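The decorator described above can be sketched as follows. The function and data are hypothetical. The try/except fallback is an assumption added so that the sketch still runs, serially, where Numba is not installed; it is not part of Numba itself.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback (an assumption for portability): without Numba, the decorator
    # does nothing and the function runs as ordinary serial Python.
    def njit(*args, **kwargs):
        return lambda func: func

    prange = range


@njit(parallel=True)
def sum_of_squares(values):
    total = 0.0
    # prange marks the loop as parallel; Numba treats `total +=` as a reduction.
    for i in prange(values.shape[0]):
        total += values[i] * values[i]
    return total


x = np.arange(1000, dtype=np.float64)
result = sum_of_squares(x)
print(result)
```

The first call includes compilation time, so time a second call when measuring speed.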

Page 19: Python on HPCC

Parallelization: numba - cuda

• Only works with NVIDIA GPU cards

• Easy to use (at least much easier to use than other languages).

• https://github.com/keipertk/pygpu-workshop

Page 20: Python on HPCC

Parallelization: MPI

• MPI

• It is the HPC paradigm for inter-process communications

• MPI makes full use of HPC environments

• Well-supported tools exist

• Python-MPI bindings have been developed since 1996

Page 21: Python on HPCC

Parallelization: MPI - mpi4py

• mpi4py

• Pythonic wrapping of the system’s native MPI

• Provides almost all MPI-1, 2 and common MPI-3 features

• Very well maintained

• Distributed with major Python distributions

• Portable and scalable

• Requires only - NumPy, Cython (build only), and MPI library

• https://mpi4py.readthedocs.io/en/stable/#
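A minimal mpi4py sketch, assuming mpi4py and an MPI library are available in your environment. Each rank sums its own share of the integers 0..N-1, and allreduce combines the partial sums. The problem size N and the single-process fallback are illustrative assumptions, not part of mpi4py.

```python
try:
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Fallback (an assumption for portability): behave like a one-process run.
    MPI, comm, rank, size = None, None, 0, 1

N = 1000  # hypothetical problem size

# Each rank takes every size-th integer, starting at its own rank.
local_sum = sum(range(rank, N, size))

# Combine the partial sums so that every rank holds the total.
if comm is not None:
    total = comm.allreduce(local_sum, op=MPI.SUM)
else:
    total = local_sum

if rank == 0:
    print(f"sum over {size} rank(s): {total}")
```

Under MPI this would be launched with something like `mpirun -n 4 python script.py`, where the script name is a placeholder. The exact launcher (mpirun, mpiexec, srun) depends on the system. The lowercase allreduce works on generic Python objects; the uppercase Allreduce variant works on buffers such as NumPy arrays and is usually faster for large data.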

Page 22: Python on HPCC

More Resources

• Getting help:

• Office hrs: Mon/Thur 1-2PM

• https://icer.msu.edu/contact

• Documentation

• https://wiki.hpcc.msu.edu