Post on 21-Dec-2015
Tile Reduction: the first step towards tile aware parallelization in OpenMP
Ge Gan
Department of Electrical and Computer Engineering
Univ. of Delaware
Overview
• Background
• Motivation
• A new idea: Tile Reduction
• Experimental Results
• Conclusion
• Related Work
• Future Work
Tile/Tiling
• Natural representation of data objects that are heavily used in scientific algorithms
• Tiling improves data locality
• Tiling can increase parallelism and reduce synchronization in parallel programs
• It is an effective compiler optimization technique
• Essentially a program design paradigm
• Supported in many parallel programming languages: ZPL, CAF, HTA, etc.
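As a minimal illustration of the locality point above, the sketch below transposes a matrix one tile at a time, so both arrays are touched in small blocks rather than in long strided sweeps. The function name `transpose_tiled`, the tile edge `TILE`, and the row-major layout are assumptions made for this example and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 2 /* hypothetical tile edge; real codes tune this to the cache */

/* Transpose the n x n row-major matrix `in` into `out`,
 * visiting one TILE x TILE tile at a time. */
void transpose_tiled(size_t n, const double *in, double *out)
{
    assert(n % TILE == 0); /* simplification: n is a multiple of TILE */
    for (size_t ii = 0; ii < n; ii += TILE)          /* tile row    */
        for (size_t jj = 0; jj < n; jj += TILE)      /* tile column */
            for (size_t i = ii; i < ii + TILE; i++)  /* inside tile */
                for (size_t j = jj; j < jj + TILE; j++)
                    out[j * n + i] = in[i * n + j];
}
```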
OpenMP
• OpenMP is the de facto standard for shared-memory parallel programming
• Provides a simple and flexible interface for developing portable and scalable parallel applications
• Supports incremental parallelization
• Maintains sequential consistency
• "Tile oblivious": no directive or clause can be used to annotate a data tile and carry that information to the compiler
Parallelizing: the traditional way (2)
• Can only leverage the traditional scalar reduction in OpenMP
• Parallelism is trivial
• Data locality is not bad
• Not natural and intuitive
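The code for this slide is not part of the transcript. The sketch below is one plausible reading of a scalar-reduction approach, assuming the task is to sum `n` tiles of size 2x2 into a single 2x2 result: each tile element gets its own parallel loop with a standard OpenMP scalar `reduction` clause. The function name `sum_tiles_scalar` and the data layout are hypothetical.

```c
#include <assert.h>

/* Sum n 2x2 tiles a[0..n-1] into s using only scalar reductions:
 * one parallel loop (and one scalar accumulator) per tile element. */
void sum_tiles_scalar(int n, double (*a)[2][2], double s[2][2])
{
    assert(n >= 0);
    for (int k = 0; k < 2; k++) {
        for (int l = 0; l < 2; l++) {
            double acc = 0.0; /* scalar reduction variable */
            #pragma omp parallel for reduction(+ : acc)
            for (int i = 0; i < n; i++)
                acc += a[i][k][l];
            s[k][l] = acc;
        }
    }
}
```

The tile has to be taken apart element by element, which is what makes this formulation feel unnatural for tile-shaped data.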
The Expected Parallelization
• View the innermost two loops as a macro operation performed on the 2x2 data tiles
• Aggregate the data tiles in parallel
• More parallelism
• Better data locality
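The slides' own tile reduction syntax is not reproduced in this transcript, so the sketch below uses a stand-in: the array reduction available in OpenMP 4.5 and later, where a whole C array may appear in a `reduction` clause. It treats the innermost two loops as one operation on a 2x2 tile and aggregates the tiles in a single parallel loop. The function name `sum_tiles_array` and the data layout are assumptions.

```c
#include <assert.h>

/* Sum n 2x2 tiles into s, reducing the whole tile at once.
 * Stand-in for the proposed tile reduction: OpenMP 4.5 array reduction. */
void sum_tiles_array(int n, double (*a)[2][2], double s[2][2])
{
    assert(n >= 0);
    double acc[2][2] = {{0.0, 0.0}, {0.0, 0.0}}; /* the tile under reduction */

    #pragma omp parallel for reduction(+ : acc)
    for (int i = 0; i < n; i++)
        /* innermost two loops: one "macro operation" on a 2x2 tile */
        for (int k = 0; k < 2; k++)
            for (int l = 0; l < 2; l++)
                acc[k][l] += a[i][k][l];

    for (int k = 0; k < 2; k++)
        for (int l = 0; l < 2; l++)
            s[k][l] = acc[k][l];
}
```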
Terms
• Reduction tile: the data tile under reduction
• Tile descriptor: the "multi-dimensional array" in the list construct
• Reduction kernel loops: the loops involved in performing "one" recursive calculation
• Tile name
• Dimension descriptor: the tuples following the tile name
A Use Case
Tiled Matrix Multiplication
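The code shown on this slide is not in the transcript. As a rough sequential sketch of what a tiled matrix multiplication usually looks like, assuming square row-major matrices and a hypothetical tile edge `TILE`: the three outer loops walk over tiles, and the three inner loops accumulate one tile product into a tile of C, which is the natural candidate for a reduction tile.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 2 /* hypothetical tile edge */

/* c += a * b for n x n row-major matrices, computed tile by tile.
 * The caller is expected to zero c beforehand. */
void matmul_tiled(size_t n, const double *a, const double *b, double *c)
{
    assert(n % TILE == 0); /* simplification: n is a multiple of TILE */
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE)
            for (size_t kk = 0; kk < n; kk += TILE)
                /* kernel: C(ii,jj) tile += A(ii,kk) tile * B(kk,jj) tile */
                for (size_t i = ii; i < ii + TILE; i++)
                    for (size_t j = jj; j < jj + TILE; j++)
                        for (size_t k = kk; k < kk + TILE; k++)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
}
```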
Tile Reduction Applied on the Tiled Matrix Multiplication Code
Code Generation (1)
• Distribute the iterations of the parallelized loop among the threads
• Allocate memory for the private copy of the tile used in the local recursive calculation
• Perform the local recursive calculation which is specified by the reduction kernel loops
• Update the global copy of the reduction tile
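The four steps above can be sketched by hand in existing OpenMP. This is only an illustration of the lowering pattern, not the code the authors' compiler emits; the function name `sum_tiles_lowered`, the 2x2 tile size, and the use of a `critical` section for the final update are assumptions.

```c
#include <assert.h>

/* Accumulate n 2x2 tiles into the global tile s (s is updated in place,
 * so the caller initialises it, e.g. to zero). */
void sum_tiles_lowered(int n, double (*a)[2][2], double s[2][2])
{
    assert(n >= 0);
    #pragma omp parallel
    {
        /* Step 2: private copy of the tile, set to the identity of '+' */
        double priv[2][2] = {{0.0, 0.0}, {0.0, 0.0}};

        /* Step 1: distribute the iterations of the parallelized loop */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            /* Step 3: local calculation given by the reduction kernel loops */
            for (int k = 0; k < 2; k++)
                for (int l = 0; l < 2; l++)
                    priv[k][l] += a[i][k][l];

        /* Step 4: fold the private tile into the global reduction tile */
        #pragma omp critical
        {
            for (int k = 0; k < 2; k++)
                for (int l = 0; l < 2; l++)
                    s[k][l] += priv[k][l];
        }
    }
}
```

Each thread touches the shared tile only once, at the end, which is the same structure a compiler typically uses for scalar reductions.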
Conclusions
• As one of the building blocks of the tile aware parallelization theory, tile reduction brings more opportunities to parallelize dense matrix applications
• For some benchmarks, tile reduction is a more natural and intuitive way to reason about the best parallelization decision
• For some benchmarks, tile reduction can both improve data locality and expose more parallelism
• Friendly to programmers
• Code generation is as simple as for the scalar reduction in current OpenMP
• Runtime overhead is trivial
Similar Works
• Parallel reduction is supported in:
• C**: Viswanathan, G., Larus, J.R.: User-defined reductions for efficient communication in data-parallel languages. Technical Report 1293, University of Wisconsin-Madison (Jan 1996)
• SAC: Scholz, S.B.: On defining application-specific high-level array operations by means of shape invariant programming facilities. In: APL ’98: Proceedings of the APL98 conference on Array processing language, New York, NY, USA, ACM (1998) 32–38
• ZPL: Deitz, S.J., Chamberlain, B.L., Snyder, L.: High-level language support for user-defined reductions. J. Supercomput. 23(1) (2002) 23–37
• UPC Consortium: UPC Collective Operations Specifications V1.0 A publication of the UPC Consortium (2003)
• Message Passing Interface Forum: MPI: A message-passing interface standard (version 1.0). Technical report (May 1994) URL http://www.mcs.anl.gov/mpi/mpi-report.ps.
• Kambadur, P., Gregor, D., Lumsdaine, A.: OpenMP extensions for generic libraries. In: Lecture Notes in Computer Science: OpenMP in a New Era of Parallelism, IWOMP’08, International Workshop on OpenMP. Volume 5004/2008., Springer Berlin / Heidelberg (2008) 123–133