The RePhrase EU-Project
Ongoing research at UC3M:
Composable parallel patterns for stream parallelism
Manuel F. Dolz, David del Rio, Javier Garcia-Blas, J. Daniel Garcia
University Carlos III of Madrid
NESUS Cost IC1305 – Fifth Working group Meeting
Ljubljana, July 8th, 2016
ARCOS
RePhrase Project: Refactoring Parallel Heterogeneous Software
– a Software Engineering Approach
(ICT-644235), 2015-2018, €3.6M budget
8 Partners, 6 European countries UK, Spain, Italy, Austria, Hungary, Israel
http://www.rephrase-ict.eu
All future programming will be parallel
1. No future system will be single-core: parallel programming will be essential
2. It’s not just about performance: it’s also about energy usage
3. If we don’t solve the multicore challenge, then no other advances will matter!
user interfaces
cyber-physical systems
robotics
games
...
Thinking in Parallel
Fundamentally, programmers must learn to think parallel
this requires new high-level programming constructs
you cannot program effectively while worrying about deadlocks etc.
they must be eliminated from the design!
you cannot program effectively while handling communication etc.
this needs to be packaged/abstracted!
you cannot program effectively without performance information
this needs to be included!
We use two key technologies:
Refactoring (changing the source code structure)
Parallel Patterns (high-level functions of parallel algorithms)
Some Common Patterns
1. High-level abstract patterns of common parallel algorithms
A Pattern-Based Approach
1. Start bottom-up: identify (strongly hygienic) COMPONENTS
2. Think about the PATTERN of parallelism, e.g. map(reduce), task farm, parallel search, parallel completion, ...
3. DISCOVER parallelization opportunities (patterns): turn pieces of code into concrete patterns (skeletons)
Take performance, energy etc. into account (multi-objective optimisation)
also using refactoring
4. RESTRUCTURE if necessary! (also using refactoring)
For both legacy and new programs
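To illustrate steps 1 to 3, here is a minimal C++ sketch, not code from the RePhrase tools. A side-effect-free component is plugged into a generic map(reduce) skeleton. The names `square` and `map_reduce` are hypothetical, and launching one asynchronous task per element is an assumption made for brevity.

```cpp
#include <future>
#include <vector>

// A "hygienic" component: no shared state, the result depends only on the argument.
inline int square(int x) { return x * x; }

// Hypothetical map(reduce) skeleton: maps `component` over the inputs in
// asynchronous tasks, then folds the results sequentially with `combine`.
template <typename In, typename Out, typename Component, typename Combine>
Out map_reduce(const std::vector<In>& inputs, Out init,
               Component component, Combine combine) {
  std::vector<std::future<Out>> mapped;
  mapped.reserve(inputs.size());
  for (const In& x : inputs) {
    // Assumption: one task per element; a real skeleton would batch the work.
    mapped.push_back(std::async(
        std::launch::async,
        [&component, &x]() -> Out { return component(x); }));
  }
  Out result = init;
  for (auto& m : mapped) result = combine(result, m.get());
  return result;
}
```

Because the component touches no shared state, the skeleton could be swapped for a task farm or a sequential loop without changing the component itself.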
The RePhrase Approach
[Figure: RePhrase workflow diagram. Recoverable labels: Initial Application; Existing/Legacy Application; Specification + Requirements; Specification + Pattern Structure; Patterned Application; Pattern Description; Library; Pattern DSL; Pattern Implementation; Pattern Discovery; DSL; Refactoring; Design; Requirements Capture; Implementation; Verification; Program Shaping]
General Technique
[Figure: general technique. Recoverable labels: Refactorer, Costing-Profiling and Pattern Library; languages: C/C++, Erlang, Haskell, Java, ...; targets: AMD Opteron, IBM Power, Intel Core, ARM Core, ATI GPU, Intel GPU, Nvidia GPU, Nvidia Tesla, Intel Xeon Phi]
Refactoring
1. Refactoring changes the structure of the source code using well-defined rules, semi-automatically, under programmer guidance
Fully automatic?
Review
A Parallel C++ Refactorer
1. Integrated into Eclipse
2. Supports full C++(11) standard
3. Uses strongly hygienic components: functional encapsulation (closures)
4. Possibility to use different parallel patterns
Speedup Results (demonstrators)
Speedup close to or better than manual optimization
Refactoring pays off compared with manual optimizations
Contributions from UC3M
Parallel Pattern interface
C++ threads OpenMP Intel TBB …
• GrPPI: A Generic and Reusable Parallel Pattern Interface
• Data and stream parallel patterns
• C++ programming language
• Generic programming (Template programming)
• Metaprogramming (Lambda expressions)
Stream parallelism patterns
• A first approach of the interface:
• Support for OpenMP, C++ Threads and Intel TBB
• Full support for stream parallelism patterns
• Pipeline, Farm, Filter and Stream-Reduce
The Pipeline parallel pattern
• Interface:
• Example: Finding the maximum values in arrays
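The interface shown on the slide is not captured in this transcript. As a stand-in, the sketch below shows what a pipeline for the "maximum values in arrays" example could look like using only standard C++17 threads. It is not the GrPPI interface: `channel` and `max_of_each` are hypothetical names, and the blocking queue is an assumption chosen for brevity.

```cpp
#include <algorithm>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: minimal blocking queue; an empty optional marks end of stream.
template <typename T>
class channel {
public:
  void push(std::optional<T> item) {
    {
      std::lock_guard<std::mutex> lock{mutex_};
      items_.push(std::move(item));
    }
    ready_.notify_one();
  }
  std::optional<T> pop() {
    std::unique_lock<std::mutex> lock{mutex_};
    ready_.wait(lock, [this] { return !items_.empty(); });
    std::optional<T> item = std::move(items_.front());
    items_.pop();
    return item;
  }
private:
  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<std::optional<T>> items_;
};

// Three stages, one thread each: generate arrays -> compute maximum -> collect.
std::vector<int> max_of_each(const std::vector<std::vector<int>>& arrays) {
  channel<std::vector<int>> to_max;
  channel<int> to_sink;
  std::vector<int> results;

  std::thread generator{[&] {
    for (const auto& a : arrays) to_max.push(a);
    to_max.push(std::nullopt);  // signal end of stream
  }};
  std::thread maximum{[&] {
    while (auto a = to_max.pop()) {
      // Empty arrays have no maximum and are skipped.
      if (!a->empty()) to_sink.push(*std::max_element(a->begin(), a->end()));
    }
    to_sink.push(std::nullopt);
  }};
  std::thread sink{[&] {
    while (auto m = to_sink.pop()) results.push_back(*m);
  }};

  generator.join();
  maximum.join();
  sink.join();
  return results;
}
```

With one thread per stage, items keep their input order; the stages overlap in time once more than one array is in flight.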
The Farm parallel pattern
• Interface:
• Example: Summing in parallel the values stored in files
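The slide's interface is not captured here. The following is a minimal sketch of the farm idea with standard C++ threads, not the GrPPI interface. The name `farm_sum` is hypothetical, and each inner vector stands in for the contents of one file, which is an assumption that keeps the example self-contained.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Farm sketch: `workers` replicas of the same task pull items from a shared stream.
// Assumption: each inner vector stands in for the values read from one file.
std::vector<long> farm_sum(const std::vector<std::vector<int>>& files,
                           unsigned workers) {
  std::vector<long> sums(files.size(), 0);
  std::atomic<std::size_t> next{0};  // index of the next unprocessed item

  auto worker = [&] {
    // Each worker repeatedly claims one item until the stream is exhausted.
    for (std::size_t i = next.fetch_add(1); i < files.size();
         i = next.fetch_add(1)) {
      sums[i] = std::accumulate(files[i].begin(), files[i].end(), 0L);
    }
  };

  std::vector<std::thread> pool;
  const unsigned replicas = std::max(1u, workers);
  for (unsigned w = 0; w < replicas; ++w) pool.emplace_back(worker);
  for (auto& t : pool) t.join();
  return sums;
}
```

Each worker writes only the slots it claimed, so no lock is needed around `sums`.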
The Filter parallel pattern
• Interface:
• Example: Filtering vectors with fewer than 10 elements
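The interface itself is missing from the transcript. As a hedged illustration of the filter idea (not the GrPPI interface), the sketch below evaluates a predicate on stream items in parallel and then emits the surviving items in their original order. `stream_filter` is a hypothetical name. The sketch takes a keep-predicate, so the caller decides whether vectors with fewer than 10 elements are kept or dropped.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Filter sketch: the predicate runs in parallel, the output keeps input order.
template <typename T, typename Pred>
std::vector<T> stream_filter(const std::vector<T>& items, Pred keep,
                             unsigned workers) {
  workers = std::max(1u, workers);
  // char rather than bool: every flag must be an independent memory location.
  std::vector<char> kept(items.size(), 0);

  std::vector<std::thread> pool;
  for (unsigned w = 0; w < workers; ++w) {
    pool.emplace_back([&, w] {
      // Worker w handles items w, w + workers, w + 2 * workers, ...
      for (std::size_t i = w; i < items.size(); i += workers) {
        kept[i] = keep(items[i]) ? 1 : 0;
      }
    });
  }
  for (auto& t : pool) t.join();

  std::vector<T> out;
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (kept[i]) out.push_back(items[i]);
  }
  return out;
}
```

For the slide's example, a predicate such as `v.size() >= 10` would drop the short vectors, under the assumption that "filtering" means discarding them.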
The Stream-Reduce parallel pattern
• Interface:
• Example: Reducing a vector in parallel
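The slide's interface is not reproduced in the transcript. For the stated example, reducing a vector in parallel, a minimal sketch with standard C++ futures might look as follows. This is a plain chunked reduction, not the GrPPI Stream-Reduce interface; `parallel_sum` and the `chunks` parameter are hypothetical.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Chunked reduction sketch: each chunk is summed in its own task,
// then the partial sums are combined sequentially.
long parallel_sum(const std::vector<int>& v, std::size_t chunks) {
  if (chunks == 0) chunks = 1;
  const std::size_t n = v.size();

  std::vector<std::future<long>> partials;
  partials.reserve(chunks);
  for (std::size_t c = 0; c < chunks; ++c) {
    // Chunk c covers the half-open index range [first, last).
    const auto first = static_cast<std::ptrdiff_t>(n * c / chunks);
    const auto last = static_cast<std::ptrdiff_t>(n * (c + 1) / chunks);
    partials.push_back(std::async(std::launch::async, [&v, first, last] {
      return std::accumulate(v.begin() + first, v.begin() + last, 0L);
    }));
  }

  long total = 0;
  for (auto& p : partials) total += p.get();
  return total;
}
```

The combination step relies on addition being associative; a non-associative operator would need a different scheme.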
Experimental evaluation
• Evaluation of the usability and the performance of the parallel patterns:
• Target platform: 2x Intel Xeon Ivy Bridge E5-2695 (24 cores)
• Parallel technologies: C++11 threads, OpenMP and Intel TBB
• Benchmark: Stream video processing application
• Pipeline composed of 3 stages:
[Figure: video processing pipeline. Recoverable labels: Input video file; Read video frames; Gaussian blur filter; Sobel operator; Write video frames; Output video file; Thread #0 to Thread #3; SPSC lock-free queues]
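The figure mentions SPSC (single-producer, single-consumer) lock-free queues between stages. The implementation used in the evaluation is not shown, so the following is only a minimal sketch of the general technique: a bounded ring buffer with two atomic indices. The name `spsc_queue` and its member functions are hypothetical, and C++17 is assumed for `std::optional`.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Bounded SPSC ring buffer. Exactly one thread may call try_push and
// exactly one (other) thread may call try_pop.
// Assumption: T is default-constructible and copyable.
template <typename T, std::size_t Capacity>
class spsc_queue {
public:
  // Producer side: returns false when the queue is full.
  bool try_push(const T& value) {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    const std::size_t next = (tail + 1) % slots_.size();
    if (next == head_.load(std::memory_order_acquire)) return false;
    slots_[tail] = value;
    // Release: publishes the slot contents to the consumer.
    tail_.store(next, std::memory_order_release);
    return true;
  }

  // Consumer side: returns an empty optional when the queue is empty.
  std::optional<T> try_pop() {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;
    T value = slots_[head];
    // Release: hands the slot back to the producer.
    head_.store((head + 1) % slots_.size(), std::memory_order_release);
    return value;
  }

private:
  // One slot is left unused so that "full" and "empty" can be told apart.
  std::array<T, Capacity + 1> slots_{};
  std::atomic<std::size_t> head_{0};
  std::atomic<std::size_t> tail_{0};
};
```

Since each index is written by only one thread, no compare-and-swap loop is needed. That is what makes the single-producer, single-consumer case cheaper than a general concurrent queue.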
Experimental evaluation (cont’d)
• Evaluation using different compositions of Pipeline and Farm(s)
• Percentage increase in lines of code w.r.t. the sequential version
Experimental evaluation (cont’d)
• FPS with and without GrPPI with different frameworks and compositions
Experimental evaluation (cont’d)
• Performance evaluation of the Filter and Stream-Reduce parallel patterns
• We use a synthetic version of the video processing application for filtering frames
• Filter: discard frames whose percentage of black pixels is above a threshold
• Reduce: sum the amount of null pixels
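To make the two synthetic kernels concrete, here is a sequential sketch of what the filter and reduce steps compute. It is not the benchmark code. The `frame` representation (one grayscale byte per pixel, with 0 meaning black or null) and the function names are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: a frame is a flat buffer with one grayscale byte per pixel.
using frame = std::vector<std::uint8_t>;

// Percentage (0 to 100) of pixels whose value is 0.
double black_percentage(const frame& f) {
  if (f.empty()) return 0.0;
  const auto black = std::count(f.begin(), f.end(), std::uint8_t{0});
  return 100.0 * static_cast<double>(black) / static_cast<double>(f.size());
}

// Filter kernel: discard frames whose black percentage is above the threshold.
std::vector<frame> filter_frames(const std::vector<frame>& frames,
                                 double threshold) {
  std::vector<frame> kept;
  for (const auto& f : frames) {
    if (black_percentage(f) <= threshold) kept.push_back(f);
  }
  return kept;
}

// Reduce kernel: total number of null (zero-valued) pixels over all frames.
std::size_t count_null_pixels(const std::vector<frame>& frames) {
  std::size_t total = 0;
  for (const auto& f : frames) {
    total += static_cast<std::size_t>(
        std::count(f.begin(), f.end(), std::uint8_t{0}));
  }
  return total;
}
```

Both kernels are independent per frame, which is what lets a Filter or Stream-Reduce pattern process the frames in parallel.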
Other research at UC3M
• Implementation of parallel patterns using existing parallel frameworks
• GrPPI: A Generic and Reusable Parallel Pattern Interface
• Discovering Parallel Patterns in source code
• PPAT: Parallel Pattern Analyzer Tool
• Detection of catastrophic failures: deadlocks, data races, etc.
• Use of semantics to improve the detection of lock-free structures
• ThreadSanitizer as the data race detector
Conclusions
• Most programming models are too low-level
concurrency based
need to expose mass parallelism
• Patterns hide away the complexity of parallel programming
GrPPI is a usable, simple, generic and high-level parallel pattern interface
The overheads of GrPPI are negligible compared with using the parallel programming frameworks directly
Parallelizing code with GrPPI increases the number of lines of code by no more than 4.4%
• Future work
Extend GrPPI with more stream and data parallel patterns: Map, Reduce or MapReduce
Support for other parallel programming frameworks: FastFlow
Accelerators with CUDA Thrust and OpenCL SYCL?
THANK YOU!
http://rephrase-ict.eu
@rephrase_eu
http://paraphrase-ict.eu