Page 1

The RePhrase EU-Project

Ongoing research at UC3M:

Composable parallel patterns for stream parallelism

Manuel F. Dolz, David del Rio, Javier Garcia-Blas, J. Daniel Garcia

University Carlos III of Madrid

NESUS Cost IC1305 – Fifth Working group Meeting

Ljubljana, July 8th, 2016

ARCOS

Page 2

RePhrase Project: Refactoring Parallel Heterogeneous Software

– a Software Engineering Approach

(ICT-644235), 2015-2018, €3.6M budget

8 Partners, 6 European countries: UK, Spain, Italy, Austria, Hungary, Israel

http://www.rephrase-ict.eu

Page 3

All future programming will be parallel

1. No future system will be single-core: parallel programming will be essential

2. It’s not just about performance: it’s also about energy usage

3. If we don’t solve the multicore challenge, then no other advances will matter!

user interfaces

cyber-physical systems

robotics

games

...

Page 4

Thinking in Parallel

Fundamentally, programmers must learn to think parallel

this requires new high-level programming constructs

you cannot program effectively while worrying about deadlocks etc.

they must be eliminated from the design!

you cannot program effectively while dealing with communication etc.

this needs to be packaged/abstracted!

you cannot program effectively without performance information

this needs to be included!

We use two key technologies:

Refactoring (changing the source code structure)

Parallel Patterns (high-level functions of parallel algorithms)


Page 5

Some Common Patterns

1. High-level abstract patterns of common parallel algorithms


Page 6

A Pattern-Based Approach

1. Start bottom-up: identify (strongly hygienic) COMPONENTS

2. Think about the PATTERN of parallelism, e.g. map(reduce), task farm, parallel search, parallel completion, ...

3. DISCOVER parallelization opportunities (patterns): turn pieces of code into concrete patterns (skeletons)

Take performance, energy etc. into account (multi-objective optimisation)

also using refactoring

4. RESTRUCTURE if necessary! (also using refactoring)

both legacy and new programs

Page 7

The RePhrase Approach

[Figure: RePhrase methodology workflow. Labels recovered from the diagram: Initial Application, Existing/Legacy Application, Specification + Requirements, Specification + Pattern Structure, Patterned Application, Pattern Description, Library, Pattern DSL, Pattern Implementation, Pattern Discovery, DSL, Refactoring, Design, Requirements Capture, Implementation, Verification, Program Shaping]

Page 8

The RePhrase Approach

Page 9

General Technique

[Figure: tool-chain overview. Labels recovered from the diagram: Refactorer (C/C++, Erlang, Haskell, Java, ...); Costing-Profiling (C/C++, Erlang, Haskell, Java, ...); Pattern Library; hardware targets: AMD Opteron, IBM Power, Intel Core, ARM Core, ATI GPU, Intel GPU, Nvidia GPU, Nvidia Tesla, Intel Xeon Phi]

Page 10

Refactoring

1. Refactoring changes the structure of the source code using well-defined rules, semi-automatically under programmer guidance

Fully-automatic?

Review

Page 11

A Parallel C++ Refactorer

1. Integrated into Eclipse

2. Supports full C++(11) standard

3. Uses strongly hygienic components: functional encapsulation (closures)

4. Possibility to use different parallel patterns
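A minimal sketch of what a "strongly hygienic" component could look like in C++11, assuming the term means a closure whose only inputs are its arguments and captured values and whose only output is its return value. The names `make_scaler` and `apply_all` are hypothetical and do not come from the refactorer.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical component factory (not part of any RePhrase tool).
// The closure captures `factor` by value and touches no shared state,
// so calls on different elements cannot interfere with each other.
std::function<int(int)> make_scaler(int factor) {
    return [factor](int x) { return x * factor; };
}

// Applies a component to every element. Because the component is free of
// side effects, the iterations are independent and could be distributed
// over workers by a parallel pattern without changing the result.
std::vector<int> apply_all(const std::vector<int>& in,
                           const std::function<int(int)>& component) {
    std::vector<int> out;
    out.reserve(in.size());
    for (int x : in) out.push_back(component(x));
    return out;
}
```

Usage: `apply_all(values, make_scaler(2))` returns a new vector and leaves `values` untouched, which is the property a pattern-based refactoring relies on.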

Page 12

Speedup Results (demonstrators)

Speedup close to or better than manual optimization

Refactoring pays off compared with manual optimizations

Page 13

Contributions from UC3M

• GrPPI: A Generic and Reusable Parallel Pattern Interface

• Data and stream parallel patterns

• C++ programming language

• Generic programming (Template programming)

• Metaprogramming (Lambda expressions)

[Figure: the parallel pattern interface layered over C++ threads, OpenMP, Intel TBB, …]

Page 14

Stream parallelism patterns

• A first approach of the interface:

• Support for OpenMP, C++ Threads and Intel TBB

• Full support for stream parallelism patterns

• Pipeline, Farm, Filter and Stream-Reduce
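One way a single pattern call can target several back ends is tag dispatch on an execution-policy argument. The sketch below illustrates the idea with only a sequential and a C++ threads back end; the names `sequential_policy`, `thread_policy` and `for_each_item` are hypothetical and are not claimed to match the actual GrPPI interface.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical policy tags: the first argument selects the back end.
struct sequential_policy {};
struct thread_policy { unsigned workers; };

// Sequential back end: a plain loop.
template <typename T, typename F>
void for_each_item(sequential_policy, std::vector<T>& items, F f) {
    for (auto& x : items) f(x);
}

// C++ threads back end: worker w handles items w, w+n, w+2n, ...
// so no two workers ever touch the same element.
template <typename T, typename F>
void for_each_item(thread_policy p, std::vector<T>& items, F f) {
    const unsigned n = p.workers == 0 ? 1 : p.workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < n; ++w) {
        pool.emplace_back([&items, f, w, n] {
            for (std::size_t i = w; i < items.size(); i += n) f(items[i]);
        });
    }
    for (auto& t : pool) t.join();
}
```

With this shape, switching back ends means changing only the first argument of the call; the user-supplied lambda stays the same.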

Page 15

The Pipeline parallel pattern

• Interface:

• Example: Finding the maximum values in arrays

[Interface and example listings were images and are not preserved in this transcript]
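As an illustration of the pipeline idea applied to the "maximum values in arrays" example, here is a hand-written three-stage pipeline using only C++11 threads. It is a sketch, not the GrPPI interface: the `channel` class and `max_of_each` function are hypothetical helpers, and empty arrays are simply skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking FIFO connecting two stages; close() marks end of stream.
template <typename T>
class channel {
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Returns false once the channel is closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// Stage 1 emits arrays, stage 2 computes each maximum, stage 3 collects.
// One thread per stage plus FIFO channels keeps the output in input order.
std::vector<int> max_of_each(const std::vector<std::vector<int>>& arrays) {
    channel<std::vector<int>> to_max;
    channel<int> to_sink;
    std::vector<int> result;

    std::thread source([&] {
        for (const auto& a : arrays) to_max.push(a);
        to_max.close();
    });
    std::thread middle([&] {
        std::vector<int> a;
        while (to_max.pop(a))
            if (!a.empty()) to_sink.push(*std::max_element(a.begin(), a.end()));
        to_sink.close();
    });
    std::thread sink([&] {
        int m = 0;
        while (to_sink.pop(m)) result.push_back(m);
    });
    source.join();
    middle.join();
    sink.join();
    return result;
}
```

The point of a pattern interface is that the channel and thread management above would be written once inside the library, leaving the user to supply only the three stage functions.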

Page 16

The Farm parallel pattern

• Interface:

• Example: Summing in parallel the values stored in files

[Interface and example listings were images and are not preserved in this transcript]
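A farm replicates one task over several workers that pull independent items from a shared source. The sketch below sums the integers of each "file" in that style. To stay self-contained it assumes each file is represented by its text content held in a string; `sum_values` and `farm_sum` are hypothetical names, not GrPPI functions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Sums the whitespace-separated integers of one "file"
// (assumption: the file is given as its text content).
long sum_values(const std::string& content) {
    std::istringstream in(content);
    long total = 0;
    long v = 0;
    while (in >> v) total += v;
    return total;
}

// Farm: `workers` replicas of the same task claim item indices from a
// shared atomic counter, so each item is processed exactly once.
std::vector<long> farm_sum(const std::vector<std::string>& files,
                           unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<long> sums(files.size(), 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (std::size_t i = next.fetch_add(1); i < files.size();
                 i = next.fetch_add(1)) {
                sums[i] = sum_values(files[i]);  // each index has one writer
            }
        });
    }
    for (auto& t : pool) t.join();
    return sums;
}
```

Claiming indices dynamically, rather than splitting the input into fixed blocks, balances the load when files differ greatly in size.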

Page 17

The Filter parallel pattern

• Interface:

• Example: Filtering vectors with fewer than 10 elements

[Interface and example listings were images and are not preserved in this transcript]
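A sketch of an order-preserving parallel filter, under the assumption that the example discards vectors with fewer than 10 elements (the slide text does not say whether they are kept or dropped). `filter_stream` is a hypothetical name; the predicate is evaluated in parallel and the surviving items are then collected in their original order.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Evaluates `keep` on every item in parallel, then compacts sequentially
// so that the output preserves the input order.
template <typename T, typename Pred>
std::vector<T> filter_stream(const std::vector<T>& items, Pred keep,
                             unsigned workers) {
    if (workers == 0) workers = 1;
    // char rather than bool: std::vector<bool> packs bits, which makes
    // concurrent writes to neighbouring elements unsafe.
    std::vector<char> flags(items.size(), 0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            for (std::size_t i = w; i < items.size(); i += workers)
                flags[i] = keep(items[i]) ? 1 : 0;
        });
    }
    for (auto& t : pool) t.join();

    std::vector<T> out;
    for (std::size_t i = 0; i < items.size(); ++i)
        if (flags[i]) out.push_back(items[i]);
    return out;
}
```

For the slide's example the predicate would be something like `[](const std::vector<int>& v) { return v.size() >= 10; }`.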

Page 18

The Stream-Reduce parallel pattern

• Interface:

• Example: Reducing a vector in parallel

[Interface and example listings were images and are not preserved in this transcript]
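A sketch of reducing a vector in parallel: each worker reduces one contiguous block into a partial result, and the partial results are combined at the end. It assumes the operation is associative and that `identity` is its neutral element; `parallel_reduce` is a hypothetical name, not the GrPPI signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Block-wise parallel reduction. Requires `op` to be associative and
// `identity` to be neutral for it (e.g. 0 for +, 1 for *).
template <typename T, typename Op>
T parallel_reduce(const std::vector<T>& v, T identity, Op op,
                  unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<T> partial(workers, identity);
    const std::size_t chunk = (v.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            // Clamp the block to the vector bounds; late blocks may be empty.
            const std::size_t lo =
                std::min(v.size(), static_cast<std::size_t>(w) * chunk);
            const std::size_t hi = std::min(v.size(), lo + chunk);
            partial[w] =
                std::accumulate(v.begin() + lo, v.begin() + hi, identity, op);
        });
    }
    for (auto& t : pool) t.join();
    // Final sequential combine of one value per worker.
    return std::accumulate(partial.begin(), partial.end(), identity, op);
}
```

In a true streaming setting the input arrives incrementally and the reduction is applied per window of items; the block-wise version above only shows the core combine step.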

Page 19

Experimental evaluation

• Evaluation of the usability and the performance of the parallel patterns:

• Target platform: 2x Intel Xeon Ivy Bridge E5-2695 (24 cores)

• Parallel technologies: C++11 threads, OpenMP and Intel TBB

• Benchmark: Stream video processing application

• Pipeline composed of 3 stages:

[Figure: labels recovered from the diagram: Input video file, Read video frames, Gaussian blur filter, Sobel operator, Write video frames, Output video file; Thread #0 to Thread #3; stages connected by SPSC lock-free queues]
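The diagram mentions SPSC (single-producer, single-consumer) lock-free queues between pipeline stages. A minimal sketch of such a queue is a fixed-size ring buffer with two atomic indices, shown below. This is a textbook construction, not the implementation used in the evaluation; `spsc_queue` and `relay` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Bounded SPSC ring buffer. Exactly one thread may call try_push and
// exactly one (other) thread may call try_pop.
template <typename T>
class spsc_queue {
public:
    // One slot is kept free to distinguish "full" from "empty".
    explicit spsc_queue(std::size_t capacity)
        : buf_(capacity + 1), head_(0), tail_(0) {}

    bool try_push(const T& v) {  // producer side
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (t + 1) % buf_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        buf_[t] = v;
        tail_.store(next, std::memory_order_release);  // publish the item
        return true;
    }

    bool try_pop(T& out) {  // consumer side
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = buf_[h];
        head_.store((h + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_;  // next slot to read
    std::atomic<std::size_t> tail_;  // next slot to write
};

// Demo: a producer thread sends 0..n-1 to a consumer thread.
std::vector<int> relay(int n) {
    spsc_queue<int> q(8);
    std::vector<int> received;
    std::thread producer([&] {
        for (int i = 0; i < n; ++i)
            while (!q.try_push(i)) std::this_thread::yield();
    });
    std::thread consumer([&] {
        int v = 0;
        while (static_cast<int>(received.size()) < n) {
            if (q.try_pop(v)) received.push_back(v);
            else std::this_thread::yield();
        }
    });
    producer.join();
    consumer.join();
    return received;
}
```

Because each index is written by only one thread, no locks or compare-and-swap loops are needed; the acquire/release pairs make the stored item visible before the index that announces it.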

Page 20

Experimental evaluation (cont’d)

• Evaluation using different compositions of Pipeline and Farm(s)

• Percentage of increase in lines of code w.r.t. the sequential version

Page 21

Experimental evaluation (cont’d)

• FPS with and without GrPPI with different frameworks and compositions

Page 22

Experimental evaluation (cont’d)

• Performance evaluation of the Filter and Stream-Reduce parallel patterns

• We use a synthetic version of the video processing application for filtering frames

• Filter: discard frames whose percentage of black pixels is above a threshold

• Reduce: sum the number of null pixels
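The two per-frame operations described above can be sketched as plain functions that a filter and a reduce pattern would then apply to the stream. The sketch assumes a frame is a flat buffer with one grayscale byte per pixel and that both "black" and "null" mean the value 0; `frame`, `count_black`, `keep_frame` and `total_null_pixels` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: one grayscale byte per pixel, 0 meaning black/null.
using frame = std::vector<std::uint8_t>;

std::size_t count_black(const frame& f) {
    return static_cast<std::size_t>(
        std::count(f.begin(), f.end(), std::uint8_t{0}));
}

// Filter predicate: keep the frame only if its percentage of black
// pixels does not exceed the threshold. Empty frames are discarded.
bool keep_frame(const frame& f, double max_black_percent) {
    if (f.empty()) return false;
    const double pct = 100.0 * static_cast<double>(count_black(f)) /
                       static_cast<double>(f.size());
    return pct <= max_black_percent;
}

// Reduce step (written sequentially here): sum of null pixels over frames.
std::size_t total_null_pixels(const std::vector<frame>& frames) {
    std::size_t total = 0;
    for (const auto& f : frames) total += count_black(f);
    return total;
}
```

Both functions are free of side effects, so they can be handed to a parallel filter or reduce without extra synchronisation.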

Page 23

Other research at UC3M

• Implementation of parallel patterns using existing parallel frameworks

• GrPPI: A generic and Reusable Parallel Pattern Interface

• Discovering Parallel Patterns in source codes

• PPAT: Parallel Pattern Analyzer Tool

• Detection of catastrophic failures: deadlocks, data races, etc.

• Use of semantics to improve the detection of lock-free structures

• ThreadSanitizer as the data race detector
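For context, ThreadSanitizer is enabled in GCC and Clang with the `-fsanitize=thread` compiler flag and reports unsynchronised concurrent accesses at run time. The sketch below shows the kind of defect it flags and one possible fix; `count_hits` is a hypothetical example unrelated to the UC3M tooling. The racy variant is left as a comment because it has undefined behaviour.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Racy variant (do not use): two threads incrementing a plain `int hits`
// with `++hits` is a data race, which ThreadSanitizer would report.
//
// Fixed variant: the counter is atomic, so concurrent increments are
// well defined and no update is lost.
int count_hits(int per_thread) {
    std::atomic<int> hits{0};
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i)
            hits.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return hits.load();
}
```

Relaxed ordering is sufficient here because only the final count is read, and it is read after both threads have been joined.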

Page 24

Conclusions

• Most programming models are too low-level

concurrency based

need to expose mass parallelism

• Patterns hide away the complexity of parallel programming

GrPPI is a usable, simple, generic and high-level parallel pattern interface

The overheads of GrPPI are negligible with respect to using parallel programming frameworks directly

Parallelizing code with GrPPI increases the number of lines of code by only up to 4.4%

• Future work

Extend GrPPI with more stream and data parallel patterns: Map, Reduce or MapReduce.

Support for other parallel programming frameworks: FastFlow

Accelerators with CUDA Thrust and OpenCL SYCL?

Page 25

THANK YOU!

http://rephrase-ict.eu

@rephrase_eu

http://paraphrase-ict.eu
