Legacy Code in a Multicore Environment 30.4.2009 Jari Karppinen Seminar on Multicore Programming

Page 1: Seminar on Multicore Programming - Aalto · Seminar on Multicore Programming. Context 1. Introduction 2. ... MPEG-2, MP3, 197.parser, 256 ... Report can be used also for advice on

Legacy Code in a Multicore Environment

30.4.2009

Jari Karppinen

Seminar on Multicore Programming

Contents

1. Introduction
2. Motivation
3. Available tools
4. Parallelisation
5. Case study
6. Concluding Remarks
7. Resources

Introduction

➢ Legacy code
  ➢ Millions of lines, maintained for > 10 years
  ➢ Mostly written in C/C++
  ➢ No possibility to rewrite with current resources
  ➢ Often undocumented
  ➢ Original developers have left the company
  ➢ Might be running on a proprietary OS (no SMP support)

Introduction

➢ Legacy code parallelisation strategies
  ➢ Auto parallelisation
  ➢ Using multiple threads/processes
  ➢ “Partitioning”
  ➢ “Virtualisation”

Motivation

➔ Scaling by %serial code
➔ Amdahl's Law
➔ Karp-Flatt metric
➔ Parallelisation strategies

Scaling by %serial code

➢ How current legacy code scales to a multicore environment:
  ➢ Only a few % scales well
  ➢ 50 % does not scale at all

[Figure: "Scaling by %serial code". Speedup (0 to 12) plotted against scaling (0 to 12); series: Perfect, 99 %, 95 %, 90 %, 80 %, 70 %, 60 %, 50 %]

Amdahl's Law

It does not take overhead or load balancing into account

Amdahl's law

Run time = Time in serial region + (Time in parallel region / Number of threads)

[Figure: run time (Time, 0 to 120) against number of threads (Threads, 0 to 10), split into Serial Region and Parallel Region]
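The run-time relation on this slide can be turned into a predicted speedup. The following is a minimal sketch in C, not taken from the slides: the function name `amdahl_speedup` and its parameters are illustrative assumptions, and overhead and load balancing are ignored, exactly as in Amdahl's law itself.

```c
#include <assert.h>

/* Predicted speedup under Amdahl's law (hypothetical helper).
 * serial_fraction: share of the single-thread run time that cannot be
 *                  parallelised, in the range [0, 1].
 * threads:         number of threads, at least 1.
 * Run time with N threads = serial + parallel / N, so the speedup is
 * 1 / (serial_fraction + (1 - serial_fraction) / N). */
double amdahl_speedup(double serial_fraction, int threads)
{
    assert(serial_fraction >= 0.0 && serial_fraction <= 1.0);
    assert(threads >= 1);
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads);
}
```

For example, with half of the run time serial, `amdahl_speedup(0.5, 12)` stays below 2 (it is 24/13), however many more threads are added.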

Karp-Flatt metric

➢ Uses an experimentally determined serial fraction to take load balancing and overhead into account.

➢ Given a parallel computation exhibiting speed-up ψ on p processors, where p > 1, the experimentally determined serial fraction e is defined to be the Karp-Flatt metric: e = (1/ψ − 1/p) / (1 − 1/p).

➔ The smaller the value of e, the better the parallelisation.
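As a small illustration, the metric can be computed directly from a measured speedup using the standard definition e = (1/ψ − 1/p) / (1 − 1/p). This is a sketch only; the function name `karp_flatt` is an assumption made for this example.

```c
#include <assert.h>

/* Experimentally determined serial fraction e (Karp-Flatt metric).
 * speedup:    measured speedup psi on `processors` processors.
 * processors: number of processors p, must be greater than 1.
 * e = (1/psi - 1/p) / (1 - 1/p) */
double karp_flatt(double speedup, int processors)
{
    assert(speedup > 0.0);
    assert(processors > 1);
    double inv_p = 1.0 / processors;
    return (1.0 / speedup - inv_p) / (1.0 - inv_p);
}
```

A perfect speedup of 4 on 4 processors gives e = 0. A measured speedup of 2.78 on 4 processors gives e of roughly 0.146, meaning about 15 % of the work behaves as if it were serial.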

Karp-Flatt metric

Run time = Time in serial region + (Time in parallel region / Number of threads) + Synchronisation cost

[Figure: run time (Time, 0 to 120) against number of threads (Threads, 0 to 10); series: Serial Region, Karp-Flatt metric, Amdahl's law]

Parallelisation strategies

➢ Separate tasks
➢ Multiple copies of the same task
➢ Task split over multiple threads
➢ Pipeline of tasks
➢ Client-server
➢ Producer-consumer model

Available tools

➔ Tools in general
➔ Execution analyser
➔ Thread analyser
➔ Performance analyser

Tools in general

➢ Arbitrary parallelisation may not give the desired result.
➢ Estimating legacy code parallelisation is difficult.
➢ Some tools analyse code statically or dynamically to find the parts that would benefit from parallelisation.
➢ There are also tools that make it easier to find problems after parallelisation.
➢ Many tools are vendor-specific.

Dynamic analysis tools

➢ With dynamic analysis tools you can locate, e.g., producer-consumer relations in code.
➢ This is done by tracking the memory locations that some blocks write and others read.
➢ This information can be used when creating pipeline parallelism.
➢ Pipeline parallelism is suitable for both new and existing programs.
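Once a producer-consumer relation has been found, the two code blocks can become pipeline stages connected by a queue. The sketch below shows one possible shape using POSIX threads; the bounded queue, the names `run_pipeline`, `produce` and `consume`, and the `-1` end-of-stream sentinel are all assumptions made for this example, not output of any tool mentioned in the slides.

```c
#include <pthread.h>

#define QUEUE_CAP 8
#define ITEMS 1000

/* Bounded queue connecting the two pipeline stages. */
struct queue {
    int buf[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

static struct queue q = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER,
    .not_full = PTHREAD_COND_INITIALIZER,
};

static void queue_put(struct queue *qp, int v)
{
    pthread_mutex_lock(&qp->lock);
    while (qp->count == QUEUE_CAP)          /* wait for free space */
        pthread_cond_wait(&qp->not_full, &qp->lock);
    qp->buf[qp->tail] = v;
    qp->tail = (qp->tail + 1) % QUEUE_CAP;
    qp->count++;
    pthread_cond_signal(&qp->not_empty);
    pthread_mutex_unlock(&qp->lock);
}

static int queue_get(struct queue *qp)
{
    pthread_mutex_lock(&qp->lock);
    while (qp->count == 0)                  /* wait for data */
        pthread_cond_wait(&qp->not_empty, &qp->lock);
    int v = qp->buf[qp->head];
    qp->head = (qp->head + 1) % QUEUE_CAP;
    qp->count--;
    pthread_cond_signal(&qp->not_full);
    pthread_mutex_unlock(&qp->lock);
    return v;
}

/* Stage 1 (producer): writes the values that stage 2 reads. */
static void *produce(void *arg)
{
    (void)arg;
    for (int i = 1; i <= ITEMS; i++)
        queue_put(&q, i);
    queue_put(&q, -1);                      /* sentinel: end of stream */
    return NULL;
}

/* Stage 2 (consumer): sums the stream until the sentinel arrives. */
static void *consume(void *arg)
{
    long *sum = arg;
    for (int v; (v = queue_get(&q)) != -1; )
        *sum += v;
    return NULL;
}

/* Runs both stages concurrently and returns the consumer's sum. */
long run_pipeline(void)
{
    long sum = 0;
    pthread_t p, c;
    pthread_create(&p, NULL, produce, NULL);
    pthread_create(&c, NULL, consume, &sum);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sum;
}
```

The queue is the only memory both stages touch, which mirrors what the dynamic analysis tracks: locations written by one block and read by another.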

Performance analyzer

➢ The purpose of profiling the execution of an application is to find its hotspots.
➢ Hotspots indicate where attention needs to be spent in order to optimise the code.
➢ They are good candidates for threading, since they are the most computationally intensive portions of the serial code.

Coarse-grained pipeline parallelism in C programs

➢ William Thies and colleagues at the Computer Science and Artificial Intelligence Laboratory have made a tool that locates producer-consumer relations in stream programs.
➢ The tool analyses the application dynamically and gives recommendations and macros showing which parts can be changed to pipeline mode.

Coarse-grained pipeline parallelism in C programs

➢ They used several different kinds of stream programs to evaluate the performance gain:
➢ GMTI, MPEG-2, MP3, 197.parser, 256.bzip2 and 456.hmmer
➢ Some parts of the source code had to be changed because the model did not support loops with break or continue inside.
➢ Speedup with 4 cores was approximately 2.78x.

Performance analyzer

➢ A common Linux profiling tool is gprof. Strictly speaking, it is a display tool for data collected during the execution of an application compiled and instrumented for profiling.
➢ The -pg option of the cc command instruments C code.
➢ The instrumented binary generates a profile data file “gmon.out” when run.
➢ gprof prints the amount of time spent in each function and the time spent in child function calls.

Thread analyzer

➢ Data races are the most common cause of errors in multi-threaded applications.
➢ They are also hard to isolate because the OS schedules thread execution non-deterministically.
➢ Intel Thread Checker is a tool designed to identify data races, potential deadlocks, thread stalls and other threading errors.
➢ It performs dynamic analysis while the application executes.

Thread analyzer

➢ The Valgrind tool suite provides a number of debugging and profiling tools.
➢ Helgrind is a Valgrind debugging tool for detecting synchronisation errors in C, C++ and Fortran programs that use the POSIX pthreads threading primitives.
➢ It looks for various kinds of synchronisation errors in such code.
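To make the kind of error concrete, here is a minimal pthreads sketch of a shared counter. It is an illustration only; the names `run_counter_demo`, `worker` and `counter_lock` are assumptions. With the mutex in place the result is deterministic; removing the lock/unlock pair reintroduces a data race on `counter`, which is the kind of problem a thread analyser is meant to flag.

```c
#include <pthread.h>

#define INCREMENTS 100000

static long counter;                     /* shared between both threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        /* Without this lock/unlock pair, counter++ is an unsynchronised
         * read-modify-write from two threads: a data race. */
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts two workers and returns the final counter value. */
long run_counter_demo(void)
{
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

A program built around this function can be run under Helgrind with `valgrind --tool=helgrind ./program`, where `./program` stands for whatever binary calls `run_counter_demo()`.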

Parallelisation

➔ Auto parallelisation
➔ Using multiple threads/processes
➔ Virtualisation
➔ Partitioning

Auto parallelisation

➢ Having the compiler analyse the code and determine whether it can be executed concurrently has been a research topic for many decades.
➢ For example, commutativity analysis for software parallelisation has been researched by Farhana Aleen and Nathan Clark at the Georgia Institute of Technology.
➢ Unfortunately, there have not been many breakthroughs in this field.

Auto parallelisation

➢ Compiler does the work - no source changes
➢ Easy for the developer to use
➢ Loop-based parallelisation
➢ Effective only for certain kinds of applications
➢ The nature of C/C++ code makes auto parallelisation difficult, e.g. pointer behaviour is hard for the compiler to predict.

Auto parallelisation

➢ The compiler prints a report of which loops were considered for parallelisation, whether the attempt succeeded and, in case of failure, the reason.
➢ The programmer can then analyse the given reason.
➢ In the case of valid dependencies, the programmer can rewrite the loop to make them disappear.
➢ The report can also be used for advice on using OpenMP.

Auto parallelisation

➢ Dependence analysis is a static method for determining what dependencies exist, across iterations of a loop, between variables referenced within the loop body.
➢ If it can be shown that there are no cross-iteration data races within the loop, the iterations can be executed concurrently and the loop can be parallelised.
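The difference between a loop with and without a cross-iteration dependence can be shown with two small functions. This is a sketch for illustration; the names `add_arrays` and `prefix_sum` are assumptions, and `restrict` is used here as one way of telling the compiler that the pointers do not overlap.

```c
#include <stddef.h>

/* No cross-iteration dependence: iteration i touches only a[i], b[i]
 * and c[i], so the iterations can run in any order or concurrently.
 * `restrict` promises that a does not overlap b or c; without that
 * promise the compiler has to assume the pointers may alias. */
void add_arrays(double *restrict a, const double *restrict b,
                const double *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence: iteration i reads a[i - 1], which was
 * written by iteration i - 1, so the iterations must run in order
 * (unless the algorithm itself is restructured). */
void prefix_sum(double *a, const double *b, size_t n)
{
    if (n == 0)
        return;
    a[0] = b[0];
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```

A dependence analysis would accept the first loop and reject the second as written.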

Auto parallelisation example

for (int i = 0; i < 100000; i++) {
    a[i] = b[i] + c[i];
}

$ cc -O -xautopar -xloopinfo -xvpara loop.c

"loop.c", line 3: PARALLELIZED, and serial version generated

Using multiple threads/processes

➔ SMP support
➔ Timing
➔ Spin locks & mutexes in legacy code
➔ Deadlocks – avoidance
➔ MT-safe vs MT-hot
➔ Workload imbalance
➔ Hardware thrashing
➔ Memory ordering

SMP support

➢ The operating system needs to support SMP
➢ The software needs to be divided into tasks
➢ Communication between threads/processes is needed
  ➢ messages
  ➢ shared memory
  ➢ barriers
  ➢ condition variables
  ➢ locking is needed
  ➢ atomic operations

Timing

➢ Unless timing is enforced, threads will progress at different rates
➢ Cannot rely on the access pattern of the serial code
➢ Cannot assume the time or order in which threads will run

Spin locks & mutexes in legacy code

➢ Mutex
  ➢ Thread is rescheduled when the lock is busy and woken up when it is free
  ➢ Consumes no processor resources while waiting
  ➢ More lock and unlock overhead

➢ Spin locks
  ➢ Threads spin when the lock is busy
  ➢ Consume processor resources while waiting for the lock
  ➢ Good for locks that are held for short times
  ➢ In many cases multicore performance is lost when processes spin while waiting for some resource.
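What "spinning" means can be seen in a minimal spin lock built on C11 atomics. This is a sketch, not code from any legacy system; `spin_lock`, `spin_unlock` and `run_spin_demo` are names assumed for this example. A pthread mutex would be used the same way from the caller's side, but a waiting thread would sleep instead of looping.

```c
#include <pthread.h>
#include <stdatomic.h>

#define ROUNDS 50000

static atomic_flag spin = ATOMIC_FLAG_INIT;  /* clear means unlocked */
static long total;                           /* protected by spin */

static void spin_lock(void)
{
    /* test_and_set returns the previous value: true means another
     * thread holds the lock, so keep looping (and burning CPU cycles)
     * until it is released. */
    while (atomic_flag_test_and_set_explicit(&spin, memory_order_acquire))
        ;
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&spin, memory_order_release);
}

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        spin_lock();
        total++;              /* very short critical section */
        spin_unlock();
    }
    return NULL;
}

/* Runs two workers that contend for the spin lock. */
long run_spin_demo(void)
{
    pthread_t t1, t2;
    total = 0;
    pthread_create(&t1, NULL, spin_worker, NULL);
    pthread_create(&t2, NULL, spin_worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return total;
}
```

The critical section here is a single increment, which is the case where spinning is reasonable. If the lock were held for long periods, the waiting core would do no useful work.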

Deadlocks - avoidance

➢ Ideal multicore SW would be lock-free.
➢ This cannot be reached in practice, or is very challenging to program.
➢ A thread analyser is needed.
➢ Avoid deadlocks by always acquiring resources in the same order.
➢ When legacy code runs in a multicore environment, the operating system schedules processes in a non-deterministic way, and therefore previously hidden deadlocks pop up.
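Acquiring resources in the same order can be enforced in code. The sketch below uses a hypothetical `struct account` with a transfer between two accounts, and orders the two locks by address; the ordering rule and all names are assumptions for illustration. Two threads transferring in opposite directions would risk deadlock if each simply locked `from` first.

```c
#include <pthread.h>
#include <stdint.h>

struct account {
    pthread_mutex_t lock;
    long balance;
};

/* Moves `amount` between two accounts. Both locks are always taken in
 * address order, so two concurrent transfers in opposite directions
 * can never each hold one lock while waiting for the other. */
void transfer(struct account *from, struct account *to, long amount)
{
    if (from == to)
        return;                               /* nothing to do */
    struct account *first = (uintptr_t)from < (uintptr_t)to ? from : to;
    struct account *second = (first == from) ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

static struct account acc_a = { PTHREAD_MUTEX_INITIALIZER, 1000 };
static struct account acc_b = { PTHREAD_MUTEX_INITIALIZER, 1000 };

static void *a_to_b(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++)
        transfer(&acc_a, &acc_b, 1);
    return NULL;
}

static void *b_to_a(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++)
        transfer(&acc_b, &acc_a, 1);
    return NULL;
}

/* Runs opposing transfers concurrently; returns the combined balance. */
long run_transfers(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, a_to_b, NULL);
    pthread_create(&t2, NULL, b_to_a, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return acc_a.balance + acc_b.balance;
}
```

Any consistent global order works; address order is just convenient because it needs no extra bookkeeping.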

MT-Safe vs MT-hot

➢ MT-safe
  ➢ Multiple threads can call it and it doesn't crash.
  ➢ May serialise

➢ MT-hot
  ➢ Multiple threads can call it with good performance.
  ➢ Parallel algorithm

➢ Example
  ➢ Default malloc --> MT-safe
  ➢ mtmalloc --> MT-hot

➢ Many operating system calls may serialize legacy application execution.

Workload imbalance

➢ Threads do not always perform the same task at the same time, so other threads have to wait at the synchronisation point.

Hardware thrashing

➢ Multiple cache lines map to the same cache entry
➢ This can be detected with a performance analyser
➢ It can then be located and fixed

POSIX in legacy code adaptation

➢ Advantages
  ➢ User has low-level control of parallelisation through primitives

➢ Disadvantages
  ➢ Increases the complexity of the code significantly
  ➢ Competent people become difficult to find

OpenMP in legacy code adaptation

➢ Pros
  ➢ Compiler does the work
  ➢ Minimal source changes
  ➢ Directive based
  ➢ Can also be compiled for a single thread to ease debugging
  ➢ You can incrementally parallelise the region of interest

➢ Cons
  ➢ Suitable only for certain types of applications
  ➢ e.g. control plane applications are hard to parallelise
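A minimal sketch of the directive-based approach: one pragma on an existing loop. The function name `scale_add` is an assumption for this example. When the file is compiled without OpenMP support, the pragma is ignored and the same source runs as ordinary serial code, which is the single-thread debugging property listed above.

```c
/* y[i] = alpha * x[i] + y[i] for every element.
 * The iterations are independent, so the loop can be split across
 * threads with a single directive and no other source changes.
 * A signed loop index is used because older OpenMP versions require it. */
void scale_add(double alpha, const double *x, double *y, long n)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}
```

With GCC, for instance, OpenMP is enabled with `-fopenmp`; other compilers use their own switch.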

Virtualisation

➢ Can be used to run different applications on the same processor in independent environments.
➢ Used mainly on the server side
➢ Different technologies
  ➢ Sun Hypervisor
  ➢ Xen
  ➢ VMware

Partitioning

➢ Asymmetric configuration in a multicore processor
  ➢ Different cores do different tasks
  ➢ e.g. in an 8-core processor, 4 cores can be allocated for SMP and the others run code in simple environments with no OS or a lightweight OS.

➢ More flexibility for software architecture

Case study

➔ Comments from people
➔ What has happened
➔ Case shared variable
➔ Case Ericsson

Comments from people

➢ Another processor change
➢ Calculating MIPS is enough to estimate performance
➢ SMT will do the job for you
➢ Compiler will do the job for you

What has happened

➢ Many new faults pop up
➢ Performance decreases instead of increasing
➢ Debugging gets difficult
➢ The whole SW architecture needs to be changed.

Case shared variable

➢ A simple two-thread application worked without locking on a multitasking operating system on a single-core processor. The threads shared one variable for reading and writing.

➢ This was possible because of the order in which the application happened to execute.

➢ When the same code was run in a multicore environment, the application failed every time.
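The slides do not say how this case was fixed. As one possible approach, assumed here purely for illustration, a shared value can be handed from one thread to another with C11 atomics instead of relying on scheduling order: the writer publishes with a release store and the reader waits with an acquire load. The names `run_handoff`, `payload` and `ready` are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

static int payload;          /* ordinary shared data */
static atomic_int ready;     /* publication flag, 0 = not yet written */

static void *writer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the write to payload becomes visible to any thread
     * that later observes ready == 1 with an acquire load. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *reader(void *arg)
{
    int *out = arg;
    /* Acquire: wait until the writer has published, then read. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    *out = payload;
    return NULL;
}

/* Starts the reader first, then the writer; returns what was read. */
int run_handoff(void)
{
    int seen = 0;
    pthread_t w, r;
    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&r, NULL, reader, &seen);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

The result no longer depends on which thread runs first or on how many cores there are. A mutex around every access to the shared variable would be the more conventional alternative.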

Case Ericsson

➢ Björn Lisper has studied the parallelisation of legacy telecom software in a joint research project with Ericsson.

➢ Most languages, such as C/C++, have a memory concept, and thus statements must be executed in order.

➢ Only statements that are certain to have no dependences can be run in parallel.

➢ Pointers make the situation even worse.

Case Ericsson

➢ He found that automatic parallelisation does not work in general.

➢ The software under inspection was server-type telecom software: AXE, Ericsson's classical telephone exchange.

➢ Software is event-driven and different job trees are typically concurrent, and can be executed in parallel if no conflicts exist.

Case Ericsson

➢ Automatic parallelisation of legacy code is a pipe dream in general.

➢ May work for special applications, which have enough inherent concurrency.

➢ Simple static conflict analysis was done. Results are promising but more research is needed.

Concluding Remarks

➢ Performance is the main driver for parallelising an application. If it is not a limiting factor, nothing needs to be done.
➢ There is no point in trying to parallelise everything.
➢ You need to identify the part of the code where most of the processing time is spent.
➢ You just need to continue living with legacy code.
➢ There are plenty of questions but few answers.
➢ Good luck to everybody with legacy code!

Resources

1. Wikipedia. Amdahl's law. http://en.wikipedia.org/wiki/Amdahl%27s_law. Referred on 29.04.2009.
2. Wikipedia. Karp-Flatt metric. http://en.wikipedia.org/wiki/Karp-Flatt_Metric. Referred on 29.04.2009.
3. Valgrind. Manual. http://valgrind.org/docs/manual/hg-manual.html. Referred on 30.04.2009.
4. Thies W. et al. A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs.
5. Knafla B. and Leopold C. Parallelizing a Real-Time Steering Simulation for Computer Games with OpenMP. John von Neumann Institute for Computing, Jülich, NIC Series Vol. 38.
6. Aleen F. and Clark N. Commutativity Analysis for Software Parallelization: Letting Program Transformations See the Big Picture. ASPLOS'09, March 7-11, 2009.
7. Lisper B. Parallelisation of Legacy Telecom Software. Multicore Days 12.09.2008.

Resources

8. Developer portal. http://developer.sun.com
9. Sun Studio. http://developers.sun.com/studio
10. Hill M. & Marty M. (2008) Amdahl's Law in the Multicore Era. IEEE Computer 7, pp. 33-38.
11. Hughes C. & Hughes T. (2008) Professional Multicore Programming: Design and Implementation for C++ Developers. Wiley, Indianapolis, United States of America. p. 621.
12. Domeika M. (2008) Software Development for Embedded Multi-core Systems. Elsevier Inc, United States of America.