Musepack Encoder Performance Tuning

Tal Rath and Eyal Enav, May 2008

Technion Softlab


Agenda

Project goals
Project description
◦ What is Musepack?
◦ Using multithreading approach
◦ Applying SIMD
◦ Analyzing micro-architecture problems
Results: speedup overview
Conclusions and recommendations
Our benefits
Next steps


Project goals

◦ Speed up and optimize a Musepack encoder while maintaining bitwise output compatibility:
  ◦ Examine the encoder's structure and methods.
  ◦ Analyze the time distribution across encoder functions using Intel's VTune.
  ◦ Apply multithreading, SIMD instructions and other techniques, guided by VTune, to achieve speedup.
◦ Return the code to the open source community.


Project description

Project platform: Intel Core 2 Duo, 2.4 GHz, 64-bit, 2 GB of RAM, Windows XP.

Speedup measurement:

$$\text{Speedup} = \frac{\text{Original execution time}}{\text{New execution time}}$$


Project description: What is Musepack?

◦ Musepack is an open source audio codec.
◦ It is a lossy encoder.
◦ Musepack has performed well in various listening tests at both lower and higher bitrates.


Using multithreading approach

Thread-level parallelism reduces program execution time by executing multiple code sections on both cores simultaneously.

Amdahl's law: if P is the proportion of the program that can run in parallel, then the maximum speedup that can be achieved by using 2 processors is:

$$\text{Speedup}_{\max} = \frac{1}{(1 - P) + \frac{P}{2}}$$

Therefore, P should be maximized.

Intel's VTune was used to identify time-consuming functions that are suitable for multithreading.
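As a quick numerical check of Amdahl's law, the formula can be written as a small C function. This is only an illustrative sketch: the name `amdahl_speedup` and its parameters are hypothetical and do not come from the encoder's source.

```c
#include <assert.h>

/* Maximum speedup predicted by Amdahl's law.
 * p: fraction of the run time that can be parallelized (0..1)
 * n: number of processors (>= 1)
 * Hypothetical helper, for illustration only. */
double amdahl_speedup(double p, int n)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(n >= 1);
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

For example, `amdahl_speedup(0.5, 2)` is about 1.33, and only a fully parallel program (`p = 1.0`) reaches the ideal 2X on two cores.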


Using multithreading approach

[Figure: total timer events per function]

Psychoakustic_Modell accounts for a large share of the execution time and should therefore be a target for multithreading.


Multithreading the Psychoakustic function: first attempt

The function contains two separate models that execute the same instructions on different data.

Each model should be executed in a different thread.


Multithreading Psychoakustic: first attempt

Problem: the two models are tightly coupled through local and global variables, and the second model uses the first one's output.


Multithreading Psychoakustic: second attempt

Observation: the Psychoakustic function contains left and right channel handling functions.

◦ These functions can be divided into two types:
  ◦ Single channel functions, for example: FunctionL(Left Param1, Left Param2, ..., Local Param1, Local Param2).
  ◦ Dual channel functions, for example: FunctionLR(Left Param1, Right Param1, ...).
◦ Single channel functions do not access the opposite channel's local variables.

Timer events distribution: single channel 84%, dual channel 16%.


Multithreading Psychoakustic: second attempt

Strategy:
◦ One single channel function in each thread.

[Figure: timeline of Thread A and Thread B along a time axis, with blocks labeled "Two single channel functions" (Left, Right) and "Dual channel function"]
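The strategy can be sketched in C as follows. The slides do not say which threading API was used, so OpenMP sections are an assumption here, as are all function names and the placeholder arithmetic. The sketch also assumes that the dual channel function runs only after both single channel functions have finished, which the slide's diagram suggests but the transcript does not confirm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single channel function: touches only its own channel. */
static void process_single(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * 0.5f;            /* placeholder work */
}

/* Hypothetical dual channel function: needs both channels. */
static void process_dual(const float *l, const float *r, float *mid, size_t n)
{
    for (size_t i = 0; i < n; i++)
        mid[i] = l[i] + r[i];             /* placeholder work */
}

void process_frame(const float *in_l, const float *in_r,
                   float *out_l, float *out_r, float *mid, size_t n)
{
    assert(in_l && in_r && out_l && out_r && mid);

    /* Left and right single channel work runs on two threads. */
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        process_single(in_l, out_l, n);   /* thread A: left channel  */

        #pragma omp section
        process_single(in_r, out_r, n);   /* thread B: right channel */
    }
    /* Implicit barrier above: both channels are complete here. */
    process_dual(out_l, out_r, mid, n);
}
```

Built without OpenMP support, the pragmas are ignored and the same code runs serially with identical results, which makes it easy to compare the threaded and unthreaded output.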


Multithreading Psychoakustic: second attempt

Implementation:
◦ Left channel local variables are used by thread A, while right channel ones are used by thread B. Shared variables, used by both threads, are duplicated: one copy for each thread.
◦ Technical problem: the program contains a large number of global variables.
◦ These are accessed by both the left and right single channel functions, and would therefore be accessed from both threads simultaneously.

Globals (partial list): A, About, ANSspec_L, ANSspec_M, ANSspec_R, ANSspec_S, APE_Version, array, b, Bandwidth, Buffer, BufferBytes, BufferedBits, bump_exp, bump_start, Butfly, __C, c, Ci_opt, CombPenalities, Cos_Tab, CosWin, CP_10000, CP_10079, CP_1250, CP_1251, CP_1252, CP_1253, CP_1254, CP_1255, CP_1256, CP_1257, CP_1258, CP_37, CP_42, CP_437, CP_500, CVD_used, __D, d, data_finished, DelInput, DisplayUpdateTime


Multithreading Psychoakustic: global variables

Solution: a "divide and conquer" approach:
◦ Map all globals using a globals-marking script.
◦ Duplicate the globals that are accessed by functions at the deepest level of the function call tree.
◦ After these functions are handled, proceed to a higher level.
◦ The process ends when the globals accessed from within the upper level (Psychoakustic's own code) have been duplicated.

Before:
float g_var1; (global/static variable)
...
Function A() { g_var1 = value; }

After:
64-byte-aligned struct, duplicated per thread: struct { float g_var1; }
...
Function A(thread num) { struct.g_var1 = value; }

[Figure: call tree of Psychoakustic(), from the upper level down to the deepest level]

The structs are aligned to 64 bytes to avoid shared cache lines.
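The before/after transformation above can be sketched in compilable C. This is a minimal illustration under stated assumptions: the type name `thread_globals`, the array `g_copies` and the accessor `read_g_var1` are hypothetical, and C11 `alignas` stands in for whatever alignment directive the original Windows build used.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define NUM_THREADS 2
#define CACHE_LINE  64   /* assumed cache line size in bytes */

/* Former globals, grouped into one struct per thread.
 * Aligning the first member pads the struct to a full cache line. */
typedef struct {
    alignas(CACHE_LINE) float g_var1;
} thread_globals;

/* One private copy per thread; copies never share a cache line. */
static thread_globals g_copies[NUM_THREADS];

/* Was: void function_a(void) { g_var1 = value; } */
void function_a(int thread_num, float value)
{
    assert(thread_num >= 0 && thread_num < NUM_THREADS);
    g_copies[thread_num].g_var1 = value;
}

/* Hypothetical accessor, used only to inspect a thread's copy. */
float read_g_var1(int thread_num)
{
    assert(thread_num >= 0 && thread_num < NUM_THREADS);
    return g_copies[thread_num].g_var1;
}
```

Because each copy occupies its own 64-byte line, a write by one thread does not invalidate the line holding the other thread's copy (no false sharing).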


Multithreading: results

After Psychoakustic was multithreaded, two more functions were multithreaded using the same mechanism.

Total threading speedup: 1.43X

◦ Parallel part: 73.2%.
◦ Assuming the serial part does not change, the new execution time of the multithreaded part is 57% of its original time.
◦ Threading overhead:
  ◦ Total program instruction count (IC) increased by 2.6%.
  ◦ Total timer event count increased by 0.62%.

Intel Thread Checker found no errors.

[Figure: Thread Profiler view]


Floating point issues

The original encoder settings use the "Precise" F.P. model instead of the "Fast" F.P. model.

Precise mode increases calculation time.

The F.P. model was changed to "Fast" (after consulting our instructors).

In the original program, sqrt instructions with single-precision F.P. arguments were performed in double precision.

These instructions were changed to single precision.

Speedup gained so far: 1.77X

The output file is bitwise compatible only with the output of the original encoder built in "Fast" F.P. mode:
◦ A value difference of around $10^{-5}\,\%$ from the "Precise mode" output is due to rounding.
◦ Such minor differences cannot be noticed by the human ear.


SIMD instructions

SIMD is a technique for achieving data-level parallelism: SIMD instructions enable the execution of 4 F.P. operations at a time.

[Figure: function self-time distribution]

The Sqrt function is the main target for SIMD instructions.


SIMD instructions: implementation

SIMD instructions were used in the four functions that call Sqrt.

These functions were transformed into SIMD-oriented functions: sqrt, as well as other mathematical operations, is performed by SIMD instructions.

In one of the functions, because the number of loop iterations varies, the Sqrt array was calculated in advance using SIMD instructions.

No calls to the original Sqrt remained after applying SIMD.

Speedup gained from SIMD: 23% (with multithreading).
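To illustrate what computing a Sqrt array with SIMD can look like, here is a minimal C sketch using SSE intrinsics. The slides do not name the instruction set or show the code, so the use of SSE, the function name `sqrt_array_sse` and the unaligned loads are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>   /* SSE intrinsics */

/* Hypothetical helper: out[i] = sqrt(in[i]) for i in [0, n). */
void sqrt_array_sse(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    size_t i = 0;

    /* Main loop: four single-precision square roots per instruction. */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(in + i);          /* load 4 floats */
        _mm_storeu_ps(out + i, _mm_sqrt_ps(v));   /* sqrt and store */
    }

    /* Tail: the remaining 0..3 elements, one at a time. */
    for (; i < n; i++) {
        __m128 s = _mm_sqrt_ss(_mm_set_ss(in[i]));
        _mm_store_ss(out + i, s);
    }
}
```

Precomputing the whole array this way lets a later loop with an irregular iteration count simply read `out[i]` instead of calling a scalar sqrt.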


Micro-architecture issues

Using VTune's Tuning Assistant, several micro-architecture problems were discovered:

◦ RAT_STALLS.FLAGS: indicates partial flag stalls.
  ◦ About $10^{9}$ events, each causing a stall of ~10 cycles, for a total of ~4 sec.
  ◦ Possible solution: instruction substitution, such as replacing INC with ADD.
  ◦ The events occur in the 'fread' function, so the code cannot be modified.
◦ LOAD_BLOCK.OVERLAP_STORE: load instructions are blocked.
  ◦ The cause can be 4K (page size) aliasing or a load-store block overlap.
  ◦ Possible solution: increase 4K-sized arrays by the block size and use 64-byte alignment.
  ◦ The solution was applied; the results are unnoticeable.
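The padding fix for 4K aliasing can be sketched in C as follows. This is an assumption-laden illustration: the slides do not show the affected arrays, so the struct `padded_buffers`, the array sizes and the reading of "block size" as one 64-byte cache line are all hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

#define ELEMS_4K  1024   /* 1024 floats = 4096 bytes            */
#define PAD_ELEMS 16     /* 16 floats   = 64 bytes (assumed pad) */

/* Two formerly 4 KB arrays, each padded by 64 bytes and
 * aligned to 64 bytes. */
typedef struct {
    alignas(64) float a[ELEMS_4K + PAD_ELEMS];
    alignas(64) float b[ELEMS_4K + PAD_ELEMS];
} padded_buffers;

/* Byte distance between a[i] and b[i], modulo 4 KB.
 * Zero would mean the two elements sit at the same 4 KB offset. */
size_t offset_mod_4k(void)
{
    return (offsetof(padded_buffers, b) - offsetof(padded_buffers, a)) % 4096;
}
```

Without the padding the two arrays would be exactly 4096 bytes apart, so `a[i]` and `b[i]` would share the same low 12 address bits; with it, `offset_mod_4k()` is 64.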


Speedup overview

[Figure: speedup overview chart; labeled value: 2.03]


Conclusions

Multithreading
◦ Can produce significant program acceleration.
◦ Global variables can be an obstacle in the process of multithreading.

SIMD instructions
◦ Enhance speedup.
◦ Can be applied only to specific parts of the code.
◦ Sometimes the implementation has to be "creative".

Micro-architecture
◦ In this program no major problems were found.
◦ VTune's Tuning Assistant is a powerful tool for tracking down micro-architecture problems.


Optional future steps

Adapting the encoder to quad-core processors by creating 4 threads.

Designing a multithreading assistance program that traces and handles global variables using the suggested algorithm.


Our benefit

Improving our expertise in identifying the dominant factors in a process and handling them.

Enhancing our knowledge of multithreading techniques.

Learning how to use SIMD instructions.

Being exposed to a few micro-architecture problems.


The End

Thank you
