
Fast static performance analysis of parallel program schemes

Yuriy Sheynin, Boris Sedov, Alexey Syschikov, Vera Ivanova
{sheynin, boris.sedov, alexey.syschikov, vera.ivanova}@guap.ru

Presenting: Sergey Pakharev

Software for embedded systems and parallelism

20-24 April 2015 17th FRUCT Conference

For parallel software, it is very important to be able to assess early the potential parallelism and the possible acceleration as a function of the number of processors in the platform.

Parallel program

VPL: visual programming language

A program in VPL is a directed graph represented as a block scheme:

• vertices are the operators
• arcs are pointers that link operators
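As an illustration of this representation, a block scheme can be held in memory as a directed graph. The sketch below is a minimal Python model, not the VPL tool's own data structure; the class `Scheme`, its methods, and the operator names are hypothetical.

```python
from collections import defaultdict


class Scheme:
    """Hypothetical in-memory model of a block scheme as a directed graph."""

    def __init__(self):
        self.operators = set()           # vertices of the graph
        self.links = defaultdict(list)   # arcs: operator -> list of successors

    def add_link(self, src, dst):
        """Add an arc from operator `src` to operator `dst`."""
        self.operators.update((src, dst))
        self.links[src].append(dst)

    def successors(self, op):
        """Operators directly linked from `op`."""
        return list(self.links.get(op, []))

    def sources(self):
        """Operators with no incoming arcs (entry points of the scheme)."""
        targets = {dst for dsts in self.links.values() for dst in dsts}
        return sorted(self.operators - targets)


# A small scheme with two branches that could run in parallel.
scheme = Scheme()
scheme.add_link("read", "add")
scheme.add_link("read", "multiply")
scheme.add_link("add", "write")
scheme.add_link("multiply", "write")
```

Here `add` and `multiply` both depend only on `read`, so they are candidates for parallel execution.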

Early performance evaluation tool

Static analysis

• Evaluation of parallelism and performance at early stages
• A quick and “cheap” task

[Diagram: complex performance analysis comprises static analysis, a virtual simulator, and a platform simulator]

Parallelism and data

Part of the program is parallel, so executing it on 2 processors should significantly reduce the total execution time. The real acceleration of program execution is < 1.5%.

The reason is the difference in the size of the input data received by each parallel branch of the program.
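The effect can be reproduced with simple arithmetic. The sketch below assumes two independent branches, each placed on its own processor, and ignores communication costs; the function name and the branch costs are invented for illustration and are not measurements from the slide.

```python
def two_branch_speedup(cost_a, cost_b):
    """Speedup of two independent branches on 2 processors versus 1 processor."""
    sequential = cost_a + cost_b      # one processor runs both branches in turn
    parallel = max(cost_a, cost_b)    # two processors: wait for the slower branch
    return sequential / parallel


balanced = two_branch_speedup(500, 500)    # branches receive equal amounts of data
unbalanced = two_branch_speedup(990, 10)   # one branch dominates the run time
```

With equal branches the speedup is 2; with a 990:10 split it is about 1.01, a gain of roughly 1%.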

Parallelism and data

Matrix addition: $O(n^2)$

Matrix multiplication: $O(n^3)$


Some operators have asymptotic complexity that depends on the size of data

being processed.

For the analysis, the user specifies:

• Minimal data amount $N_{min}$
• Base data amount $N_{base}$
• Maximal data amount $N_{max}$
• Base time of program execution $ExecCost_{base}$

$$ExecCost = \frac{ExecCost_{base}}{O(N_{base})} \cdot O(N)$$
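The scaling rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analyzer's code: the function names are hypothetical, and the complexity of an operator is modeled as a plain function of the data amount.

```python
def scaled_exec_cost(exec_cost_base, n_base, n, complexity):
    """ExecCost = ExecCost_base / O(N_base) * O(N)."""
    return exec_cost_base / complexity(n_base) * complexity(n)


def quadratic(n):
    """Complexity model for matrix addition, O(n^2)."""
    return n ** 2


def cubic(n):
    """Complexity model for matrix multiplication, O(n^3)."""
    return n ** 3


# Both operators cost 10 time units at the base data amount N_base = 1.
add_cost = scaled_exec_cost(10.0, n_base=1, n=4, complexity=quadratic)
mul_cost = scaled_exec_cost(10.0, n_base=1, n=4, complexity=cubic)
```

At N = 4 the addition estimate grows by a factor of 16 and the multiplication estimate by a factor of 64, although both started from the same base cost.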

Parallelism and data

The parallelism of the scheme decreases as the matrix size increases. The program is not suitable for parallel platforms.

• $N_{min} = 1$
• $N_{max} = 15$
• $N_{base} = 1$
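A small sweep over the data amount shows why parallelism can fall as the matrix grows. The sketch assumes a scheme with one matrix addition branch, $O(n^2)$, and one matrix multiplication branch, $O(n^3)$, running in parallel on 2 processors with equal base cost at $N_{base} = 1$. The actual scheme on the slide is not recoverable from the transcript, so this layout and the function name are assumptions.

```python
def speedup_on_two_processors(n, base_cost=1.0, n_base=1):
    """Speedup of parallel add (O(n^2)) and multiply (O(n^3)) branches."""
    add_cost = base_cost / n_base ** 2 * n ** 2
    mul_cost = base_cost / n_base ** 3 * n ** 3
    sequential = add_cost + mul_cost        # both branches on one processor
    parallel = max(add_cost, mul_cost)      # one branch per processor
    return sequential / parallel


# Sweep the data amount from N_min = 1 to N_max = 15.
speedups = {n: speedup_on_two_processors(n) for n in range(1, 16)}
```

Under these assumptions the speedup equals 1 + 1/n: it is 2 at n = 1 and falls below 1.07 at n = 15, because the multiplication branch increasingly dominates.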

Hierarchy

A VPL program scheme may contain terminal operators (data processing) and composite operators (structural units).

Composite operators are designed for hierarchical structuring of the program. They may contain terminal operators and other composite operators.

Hierarchy

[Figure: nodes 1, 2, 3 of a complex node placed on processors P1 and P2]

Performance models of composite structures:

Fully sequential

• all nodes in the body of the composite operator are placed on one processor

Fully parallel

• all nodes in the body of the composite operator are placed on all available processors by the general rules
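The two models can be compared with a short sketch. The transcript does not say what the "general rules" of placement are, so the code below substitutes a greedy longest-first heuristic and assumes the nodes in the body are independent; the function names and node costs are hypothetical.

```python
import heapq


def sequential_time(costs):
    """Fully sequential model: every node of the body runs on one processor."""
    return sum(costs)


def parallel_time(costs, processors):
    """Fully parallel model, approximated by greedy longest-first placement.

    Assumption: nodes are independent; each node goes to the processor
    that currently has the smallest accumulated load.
    """
    loads = [0] * processors
    heapq.heapify(loads)
    for cost in sorted(costs, reverse=True):
        least_loaded = heapq.heappop(loads)
        heapq.heappush(loads, least_loaded + cost)
    return max(loads)


body = [300, 200, 100]          # invented costs of nodes 1, 2, 3
t_seq = sequential_time(body)
t_par = parallel_time(body, processors=2)
```

For these costs the sequential model gives 600 time units and the parallel model gives 300 on 2 processors.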

Hierarchy

[Figure: placement of operators F1 to F4 and composite operators C1, C2 on processors P1 and P2. Sequential model: t=700. Parallel model: t=600.]

Iterations

[Figure: For and While loop operators with body operators F1 to F4 placed on processors P1 and P2]

Most of the computation in a program is expressed as conditional (while) or iterative (for) loops, so loops have a significant impact on the performance of the program. The estimate for a loop depends on:

• The asymptotic complexity of the loop body
• The number of iterations
• Execution model (parallel / sequential)

Loop execution time = accumulated execution time of the body * number of iterations
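The loop rule can be written down directly. In the sketch below the body time comes from a deliberately simplified model: a sequential body is the sum of its operator costs, and a parallel body assumes independent operators with one processor each. The function names, costs, and iteration count are hypothetical; for a while loop the number of iterations has to be an estimate.

```python
def body_time(costs, parallel):
    """Accumulated execution time of the loop body.

    Assumption: in the parallel model the body operators are independent
    and each one gets its own processor.
    """
    return max(costs) if parallel else sum(costs)


def loop_time(costs, iterations, parallel=False):
    """Loop execution time = body execution time * number of iterations."""
    return body_time(costs, parallel) * iterations


body = [100, 200, 300, 100]     # invented costs of F1..F4
t_sequential = loop_time(body, iterations=10)
t_parallel = loop_time(body, iterations=10, parallel=True)
```

With 10 iterations the sequential model gives 7000 time units and the parallel model gives 3000.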

Conclusion

The static analyzer of parallel VPL programs provides:

• Evaluation of the program speedup for different numbers of processors
• Evaluation of how parallelism varies with the data amount and the complexity of the processing operators
• Evaluation that takes the program hierarchy and loops into account

Further areas of work:

• Accounting for conditional statements (if/switch)
• Deeper analysis with the virtual and platform simulators

Thank you!