turning’control’flow’graphs’into’ callgraphs · 12.04.2012  ·...

34
Turning Control Flow Graphs into Callgraphs Transforma5on of par55oned codes for execu5on in heterogeneous architectures PABLO BARRIO TOBIAS KENTER CARLOS CARRERAS CHRISTIAN PLESSL ROBERTO SIERRA

Upload: others

Post on 19-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Turning  Control  Flow  Graphs  into  Callgraphs  

 Transforma5on  of  par55oned  codes  for  execu5on  in  heterogeneous  architectures  

PABLO  BARRIO  TOBIAS  KENTER  

CARLOS  CARRERAS  CHRISTIAN  PLESSL  ROBERTO  SIERRA  

Page 2: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Outline

2  Turning  CFGs  into  callgraphs  

1.  Heterogeneous  High  Performance  Compu;ng  

 2.  Compila;on  toolchain    

3.  Code  refactoring  for  execu;on  in  heterogeneous  pla?orms  

Page 3: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Outline

3  Turning  CFGs  into  callgraphs  

1.   Heterogeneous  High  Performance  Compu5ng  

 2.  Compila;on  toolchain    

3.  Code  refactoring  for  execu;on  in  heterogeneous  pla?orms  

Page 4: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

High Performance Computing & Embedded Systems

4  Turning  CFGs  into  callgraphs  

but  geBng  closer  every  day…  

Embedded   HPC  

Type  of  processors   Heterogeneous   Homogeneous  

Size   Small   Massive  

Memory   Shared   Distributed  

Page 5: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Objectives

5  Turning  CFGs  into  callgraphs  

•   A  code  par55oner  for  heterogeneous  architectures.    •   Easy  to  add  models  for  new  devices  and  architectures.  

•   Par;;oning  based  on  soLware  and  hardware  characteris;cs.    •   Communica;ons  generated  for  distributed  memory  systems.    •   Automa5c  paralleliza5on,  both  func;onal  and  data  parallel.  

Page 6: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

The solution under research

6  Turning  CFGs  into  callgraphs  

FRONT  END  

op;miza;on  passes  

BACK  END  

C,  C++,  Fortran…    

LLVM  IR    

LLVM  IR    

asm,  VHDL…    

Profiling    Es;ma;on    Par;;oning  

&  Mapping  

 Code  refactoring  

 

Page 7: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Outline

7  Turning  CFGs  into  callgraphs  

1.  Heterogeneous  High  Performance  Compu;ng  

 2.   Compila5on  toolchain    

3.  Code  refactoring  for  execu;on  in  heterogeneous  pla?orms  

Page 8: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

LLVM-based compilation toolchain

8  Turning  CFGs  into  callgraphs  

Module  1  

Module  N  

.  

.  

.  

Es;ma;on  

Linked  module  

Front  ends  

opt  &  link  

Profiling?   lli  

Profile  info  

RSD  file  Par;;oning  &  mapping  

Code  refactoring  

Module  1  

Module  M  

.  

.  

.  

Back  end  1  

Back  end  M  

Exe1  

ExeM  

.  

.  

.  

.  

.  

.  

yes  

no  

Data  in  

Page 9: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Partitioning & Mapping

9  Turning  CFGs  into  callgraphs  

[PartitioningPass] PARTITIONING OVERVIEW: Initial exec time was 1.81e-07 s, new is 1.06e-07 -- Speedup = 1.71e+00

[PartitionWriterPass] Generating partitioned code PartitionWriterPass::runOnModule() -- Original functions:

odd with BBs: entry --> CPU main with BBs: entry --> CPU 3 --> CPU 4 --> CPU beforeHeader --> CPU 5 --> CPU 6 --> CPU 7 --> CPU 8 --> CPU_SIMD 9 --> CPU_SIMD 11 --> CPU_SIMD 12 --> CPU_SIMD 13 --> CPU 14 --> CPU afterHeader --> CPU

...

Page 10: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Outline

10  Turning  CFGs  into  callgraphs  

1.  Heterogeneous  High  Performance  Compu;ng  

 2.  Compila;on  toolchain    

3.   Code  refactoring  for  execu5on  in  heterogeneous  plaQorms  

Page 11: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Function-based control flow

11  Turning  CFGs  into  callgraphs  

BB_A: ... jmp BB_B

BB_B: ... jne A, C

BB_C: ...

BB_A: ... call B() ret

BB_B: ... jne callA, callC

BB_C: ...

callA: ... call A() ret

callA: ... call A() ret

callC: ... call C() ret

A()

B()

C()

Page 12: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Refactoring methodology

12  Turning  CFGs  into  callgraphs  

duplicate  constants    distribute  globals    for  every  original  func;on  f    

 ini;atorList  ←  find  ini;ators(f)    

 create  new  func;ons(f,  ini;atorList)    

 fix  branches(ini;atorList)    

 fix  phi  nodes(ini;atorList)  

Page 13: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Refactoring methodology

13  Turning  CFGs  into  callgraphs  

duplicate  constants    distribute  globals    for  every  original  func;on  f    

 ini5atorList  ←  find  ini5ators(f)    

 create  new  func;ons(f,  ini;atorList)    

 fix  branches(ini;atorList)    

 fix  phi  nodes(ini;atorList)  

Page 14: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Initiator list ← find initiators(f)

14  Turning  CFGs  into  callgraphs  

Par;;oning  result   Ini;ators   Resul;ng  func;ons  

entry    T      F  

if.end    T      F  

for.cond.preheader  

for.cond18.loopexit  

for.body      T              F  

if.then  

for.body20          T              F  

for.end26  

return  

entry    T      F  

if.end    T      F  

for.cond.preheader  

for.cond18.loopexit  

for.body      T              F  

if.then  

for.body20          T              F  

for.end26  

return  

entry    T      F  

if.end    T      F  

for.cond.preheader  

for.cond18.loopexit  

for.body      T              F  

if.then  

for.body20          T              F  

for.end26  

return  

Page 15: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Refactoring methodology

15  Turning  CFGs  into  callgraphs  

duplicate  constants    distribute  globals    for  every  original  func;on  f    

 ini;atorList  ←  find  ini;ators(f)    

 create  new  func5ons(f,  ini5atorList)    

 fix  branches(ini;atorList)    

 fix  phi  nodes(ini;atorList)  

Page 16: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

BB_A: %3 = add i32 %2, 6 jmp BB_B

BB_B: %4 = mul i32 %3, %3 jne BB_A, BB_C

BB_C: ret call i32 @puts(%num)

DEV  1  

DEV  2  

i32 f(i8* %num)

create new functions (f, initiatorList)

16  Turning  CFGs  into  callgraphs  

MODULE  1   MODULE  2  

declare i32 @puts(i8*)

Declare  used  func;ons  in  the  

des;na;on  module  

Page 17: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

BB_A: %3 = add i32 %2, 6 jmp BB_B

BB_B: %4 = mul i32 %3, %3 jne BB_A, BB_C

BB_C: ret call i32 @puts(%num)

DEV  1  

DEV  2  

i32 f(i8* %num)

Splitting functions

17  Turning  CFGs  into  callgraphs  

MODULE  1   MODULE  2  

declare i32 @puts(i8*)

i32 f2(i8* %arg1, i32 %arg2)

Create  new  func;on  prototype  

Page 18: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

BB_A: %3 = add i32 %2, 6 jmp BB_B

BB_B: %4 = mul i32 %3, %3 jne BB_A, BB_C

BB_C: ret call i32 @puts(%num)

i32 f(i8* %num)

create new functions (f, initiatorList)

18  Turning  CFGs  into  callgraphs  

MODULE  1   MODULE  2  

declare i32 @puts(i8*)

i32 f2(i8* %arg1, i32 %arg2)

Move  Basic  Blocks  

Page 19: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

BB_A: %3 = add i32 %2, 6 jmp BB_B

BB_B: %4 = mul i32 %arg2, %arg2 jne BB_A, BB_C

BB_C: ret call i32 @puts(%arg1)

i32 f(i8* %num)

create new functions (f, initiatorList)

19  Turning  CFGs  into  callgraphs  

MODULE  1   MODULE  2  

declare i32 @puts(i8*)

i32 f2(i8* %arg1, i32 %arg2)

Fix  argument  uses  

Page 20: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Refactoring methodology

20  Turning  CFGs  into  callgraphs  

duplicate  constants    distribute  globals    for  every  original  func;on  f    

 ini;atorList  ←  find  ini;ators(f)    

 create  new  func;ons(f,  ini;atorList)    

 fix  branches(ini5atorList)    

 fix  phi  nodes(ini;atorList)  

Page 21: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

BB_A: %3 = add i32 %2, 6 %r = call i32 f2(%num, %3) ret %r

BB_B: %4 = mul i32 %arg2, %arg2 jne fcaller, BB_C

BB_C: ret call i32 @puts(%arg1)

i32 f(i8* %num)

fix branches (initiatorList)

21  Turning  CFGs  into  callgraphs  

MODULE  1   MODULE  2  

declare i32 @puts(i8*)

i32 f2(i8* %arg1, i32 %arg2)

Replace  old  branches  by  func;on  calls  

fcaller: %r = call i32 f(%num, %3) ret %r

Page 22: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Refactoring methodology

22  Turning  CFGs  into  callgraphs  

duplicate  constants    distribute  globals    for  every  original  func;on  f    

 ini;atorList  ←  find  ini;ators(f)    

 create  new  func;ons(f,  ini;atorList)    

 fix  branches(ini;atorList)    

 fix  phi  nodes(ini5atorList)  

Page 23: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

STACK  B  

BB_A: ... jmp BB_B

BB_B: ... jne A, C

BB_C: ...

BB_A: ... call B() ret

BB_B: ... jne callA, callC

BB_C: ...

callA: ... call A() ret

callC: ... call C() ret

A()

B()

C()

Loops generate recursive calls

23  Turning  CFGs  into  callgraphs  

STACK  A  

stack  limit  

.  .   .  

.  .   .  vars  

ret  

ret  

vars  

vars  

ret  

ret  

vars  

vars  

ret  

ret  

vars  

vars  

ret  

ret  

vars  

vars  

ret  

ret  

vars  

vars  

ret  

Page 24: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Fixing loop recursion: a loop pass

24  Turning  CFGs  into  callgraphs  

header: %3 = add i32 %2, 6 br label %latch

latch: %4 = mul i32 %3, %3 %cond = icmp ne %4, 0 br i1 %cond, label %header, label %exit

exit: ret call i32 @puts(%num)

header: %cond = load i1* %cmpRes br i1 %cond, label %postheader, label %”exit”

latch: %4 = mul i32 %3, %3 %cond = icmp ne %4, 0 store i1 %cond, i1* cmpRes br label %header

exit: ret call i32 @puts(%num)

preheader: %cmpRes = alloca i1 store i1 true, i1* %cmpRes br label %header

postheader: %3 = add i32 %2, 6 jmp label %latch

Page 25: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Fixing loop recursion: final code refactoring

25  Turning  CFGs  into  callgraphs  

header: %cond = load i1* %cmpRes br i1 %cond, label %postheader, label %”exit”

latch: %4 = mul i32 %3, %3 %cond = icmp ne %4, 0 store i1 %cond, i1* cmpRes br label %header

exit: ret call i32 @puts(%num)

preheader: %cmpRes = alloca i1 store i1 true, i1* %cmpRes br label %header

postheader: %3 = add i32 %2, 6 jmp label %latch

DEV  1  

DEV  2  

header: %cond = load i1* %cmpRes br i1 %cond, label %postheader, label %”cal”

latch: %4 = mul i32 %3, %3 %cond = icmp ne %4, 0 store i1 %cond, i1* cmpRes ret

exit: ret call i32 @puts(%num)

preheader: %cmpRes = alloca i1 store i1 true, i1* %cmpRes br label %header

postheader: %3 = add i32 %2, 6 call latch() br label %header

f()

latch()

exit()

cal: call exit()

Page 26: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Output from the tool

26  Turning  CFGs  into  callgraphs  

Time profiling hello.ir [HPCmap] Parsing module hello.ir... [ReadArchPass] Parsing architecture ../architectures/CPU_SIMD.arch... [EstimationPass] Estimating from profiling information... [PartitioningPass] PARTITIONING OVERVIEW: [PartitioningPass] Initial exec time was 1.81e-07 s, new is 1.06e-07 -- Speedup = 1.71e+00 [LoopRecursionBreakPass] Analyzing loop 5 <-> 12 [PartitionWriterPass] Generating partitioned code PartitionWriterPass::runOnModule() -- Original module's functions:

odd with BBs: entry --> CPU main with BBs: entry --> CPU 3 --> CPU

... PartitionWriterPass::find_initiators() -- Inspecting function main()

Trivial initiators: 5 8 Entry block initiator: entry Nontrivial initiators: 14

... PartitionWriterPass::create_new_Fs() -- Splitting up function main

Function main1_CPU inserted in module CPU.part Moving BB 14 from function main to function main1_CPU

... PartitionWriterPass::branches_to_fcalls() -- Fixing branches:

to BB entry, moved to function main to BB 14, moved to function main1_CPU

PartitionWriterPass::fix_initiator_phis() -- Initiators: main2_CPU::5 2 phis updated

[PartitionWriterPass] Module CPU.part generated [PartitionWriterPass] Module CPU_SIMD.part generated Partitioned hello.ir

Page 27: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Preliminary results

27  Turning  CFGs  into  callgraphs  

0

1

2

3

4

5

ctrl arrayops adi floyd-warshall

Exec

utio

n tim

e (s

)original

partitionedvectorized

part + full vectpart + semi vect

Page 28: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Conclusions

28  Turning  CFGs  into  callgraphs  

•   Compila;on  toolchain  for  heterogeneous  architectures  

•   Code  refactoring  based  on  spliBng  func;ons  into  smaller  ones.  

•   Removed  recursion  generated  by  loops  being  transformed  into  func;ons.  

•   The  func;on  call  approach  does  not  introduce  a  significant  overhead  so  far.  

Page 29: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Work in progress…

29  Turning  CFGs  into  callgraphs  

IN  THE  REFACTORING  PASS    

 Execute  in  a  real  architecture  (one  executable  per  device)    Distributed  memory    Automa;c  communica;ons  

   IN  THE  COMPLETE  TOOLCHAIN    

 Iden;fica;on  of  parallelism    Data  par;;oning    Improve  es;ma;on,  par;;oning  heuris;cs,  profiling…  

Page 30: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Time profiling hello.ir [HPCmap] Parsing module hello.ir... [ReadArchPass] Parsing architecture ../architectures/CPU_SIMD.arch... [EstimationPass] Estimating from profiling information... [PartitioningPass] Partitioning... [PartitioningPass] PARTITIONING OVERVIEW: [PartitioningPass] Initial exec time was 1.81e-07 s, new is 1.06e-07 -- Speedup = 1.71e+00 [LoopRecursionBreakPass] Analyzing loop 9 <-> 9 [LoopRecursionBreakPass] DONE [LoopRecursionBreakPass] Analyzing loop 6 <-> 6 [LoopRecursionBreakPass] DONE [PartitionWriterPass] Generating partitioned code PartitionWriterPass::runOnModule() -- Original module's functions:

odd with BBs: entry --> CPU main with BBs: entry --> CPU 3 --> CPU beforeHeader --> CPU 8 --> CPU_SIMD 9 --> CPU_SIMD 13 --> CPU afterHeader --> CPU puts with BBs:

PartitionWriterPass::find_initiators() -- Inspecting function main() Trivial initiators: 5 11 Entry block initiator: entry Nontrivial initiators: 14 Results: entry has initiator entry beforeHeader has initiator entry 5 has initiator 5 11 has initiator 11 12 has initiator 11

[PartitionWriterPass] Module CPU.part generated [PartitionWriterPass] Module CPU_SIMD.part generated Partitioned hello.ir

Turning  CFGs  into  callgraphs  

Page 31: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Time profiling hello.ir [HPCmap] Parsing module hello.ir... [ReadArchPass] Parsing architecture ../architectures/CPU_SIMD.arch... [EstimationPass] Estimating from profiling information... [PartitioningPass] Partitioning... [PartitioningPass] PARTITIONING OVERVIEW: [PartitioningPass] Initial exec time was 1.81e-07 s, new is 1.06e-07 -- Speedup = 1.71e+00 [LoopRecursionBreakPass] Analyzing loop 9 <-> 9 [LoopRecursionBreakPass] DONE [LoopRecursionBreakPass] Analyzing loop 6 <-> 6 [LoopRecursionBreakPass] DONE [PartitionWriterPass] Generating partitioned code PartitionWriterPass::runOnModule() -- Original module's functions:

odd with BBs: entry --> CPU main with BBs: entry --> CPU 3 --> CPU beforeHeader --> CPU 8 --> CPU_SIMD 9 --> CPU_SIMD 13 --> CPU afterHeader --> CPU puts with BBs:

PartitionWriterPass::find_initiators() -- Inspecting function main() Trivial initiators: 5 11 Entry block initiator: entry Nontrivial initiators: 14 Results: entry has initiator entry beforeHeader has initiator entry 5 has initiator 5 11 has initiator 11 12 has initiator 11

[PartitionWriterPass] Module CPU.part generated [PartitionWriterPass] Module CPU_SIMD.part generated Partitioned hello.ir

Turning  CFGs  into  callgraphs  

Page 32: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Time profiling hello.ir [HPCmap] Parsing module hello.ir... [ReadArchPass] Parsing architecture ../architectures/CPU_SIMD.arch... [EstimationPass] Estimating from profiling information... [PartitioningPass] Partitioning... [PartitioningPass] PARTITIONING OVERVIEW: [PartitioningPass] Initial exec time was 1.81e-07 s, new is 1.06e-07 -- Speedup = 1.71e+00 [LoopRecursionBreakPass] Analyzing loop 9 <-> 9 [LoopRecursionBreakPass] DONE [LoopRecursionBreakPass] Analyzing loop 6 <-> 6 [LoopRecursionBreakPass] DONE [PartitionWriterPass] Generating partitioned code PartitionWriterPass::runOnModule() -- Original module's functions:

odd with BBs: entry --> CPU main with BBs: entry --> CPU 3 --> CPU beforeHeader --> CPU 8 --> CPU_SIMD 9 --> CPU_SIMD 13 --> CPU afterHeader --> CPU puts with BBs:

PartitionWriterPass::find_initiators() -- Inspecting function main() Trivial initiators: 5 11 Entry block initiator: entry Nontrivial initiators: 14 Results: entry has initiator entry beforeHeader has initiator entry 5 has initiator 5 11 has initiator 11 12 has initiator 11

[PartitionWriterPass] Module CPU.part generated [PartitionWriterPass] Module CPU_SIMD.part generated Partitioned hello.ir

Turning  CFGs  into  callgraphs  

Page 33: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Time profiling hello.ir [HPCmap] Parsing module hello.ir... [ReadArchPass] Parsing architecture ../architectures/CPU_SIMD.arch... [EstimationPass] Estimating from profiling information... [PartitioningPass] Partitioning... [PartitioningPass] PARTITIONING OVERVIEW: [PartitioningPass] Initial exec time was 1.81e-07 s, new is 1.06e-07 -- Speedup = 1.71e+00 [LoopRecursionBreakPass] Analyzing loop 9 <-> 9 [LoopRecursionBreakPass] DONE [LoopRecursionBreakPass] Analyzing loop 6 <-> 6 [LoopRecursionBreakPass] DONE [PartitionWriterPass] Generating partitioned code PartitionWriterPass::runOnModule() -- Original module's functions:

odd with BBs: entry --> CPU main with BBs: entry --> CPU 3 --> CPU beforeHeader --> CPU 8 --> CPU_SIMD 9 --> CPU_SIMD 13 --> CPU afterHeader --> CPU puts with BBs:

PartitionWriterPass::find_initiators() -- Inspecting function main() Trivial initiators: 5 11 Entry block initiator: entry Nontrivial initiators: 14 Results: entry has initiator entry beforeHeader has initiator entry 5 has initiator 5 11 has initiator 11 12 has initiator 11

[PartitionWriterPass] Module CPU.part generated [PartitionWriterPass] Module CPU_SIMD.part generated Partitioned hello.ir

Turning  CFGs  into  callgraphs  

Page 34: Turning’Control’Flow’Graphs’into’ Callgraphs · 12.04.2012  · Turning’Control’Flow’Graphs’into’ Callgraphs ’ Transformaonofparonedcodesfor execuoninheterogeneousarchitectures’

Time profiling hello.ir [HPCmap] Parsing module hello.ir... [ReadArchPass] Parsing architecture ../architectures/CPU_SIMD.arch... [EstimationPass] Estimating from profiling information... [PartitioningPass] Partitioning... [PartitioningPass] PARTITIONING OVERVIEW: [PartitioningPass] Initial exec time was 1.81e-07 s, new is 1.06e-07 -- Speedup = 1.71e+00 [LoopRecursionBreakPass] Analyzing loop 9 <-> 9 [LoopRecursionBreakPass] DONE [LoopRecursionBreakPass] Analyzing loop 6 <-> 6 [LoopRecursionBreakPass] DONE [PartitionWriterPass] Generating partitioned code PartitionWriterPass::runOnModule() -- Original module's functions:

odd with BBs: entry --> CPU main with BBs: entry --> CPU 3 --> CPU beforeHeader --> CPU 8 --> CPU_SIMD 9 --> CPU_SIMD 13 --> CPU afterHeader --> CPU puts with BBs:

PartitionWriterPass::find_initiators() -- Inspecting function main() Trivial initiators: 5 11 Entry block initiator: entry Nontrivial initiators: 14 Results: entry has initiator entry beforeHeader has initiator entry 5 has initiator 5 11 has initiator 11 12 has initiator 11

[PartitionWriterPass] Module CPU.part generated [PartitionWriterPass] Module CPU_SIMD.part generated Partitioned hello.ir

Turning  CFGs  into  callgraphs