
Static Transformation for Heap Layout Using Memory Access Patterns

Jinseong Jeon Computer Science, KAIST


2006-12-12 CS @ KAIST 2

Static Transformation

[Diagram: user, compiler (+ static transformation), computing machine]

•  Compilers can transform program memory layout.
   –  program behaviors: memory access patterns
   –  machine properties: memory hierarchy

Heap Layout Transformation

[ Pool Allocation ] - complex pointer analysis

[ Field Layout Reconstruction ] - profiling

    struct Node { int key; char data[6]; struct Node *next; } *T;

    char *search(int k) {
        ...
        while (...) {
            if (h->key == k) return h->data;
            h = h->next;
        }
        ...
    }

[Diagram: memory layout with k and n stored together (k n ...) and d stored separately (d ...)]
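A minimal sketch of what the two transformations could do to the Node list above, assuming a fixed-capacity pool and a split in which key and next are stored together while data lives in a parallel array. The names NodePool, pool_new_node, and pool_search are hypothetical, and links are array indices rather than pointers; this is an illustration, not the layout the tool actually emits.

```c
#include <stddef.h>
#include <string.h>

#define POOL_CAP 1024
#define NIL (-1)

/* Hot part: the fields touched on every step of the search loop. */
typedef struct { int key; int next; } NodeHot;

/* One pool for all Node objects; data[] is kept apart from the hot fields. */
typedef struct {
    NodeHot hot[POOL_CAP];
    char data[POOL_CAP][6];
    int used;
} NodePool;

/* Allocate a node from the pool and link it in front of `head`.
   Returns the new node's index, or NIL when the pool is full. */
int pool_new_node(NodePool *p, int key, const char *data, int head) {
    if (p->used >= POOL_CAP) return NIL;
    int i = p->used++;
    p->hot[i].key = key;
    p->hot[i].next = head;
    strncpy(p->data[i], data, sizeof p->data[i] - 1);
    p->data[i][sizeof p->data[i] - 1] = '\0';
    return i;
}

/* Same traversal as search(): only hot[] is read until the key matches. */
char *pool_search(NodePool *p, int head, int k) {
    for (int h = head; h != NIL; h = p->hot[h].next)
        if (p->hot[h].key == k) return p->data[h];
    return NULL;
}
```

With this layout the loop walks a dense array of (key, next) pairs, so more list nodes fit in each cache line than with the original 6-byte data field interleaved.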

Goal & Direction

•  To build static transformation for heap layout
   –  Based on both heap layout transformations

•  Predict program behaviors
   –  How to represent memory access behaviors
      •  Regular expressions
   –  How to extract run-time behaviors from code
      •  Code → CFG → Automaton → R.E.

•  Then, apply optimizing techniques
   –  How to interpret predicted behaviors

Overview

[Pipeline diagram with the components Structure Selection Analysis, Field Affinity Analysis, Access Pattern Analysis, and Layout Transformer; input: source code; output: optimized code]

Structure Selection

Code:

    S1.x = T1.c;
    for (...) { Ti.a = ...; ... = Ti.b; Uj.y = ...; }

Structure type projection:

    S = T; for ( ) { T = ; = T; U = ; }

Conversion: TS and TTU, giving TS(TTU)*

Candidate selection for pool allocation: T, U
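The candidate selection step can be illustrated with a small helper that reports how deeply a structure symbol is nested inside closures of a projected pattern such as TS(TTU)*. This is a sketch under stated assumptions: symbols are single characters, the only operators are parentheses and a postfix `*`, and the name star_depth is hypothetical. A symbol with depth 0 (here S) is never repeated, while T and U sit inside a closure and would be candidates.

```c
#include <stddef.h>
#include <string.h>

/* Returns the largest number of closures (`*`) that enclose any occurrence
   of `sym` in the pattern `re`, or -1 if `sym` does not occur at all. */
int star_depth(const char *re, char sym) {
    int best = -1;
    size_t n = strlen(re);
    for (size_t i = 0; i < n; i++) {
        if (re[i] != sym) continue;
        /* A star directly after the symbol closes over it. */
        int depth = (i + 1 < n && re[i + 1] == '*') ? 1 : 0;
        int open = 0; /* groups opened to the right of i and not yet closed */
        for (size_t j = i + 1; j < n; j++) {
            if (re[j] == '(') {
                open++;
            } else if (re[j] == ')') {
                if (open > 0)
                    open--;                 /* closes a later group */
                else if (j + 1 < n && re[j + 1] == '*')
                    depth++;                /* closes a starred group around i */
            }
        }
        if (depth > best) best = depth;
    }
    return best;
}
```

For example, star_depth("TS(TTU)*", 'S') is 0 and star_depth("TS(TTU)*", 'U') is 1.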

Field Affinity Estimation

Code:

    S1.x = T1.c;
    for (...) { Ti.a = ...; ... = Ti.b; Uj.y = ...; }

Field usage projection (structure T):

    = .c; for ( ) { .a = ; = .b; }

Conversion: c and ab, giving c(ab)*

Symbolic estimation: [Diagram: affinity graph over the fields a, b, c with a closure mark *]

Field layout reconstruction: [Diagram: a b ... stored together, c ... stored separately]

Field Affinity Estimation

•  Symbolic approach
   –  record closure marks with nesting information
   –  regard all closure marks as the same variable

[Diagram: affinity graphs over the fields n, k, d. Edges first carry closure marks (*, **, ***); once all marks are regarded as one variable x, edge weights become polynomials in x such as x and x^2.]

Access patterns: (kdn(n)*)*   ((kn)*(kd+ε))*   ((kn)*(kd+ε))*
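One way to read the symbolic approach: if every closure mark stands for the same unknown repetition count x, a co-access that occurs under d nested closures contributes x^d, and an affinity weight becomes a polynomial in x. The following is a minimal sketch under that reading, assuming non-negative integer coefficients and comparison for large x. Poly, poly_add_access, poly_compare, and poly_eval are hypothetical names, not the tool's API.

```c
#define MAX_DEPTH 7

/* coef[d] counts co-accesses that occur under d nested closures,
   so the weight is the polynomial sum of coef[d] * x^d. */
typedef struct { int coef[MAX_DEPTH + 1]; } Poly;

/* Record one co-access at the given closure nesting depth. */
void poly_add_access(Poly *p, int depth) {
    if (depth < 0) depth = 0;
    if (depth > MAX_DEPTH) depth = MAX_DEPTH;  /* clamp very deep nesting */
    p->coef[depth]++;
}

/* Compare two weights for large x: the highest differing degree decides.
   Returns 1 if a > b, -1 if a < b, 0 if they are equal. */
int poly_compare(const Poly *a, const Poly *b) {
    for (int d = MAX_DEPTH; d >= 0; d--) {
        if (a->coef[d] != b->coef[d])
            return a->coef[d] > b->coef[d] ? 1 : -1;
    }
    return 0;
}

/* Evaluate the weight for a concrete guess of x (Horner's rule). */
long poly_eval(const Poly *p, long x) {
    long v = 0;
    for (int d = MAX_DEPTH; d >= 0; d--) v = v * x + p->coef[d];
    return v;
}
```

Under this ordering a single access at depth 3 outweighs any number of accesses at depth 2, which matches the intuition that inner loops dominate.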

Code Transformation

•  Explicit field names → field accesses on modified layouts
   –  Oi.next is converted into *(Oi + offset(next)).
   –  Arbitrary pointer dereferences like *(p + 4) are not allowed.
   –  For some accesses, extra instructions are required.
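The rewrite of Oi.next into *(Oi + offset(next)) can be sketched with the standard offsetof macro. In this illustration the offset table simply mirrors the original struct; a layout transformer would fill it with the offsets of the modified layout instead. The names Field, field_offset, field_addr, get_next, and get_key are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct Node { int key; char data[6]; struct Node *next; };

/* Field identifiers and their byte offsets in the (possibly modified) layout.
   Here the table mirrors the original struct. */
enum Field { F_KEY, F_DATA, F_NEXT };
static const size_t field_offset[] = {
    [F_KEY]  = offsetof(struct Node, key),
    [F_DATA] = offsetof(struct Node, data),
    [F_NEXT] = offsetof(struct Node, next),
};

/* Address of field f of object o: o + offset(f). */
void *field_addr(void *o, enum Field f) {
    return (char *)o + field_offset[f];
}

/* o->next expressed through the offset table. */
struct Node *get_next(struct Node *o) {
    struct Node *n;
    memcpy(&n, field_addr(o, F_NEXT), sizeof n);
    return n;
}

/* o->key expressed through the offset table. */
int get_key(struct Node *o) {
    int k;
    memcpy(&k, field_addr(o, F_KEY), sizeof k);
    return k;
}
```

Because every access goes through a named field, the offsets can change freely; a raw dereference such as *(p + 4) bypasses the table, which is why such accesses cannot be supported.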

Code Transformation

•  Type-aware malloc → pool allocation routines
   –  For custom allocators, feed hints which consist of target structures and corresponding custom allocators

Direct allocation, before:

    ... = malloc(sizeof(T));

Direct allocation, after:

    ... = _T_alloc_();

    char* _T_alloc_() { // pool allocation }

Custom allocator, before:

    ... = my_malloc(sizeof(T));

    char* my_malloc(int s) { ... ... = malloc(s); }

Custom allocator, after:

    ... = my_malloc(sizeof(T));

    char* my_malloc(int s) { ... ... = _T_alloc_(); }

    char* _T_alloc_() { // pool allocation }
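A minimal sketch of what a per-type pool allocation routine might look like, assuming a simple bump allocator over fixed-size chunks with no deallocation. The routine is named T_alloc here rather than _T_alloc_ because identifiers starting with an underscore and an uppercase letter are reserved in C. CHUNK_OBJS and the size-based dispatch in my_malloc are assumptions made for illustration; the slides only say that hints name the target structure and its custom allocator.

```c
#include <stdlib.h>

typedef struct T { int key; char data[6]; struct T *next; } T;

#define CHUNK_OBJS 256  /* objects per chunk; an arbitrary choice */

static T *chunk = NULL;       /* current chunk of CHUNK_OBJS objects */
static size_t chunk_used = 0; /* objects already handed out from it */

/* Pool allocation for T: objects of the same type are carved out of
   contiguous chunks, so consecutive allocations are adjacent in memory. */
T *T_alloc(void) {
    if (chunk == NULL || chunk_used == CHUNK_OBJS) {
        chunk = malloc(CHUNK_OBJS * sizeof(T));
        if (chunk == NULL) return NULL;
        chunk_used = 0;
    }
    return &chunk[chunk_used++];
}

/* A custom allocator redirected to the pool when the request is for a T.
   Requests of any other size still go to malloc. */
void *my_malloc(size_t s) {
    if (s == sizeof(T)) return T_alloc();
    return malloc(s);
}
```

Two consecutive calls return neighboring objects, which is the locality property pool allocation is after.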


Experimental Environment

•  Using the CIL compiler and OCaml

•  Red Hat 9.0 Linux PC
   –  2.6GHz Pentium 4 processor
   –  8KB L1D cache, 512KB L2 cache, 1.7GB main memory

•  GCC 3.2.2 with -O3

Analysis Time

Benchmark      Program     Lines of Code   Structure Selection   Field Affinity   Code Transform   Total
SPECINT 2000   175.vpr     11301           7.220                 0.324            0.107            7.651
SPECINT 2000   300.twolf   17821           15.598                3.455            1.126            20.179
FreeBench      analyzer    763             0.096                 0.027            0.012            0.135
McGill         chomp       378             0.021                 0.006            0.003            0.030
McGill         misr        181             0.003                 0.002            0.001            0.006
Olden suite    bisort      597             0.020                 0.003            0.002            0.025
Olden suite    health      474             0.024                 0.004            0.002            0.030
Olden suite    mst         408             0.031                 0.004            0.002            0.037
Olden suite    perimeter   345             0.012                 0.012            0.001            0.025
Olden suite    treeadd     154             0.002                 0.000            0.000            0.002
Olden suite    tsp         433             0.011                 0.004            0.002            0.017
Olden suite    voronoi     975             0.048                 0.004            0.003            0.055
Ptrdist suite  anagram     355             0.031                 0.003            0.001            0.035
Ptrdist suite  bc          4303            2.028                 0.634            0.193            2.855
Ptrdist suite  ft          926             0.050                 0.014            0.010            0.074
Ptrdist suite  ks          551             0.055                 0.012            0.020            0.087

Cache Miss - L1D

[Bar chart: normalized L1D cache miss (1.0 = Original), y-axis from 0 to 1.4, for 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft, ks. Series: Pool, Pool + Re. Bars exceeding the axis are labeled 1.99 and 2.23. Annotated values: Pool 0.86, Pool + Re 0.84.]

Cache Miss - L2

[Bar chart: normalized L2 cache miss (1.0 = Original), y-axis from 0 to 1.4, for 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft, ks. Series: Pool, Pool + Re. Bars exceeding the axis are labeled 4.10 and 4.18. Annotated values: Pool 1.06, Pool + Re 1.00.]

Performance

Benchmark      Program     Lines of Code   Original (seconds)   Pool / Original   Pool + Re / Original
SPECINT 2000   175.vpr     11301           10.959               1.01              1.01
SPECINT 2000   300.twolf   17821           435.19               0.98              0.99
FreeBench      analyzer    763             66.64                0.41              0.45
McGill         chomp       378             7.44                 0.59              0.47
McGill         misr        181             31.39                0.99              1.01
Olden suite    bisort      597             24.29                0.99              0.99
Olden suite    health      474             86.05                0.71              0.63
Olden suite    mst         408             65.73                0.82              0.82
Olden suite    perimeter   345             7.19                 0.78              0.84
Olden suite    treeadd     154             10.17                0.48              0.55
Olden suite    tsp         433             20.44                0.96              0.97
Olden suite    voronoi     975             11.03                0.99              0.99
Ptrdist suite  anagram     355             1.53                 0.99              1.11
Ptrdist suite  bc          4303            1.95                 0.82              0.81
Ptrdist suite  ft          926             8.25                 0.83              0.73
Ptrdist suite  ks          551             7.46                 1.03              1.03
Avg.                                                            0.84              0.84

Contribution

•  Predict memory access patterns at compile-time
   –  Regular expressions
   –  Automata reduction algorithm

•  Interpret predicted patterns according to heap layout transformations

•  Cache misses are reduced by 16%

•  Execution times are reduced by 14%


Backup Slides

From CFG to Automaton

[Diagram: CFG of search() with the nodes start, h == NULL, h→key == k, h→data, h = h→next, NotFound, and return; branch edges are labeled T and F. The corresponding automaton has transitions labeled k, d, and n.]

State Elimination

[Diagram: eliminating a state that has a self-loop labeled e produces the edge labels ae*c and be*d.]

From Automaton to R.E.

[Diagram: step-by-step state elimination on the search() automaton. Edge labels start as k, d, n; intermediate labels are nk, kd+ε, and kn; the result is (kn)*(kd+ε).]
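The elimination steps for search() can be reproduced with a small state elimination routine. This is a sketch under stated assumptions, not the automata reduction algorithm of the talk: the automaton has three states, edge labels are plain strings, ε is spelled `e` in those strings, and the names Automaton, add_edge, eliminate, to_regex, and search_regex are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define NSTATES 3
#define LEN 128
#define EPS "e"   /* epsilon, spelled "e" in the label strings */

typedef struct {
    int has[NSTATES][NSTATES];          /* 1 if an edge p -> r exists */
    char label[NSTATES][NSTATES][LEN];  /* regular expression on that edge */
} Automaton;

/* Append src to dst; epsilon is the identity of concatenation,
   and a union is parenthesized before it is appended. */
void cat(char *dst, const char *src) {
    if (strcmp(src, EPS) == 0) return;
    size_t n = strlen(dst);
    if (strchr(src, '+')) snprintf(dst + n, LEN - n, "(%s)", src);
    else snprintf(dst + n, LEN - n, "%s", src);
}

/* Add an edge p -> r, or form the union with an existing edge. */
void add_edge(Automaton *a, int p, int r, const char *re) {
    if (a->has[p][r]) {
        char tmp[LEN];
        snprintf(tmp, LEN, "%s+%s", re, a->label[p][r]);
        strcpy(a->label[p][r], tmp);
    } else {
        a->has[p][r] = 1;
        snprintf(a->label[p][r], LEN, "%s", re);
    }
}

/* Eliminate state q: every path p -> q -> r becomes one edge
   labeled label(p,q) label(q,q)* label(q,r). */
void eliminate(Automaton *a, int q) {
    for (int p = 0; p < NSTATES; p++) {
        for (int r = 0; r < NSTATES; r++) {
            if (p == q || r == q || !a->has[p][q] || !a->has[q][r]) continue;
            char path[LEN] = "";
            cat(path, a->label[p][q]);
            if (a->has[q][q]) {
                size_t n = strlen(path);
                snprintf(path + n, LEN - n, "(%s)*", a->label[q][q]);
            }
            cat(path, a->label[q][r]);
            add_edge(a, p, r, path[0] ? path : EPS);
        }
    }
    for (int s = 0; s < NSTATES; s++) a->has[s][q] = a->has[q][s] = 0;
}

/* With only start s and final f left (f has no outgoing edges and an
   edge s -> f exists), the language is label(s,s)* label(s,f). */
void to_regex(const Automaton *a, int s, int f, char *out) {
    out[0] = '\0';
    if (a->has[s][s]) snprintf(out, LEN, "(%s)*", a->label[s][s]);
    if (a->has[s][f]) cat(out, a->label[s][f]);
}

/* Automaton of search(): 0 = loop head, 1 = after reading key, 2 = exit. */
void search_regex(char *out) {
    Automaton a = {0};
    add_edge(&a, 0, 1, "k");   /* h->key == k    */
    add_edge(&a, 1, 0, "n");   /* h = h->next    */
    add_edge(&a, 1, 2, "d");   /* return h->data */
    add_edge(&a, 0, 2, EPS);   /* h == NULL      */
    eliminate(&a, 1);
    to_regex(&a, 0, 2, out);
}
```

Calling search_regex with a buffer of LEN bytes yields the string (kn)*(kd+e), the same pattern as on the slide.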

State Compare

    state_compare(state s1, state s2)
      b1 ← whether ∃s'.(s' → s1, s1.dfn ≤ s'.dfn)      // 0 or 1
      b2 ← whether ∃s'.(s' → s2, s2.dfn ≤ s'.dfn)      // 0 or 1
      if b1 and not b2 then 1                          // s1 > s2
      else if not b1 and b2 then -1                    // s1 < s2
      else if b1 and b2 then compare(s2.dfn, s1.dfn)   // dfn = Depth First Numbering
      else compare(s1.dfn, s2.dfn)
      end if
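The comparison can be written directly in C once states are integers, the depth-first numbering is an array, and the automaton is an edge list. Those representation choices, and the names Edge and has_back_edge, are assumptions of this sketch; how the resulting order is consumed by the reduction worklist is not shown here.

```c
typedef struct { int from, to; } Edge;

/* 1 if some edge s' -> s has dfn[s] <= dfn[s'], i.e. s is entered by a
   back edge (or a self-loop) with respect to the depth-first numbering. */
int has_back_edge(int s, const int *dfn, const Edge *edges, int nedges) {
    for (int i = 0; i < nedges; i++)
        if (edges[i].to == s && dfn[s] <= dfn[edges[i].from]) return 1;
    return 0;
}

/* Three-way comparison of integers: -1, 0, or 1. */
int compare(int a, int b) { return (a > b) - (a < b); }

int state_compare(int s1, int s2, const int *dfn,
                  const Edge *edges, int nedges) {
    int b1 = has_back_edge(s1, dfn, edges, nedges);
    int b2 = has_back_edge(s2, dfn, edges, nedges);
    if (b1 && !b2) return 1;                         /* s1 > s2 */
    if (!b1 && b2) return -1;                        /* s1 < s2 */
    if (b1 && b2) return compare(dfn[s2], dfn[s1]);  /* reversed dfn order */
    return compare(dfn[s1], dfn[s2]);
}
```

For the search() automaton with states 0 (loop head), 1, and 2 numbered in that order, only state 0 is entered by a back edge, so it compares greater than the other two.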

Automata Reduction

    worklist ← ∅

    workhorse(state s)
      if s ≠ start state and s ≠ end state then
        for all s' ∈ s.successor do
          delete s' from worklist
        end for
        eliminate(s)
        for all s' ∈ s.successor do
          push s' into worklist
        end for
      end if

Automata Reduction

    reduce()
      E ← {s ∈ S | ∃ s'. s →ε s'}
      R ← {s ∈ E | ∄ s'. s' → s, s.dfn ≤ s'.dfn}
      for all s ∈ R do
        workhorse(s)
      end for
      worklist ← S \ R
      while worklist ≠ ∅ do
        workhorse(pop(worklist))
      end while

From Intra- to Inter-proc.

•  Intrinsically, reverse topological order of a call graph
•  For self-recursive function calls,

    f() { ... = s.a; if (!end) f(); ... = s.b; }

   Grammar: F → ab | aFb (language a^i b^i)
   Regular expression: a*abb*

[Diagram: automaton for f() with transitions labeled a and b]

Structure Selection

•  “One structure per pool”
   –  Most pools are used in a type-consistent manner

•  Identify which structures are exhaustively used
   –  Structure access patterns
   –  Repeatedly used ones

•  Structure detection in closures

Closure Detection

•  Presence of closures
   –  EMPTY, NORMAL, HAVE

    main:  . . foo(); . .
    foo:   . . bar1(); . .
    bar1:  . . while(..) bar2(); . .
    bar2:  . . s->f1; s->f2; . .

    bar2 ↦ NORMAL
    bar1 ↦ HAVE, bar2 ↦ NORMAL
    foo ↦ HAVE, bar1 ↦ HAVE, bar2 ↦ NORMAL
    main ↦ HAVE, foo ↦ HAVE, bar1 ↦ HAVE, bar2 ↦ NORMAL

exc. exc.

Field Affinity

[Diagram: affinity graph over the fields key, next, and data with numeric edge weights, and the resulting grouping of key,next apart from data. Access trace: o4.key ... o4.next o5.key o5.next o3.next ... o6.key]

Affinity Relation Abstraction

    main:  . s->f3; foo(); . .
    foo:   . s->f1; bar1(); s->f2; .
    bar1:  . s->f3; while(..) bar2(); s->f1; .
    bar2:  . . s->f1; s->f2; . .

    bar2.s ↦ {f1}, bar2.e ↦ {f2}, bar2.r ↦ [(f1,f2) ↦ {(0,1)}]
    bar1.s ↦ {f3}, bar1.e ↦ {f1}, bar1.r ↦ [(f1,f3) ↦ {(0,1)}, (f1,f2) ↦ {(1,1), (0,1)}]
    foo.s ↦ {f1}, foo.e ↦ {f2}, foo.r ↦ [(f1,f3) ↦ {(0,2)}, (f1,f2) ↦ {(1,1), (0,2)}]
    main.s ↦ {f3}, main.e ↦ {f2}, main.r ↦ [(f1,f3) ↦ {(0,3)}, (f1,f2) ↦ {(1,1), (0,2)}]

where F is the set of fields
where VAR is the set of function names


Offset Calculation (1/2)


Offset Calculation (2/2)

Traditional vs. WTO based

                       Traditional                   WTO based
Program     SLOC       time      peak     total      time     peak    total
175.vpr     11301      N.A.      N.A.     N.A.       0.154    15.97   178.68
300.twolf   17821      N.A.      N.A.     N.A.       0.360    27.03   313.14
analyzer    763        0.022     1.47     16.59      0.007    1.23    11.97
chomp       378        0.003     0.74     5.01       0.003    0.74    4.96
misr        181        0.003     0.49     2.87       0.003    0.49    2.69
bisort      597        0.002     0.74     4.79       0.002    0.74    4.66
health      474        0.004     0.74     5.90       0.002    0.74    5.47
mst         408        0.003     0.74     5.92       0.002    0.74    5.51
perimeter   345        0.003     0.74     4.52       0.002    0.74    4.19
treeadd     154        0.003     0.49     1.52       0.000    0.49    1.51
tsp         433        0.004     0.74     5.31       0.002    0.74    4.94
voronoi     975        0.005     1.72     14.64      0.003    1.72    14.28
anagram     355        0.002     0.74     5.33       0.002    0.74    5.32
bc          4303       572.897   612.93   4379.97    0.059    9.34    114.09
ft          926        0.006     0.98     9.07       0.004    0.98    8.67
ks          551        0.008     0.98     9.37       0.004    0.98    7.92

Instruction Reference

[Bar chart: normalized instruction reference, y-axis from 0 to 1.4, for 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft, ks. Series: Pool, Pool + Re. Annotated values: Pool 0.94, Pool + Re 0.97.]