GC Advantage: Improving Program Locality

Xianglong Huang, Zhenlin Wang, Stephen M Blackburn, Kathryn S McKinley, J Eliot B Moss, Perry Cheng


Post on 09-Jan-2016


Page 1: GC Advantage: Improving Program Locality

1

GC Advantage: Improving Program Locality

Xianglong Huang, Zhenlin Wang, Stephen M Blackburn, Kathryn S McKinley, J Eliot B Moss, Perry Cheng

Page 2: GC Advantage: Improving Program Locality

2

Motivation

Memory gap
How are Java programs affected?

Page 3: GC Advantage: Improving Program Locality

3

Marksweep vs. Copying

pseudojbb

Page 4: GC Advantage: Improving Program Locality

4

Motivation

Javac with perfect L1 and L2 cache.

16K L1, 256K L2; Appel, GCTk; breadth first

[Figure: bar chart of _213_javac execution time (10^9 cycles, axis 0 to 25) for three configurations: original, perfect L2, perfect L1]

Page 5: GC Advantage: Improving Program Locality

5

Motivation

Copying collector can reorder objects
Goal: take advantage of copying collectors
    Reorder objects to improve locality

Page 6: GC Advantage: Improving Program Locality

6

Exploring The Space

Different policies for traversing roots
Class-oblivious traversal orders
    Which traversing order is the best?
Class-based traversal orders
    How to find the “important” data structure?

Page 7: GC Advantage: Improving Program Locality

7

Different Root Traversal Policies

Two different types of roots:
    Stack, global variables
    Remembered sets (for generational)
Different traversal orders
    Copy all roots before traversing any children
    Copy each root and its children (root-by-root)
    Split roots
        Stack first and the children
        Remset first and the children
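The root traversal policies listed on this slide can be sketched as follows. This is a minimal illustration, not the collector's actual code: the heap contents, the object names (`s1`, `r1`, `a`, ...), the class `RootPolicies`, and the reading of "split roots" as "copy one kind of root and its reachable objects, then the other kind" are all assumptions made for the example. Children are copied breadth first within each group.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RootPolicies {
    // Hypothetical heap: each object name maps to the objects it points to.
    static final Map<String, List<String>> HEAP = Map.of(
        "s1", List.of("a"),
        "s2", List.of("b"),
        "r1", List.of("c", "a"));
    static final List<String> STACK_ROOTS = List.of("s1", "s2");
    static final List<String> REMSET_ROOTS = List.of("r1");

    // Copy the given roots first, then everything reachable from them,
    // breadth first. 'copied' records the to-space order and also acts
    // as the "already forwarded" check.
    static void copyFrom(List<String> roots, Set<String> copied) {
        Deque<String> scan = new ArrayDeque<>();
        for (String root : roots) {
            if (copied.add(root)) scan.add(root);
        }
        while (!scan.isEmpty()) {
            for (String child : HEAP.getOrDefault(scan.poll(), List.of())) {
                if (copied.add(child)) scan.add(child);
            }
        }
    }

    // Copy all roots before traversing any children.
    static List<String> allRootsFirst() {
        Set<String> copied = new LinkedHashSet<>();
        List<String> roots = new ArrayList<>(STACK_ROOTS);
        roots.addAll(REMSET_ROOTS);
        copyFrom(roots, copied);
        return new ArrayList<>(copied);
    }

    // Copy each root together with its children before the next root.
    static List<String> rootByRoot() {
        Set<String> copied = new LinkedHashSet<>();
        for (String root : STACK_ROOTS) copyFrom(List.of(root), copied);
        for (String root : REMSET_ROOTS) copyFrom(List.of(root), copied);
        return new ArrayList<>(copied);
    }

    // Split roots: one kind of root and its children, then the other kind.
    static List<String> split(boolean stackFirst) {
        Set<String> copied = new LinkedHashSet<>();
        copyFrom(stackFirst ? STACK_ROOTS : REMSET_ROOTS, copied);
        copyFrom(stackFirst ? REMSET_ROOTS : STACK_ROOTS, copied);
        return new ArrayList<>(copied);
    }

    public static void main(String[] args) {
        System.out.println(allRootsFirst()); // [s1, s2, r1, a, b, c]
        System.out.println(rootByRoot());    // [s1, a, s2, b, r1, c]
        System.out.println(split(true));     // [s1, s2, a, b, r1, c]
        System.out.println(split(false));    // [r1, c, a, s1, s2, b]
    }
}
```

Under root-by-root, each root lands next to the objects reachable from it, which is one plausible reason it could help mutator locality.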

Page 8: GC Advantage: Improving Program Locality

8

Experiment Setup

JikesRVM, JMTk
Generational copying collector with bounded nursery size of 4MB
PseudoAdaptive 2nd iteration

Page 9: GC Advantage: Improving Program Locality

9

Different Root Traversal Policies

• RxR (root-by-root) has the best mutator locality

Page 10: GC Advantage: Improving Program Locality

10

Different Root Traversal Policies

• Total execution time

Page 11: GC Advantage: Improving Program Locality

11

Exploring The Space

Different policies for traversing roots
Class-oblivious traversal orders
    Which traversing order is the best?
Class-based traversal orders
    How to find the “important” data structure?

Page 12: GC Advantage: Improving Program Locality

12

Different Traversal Orders

Breadth first: 1,2,3,4,5,6,7
Pure depth first: 1,2,6,3,4,7,5
Pure depth first, LIFO: 1,5,4,7,3,2,6

[Figure: example object graph with nodes 1 to 7]

Page 13: GC Advantage: Improving Program Locality

13

Different Traversal Orders

Breadth first: 1,2,3,4,5,6,7
Pure depth first: 1,2,6,3,4,7,5
Pure depth first, LIFO: 1,5,4,7,3,2,6
Partial depth first, 2 children: 1,2,6,3,4,5,7

[Figure: example object graph with nodes 1 to 7]
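The four orders on this slide can be reproduced with a short sketch. The tree used here (1 points to 2, 3, 4, 5; 2 points to 6; 4 points to 7) is an assumption inferred from the listed orders, since the original figure is not available. The partial depth-first rule is likewise one interpretation that matches the slide: descend immediately into the first k children, copy the remaining children in place, and scan them later. The class `TraversalOrders` and its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class TraversalOrders {
    // Assumed example tree, inferred from the orders listed on the slide:
    // 1 -> 2, 3, 4, 5;  2 -> 6;  4 -> 7. Being a tree, no visited set is needed.
    static final Map<Integer, List<Integer>> GRAPH = Map.of(
        1, List.of(2, 3, 4, 5),
        2, List.of(6),
        4, List.of(7));

    static List<Integer> children(int node) {
        return GRAPH.getOrDefault(node, List.of());
    }

    // Breadth first: process objects in FIFO order.
    static List<Integer> breadthFirst(int root) {
        List<Integer> order = new ArrayList<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            order.add(node);
            queue.addAll(children(node));
        }
        return order;
    }

    // Pure depth first: recurse into each child in field order.
    static List<Integer> depthFirst(int root) {
        List<Integer> order = new ArrayList<>();
        visit(root, order);
        return order;
    }

    static void visit(int node, List<Integer> order) {
        order.add(node);
        for (int child : children(node)) visit(child, order);
    }

    // Depth first with an explicit LIFO stack: the last child pushed is copied first.
    static List<Integer> depthFirstLifo(int root) {
        List<Integer> order = new ArrayList<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            int node = stack.pop();
            order.add(node);
            for (int child : children(node)) stack.push(child);
        }
        return order;
    }

    // Partial depth first: descend into the first k children at once;
    // copy the other children now but scan their fields later (FIFO).
    static List<Integer> partialDepthFirst(int root, int k) {
        List<Integer> order = new ArrayList<>();
        Deque<Integer> toScan = new ArrayDeque<>();
        order.add(root);
        toScan.add(root);
        while (!toScan.isEmpty()) scanPartial(toScan.poll(), k, order, toScan);
        return order;
    }

    static void scanPartial(int node, int k, List<Integer> order, Deque<Integer> toScan) {
        List<Integer> kids = children(node);
        for (int i = 0; i < kids.size(); i++) {
            int child = kids.get(i);
            order.add(child);
            if (i < k) scanPartial(child, k, order, toScan);
            else toScan.add(child);
        }
    }

    public static void main(String[] args) {
        System.out.println(breadthFirst(1));         // [1, 2, 3, 4, 5, 6, 7]
        System.out.println(depthFirst(1));           // [1, 2, 6, 3, 4, 7, 5]
        System.out.println(depthFirstLifo(1));       // [1, 5, 4, 7, 3, 2, 6]
        System.out.println(partialDepthFirst(1, 2)); // [1, 2, 6, 3, 4, 5, 7]
    }
}
```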

Page 14: GC Advantage: Improving Program Locality

14

Class Oblivious Type

Different traversal policies
    Partial DF is the best

Page 15: GC Advantage: Improving Program Locality

15

Exploring The Space

Different policies for traversing roots
Class-oblivious traversal orders
    Which traversing order is the best?
Class-based traversal orders
    How to find the “important” data structure?

Page 16: GC Advantage: Improving Program Locality

16

Class-based Traversal

Class-oblivious traversal orders are inflexible
Class-based object traversal
    Static profiling
    Dynamic sampling

Page 17: GC Advantage: Improving Program Locality

17

Static Profiling

Profile object accesses
Find hot pairs with strong correlation
Example
    (1,4), (4,7) and (2,6) have strong correlation
    Order: 1,4,7,2,6,3,5

[Figure: example object graph with nodes 1 to 7]
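One traversal that reproduces the example order is a depth-first copy that follows hot edges before the remaining edges. The sketch below illustrates that idea only. The slides do not say exactly how the profile is turned into a copy order, so the rule "hot children first, then the rest in field order" is an assumption, as are the tree shape (1 points to 2, 3, 4, 5; 2 points to 6; 4 points to 7) and the class name `HotPairOrder`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class HotPairOrder {
    // Assumed example tree: 1 -> 2, 3, 4, 5;  2 -> 6;  4 -> 7.
    static final Map<Integer, List<Integer>> GRAPH = Map.of(
        1, List.of(2, 3, 4, 5),
        2, List.of(6),
        4, List.of(7));

    // Hot pairs from the slide, stored as parent -> hot children:
    // (1,4), (4,7) and (2,6).
    static final Map<Integer, Set<Integer>> HOT = Map.of(
        1, Set.of(4),
        4, Set.of(7),
        2, Set.of(6));

    static List<Integer> copyOrder(int root) {
        List<Integer> order = new ArrayList<>();
        visit(root, order);
        return order;
    }

    // Depth first, following hot edges before the other edges.
    static void visit(int node, List<Integer> order) {
        order.add(node);
        List<Integer> kids = GRAPH.getOrDefault(node, List.of());
        Set<Integer> hot = HOT.getOrDefault(node, Set.of());
        for (int kid : kids) {
            if (hot.contains(kid)) visit(kid, order);
        }
        for (int kid : kids) {
            if (!hot.contains(kid)) visit(kid, order);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyOrder(1)); // [1, 4, 7, 2, 6, 3, 5]
    }
}
```

With these inputs, each hot pair ends up adjacent in the copy order, which is the effect the example on the slide is after.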

Page 18: GC Advantage: Improving Program Locality

18

Online Profiling

Use the adaptive compiler sampling
    Hot method
    Hot basic block
Use field accesses to indicate hot fields
Example: (In a hot method)

    {
        Class A a;
        a.b = …;
        …
    }

[Figure: object A with a field b referring to object B]
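The idea of turning field accesses in hot methods into per-class copying advice can be sketched as below. Everything here is hypothetical: the class `HotFieldAdvice`, its methods, and the rule "follow advised fields first, then the rest in declaration order" are assumptions for illustration, not the JikesRVM or JMTk interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class HotFieldAdvice {
    // Class name -> fields seen accessed inside hot methods.
    private final Map<String, Set<String>> hotFields = new HashMap<>();

    // Hypothetical hook: called for each field access found in a hot method,
    // for example recordAccess("A", "b") for the statement a.b = ...
    void recordAccess(String className, String fieldName) {
        hotFields.computeIfAbsent(className, k -> new LinkedHashSet<>()).add(fieldName);
    }

    // Order in which a collector could follow an object's pointer fields:
    // advised hot fields first, then the rest in declaration order.
    List<String> copyOrder(String className, List<String> declaredFields) {
        Set<String> hot = hotFields.getOrDefault(className, Set.of());
        List<String> order = new ArrayList<>();
        for (String field : declaredFields) {
            if (hot.contains(field)) order.add(field);
        }
        for (String field : declaredFields) {
            if (!hot.contains(field)) order.add(field);
        }
        return order;
    }

    public static void main(String[] args) {
        HotFieldAdvice advice = new HotFieldAdvice();
        advice.recordAccess("A", "b");
        // Class A with hypothetical pointer fields x, b, y: b is followed first.
        System.out.println(advice.copyOrder("A", List.of("x", "b", "y"))); // [b, x, y]
        // A class with no advice keeps its declaration order.
        System.out.println(advice.copyOrder("C", List.of("p", "q")));      // [p, q]
    }
}
```

Because the advice is keyed by class, every instance of `A` is treated the same way, which is the limitation the later slides raise when they ask about instance-based advice.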

Page 19: GC Advantage: Improving Program Locality

19

Online Profiling

Micro benchmark results

Page 20: GC Advantage: Improving Program Locality

20

Online Profiling

Geometric mean

Page 21: GC Advantage: Improving Program Locality

21

Reasons

No advice for most of the objects copied
    For jess, db and raytrace, we only pick <<1% of the objects as hot objects
    5% for javac
The hot fields are within the first 2 pointers
    90% of the advised objects for javac

Page 22: GC Advantage: Improving Program Locality

22

Online Profiling

PseudoJBB mutator results
    Generate advice for 23% of the copied objects
    75% of the objects have advised hot fields other than the first 2

Page 23: GC Advantage: Improving Program Locality

23

Questions

Have we found all the hot objects?
    Not all hot objects are connected?
Is class-based good enough?
    For pseudojbb, we need instance-based?
Locality for the nursery objects?

Page 24: GC Advantage: Improving Program Locality

24

Future Work

Sampling technique
    Catch more hot object accesses
        Lower the threshold
    Hot objects that are not connected
    Dynamically change the advice for phase changes
Nursery locality
Different traversal orders for cold objects
Instance-based

Page 25: GC Advantage: Improving Program Locality

25

Conclusion

Reordering objects during copying collection can improve locality
Among class-oblivious traversal orders, partial depth first is the best
Class-based traversal with online profiling is more flexible
    Up to 50% better
    Very low overhead, ~0%

Still mysteries

Page 26: GC Advantage: Improving Program Locality

26

Questions?

Page 27: GC Advantage: Improving Program Locality

27

Answers?

Lower the threshold of the sampling, not only the hot methods

For objects with only 1 or 2 pointers, it may be easier to just use depth first

Maybe the nursery locality is more important

Instance-based advice

Page 28: GC Advantage: Improving Program Locality

28

Online Profiling

Execution overhead

[Figure: chart of execution overhead; axis from -6.00% to 5.00%]

Page 29: GC Advantage: Improving Program Locality

29

Online Profiling

Micro benchmark results for mutator time

Page 30: GC Advantage: Improving Program Locality

30

Different Root Traversal Policies

_227_mtrt

Page 31: GC Advantage: Improving Program Locality

31

Static Profiling

Results

Page 32: GC Advantage: Improving Program Locality

32

Answers?

Most objects have only one pointer
Percentage of objects copied by advice (whether it is really hot?)
    For pseudojbb ~50%, for jess <<1%, for our micro benchmark ~16%
Change! Half of the pairs do not form chains longer than 2
Maybe the nursery locality is more important

Page 33: GC Advantage: Improving Program Locality

33

Class Oblivious Orderings

Different traversal policies
    Partial DF is better

pseudoJBB

Page 34: GC Advantage: Improving Program Locality

34

Motivation

MarkSweep vs. Copying Collector

Mutator time of _213_javac

Page 35: GC Advantage: Improving Program Locality

35

Motivation

Mutator L2 misses, _213_javac