Cache Performance in Java Virtual Machines: A Study of Constituent Phases

Anand S. Rajan, ARM Inc.

Shiwen Hu and Juan Rubio, Laboratory for Computer Architecture, The University of Texas at Austin

WWC'02 2

Cache Performance in JVMs: A Study of Constituent Phases

Motivation

- The execution of a Java program consists of distinct JVM phases
  - Class loading
  - Garbage collection
  - Execution
- Efficient execution of Java programs necessitates a comparative study of the requirements and characteristics of JVM phases

Outline

- Experimental methodology
- Instruction cache performance
- Data cache performance
- Impact on cache performance
  - Varying cache sizes
  - Varying application data sets
- Conclusion

Methodology

- L1 cache behavior of three JVM phases
  - Class loading, garbage collection, and execution
  - Two execution modes: interpreted and JIT
- Experimental workloads: SPECjvm98 benchmarks
  - Both s1 and s100 data sets are used
- LaTTe JVM:
  - An open-source, state-of-the-art JVM
  - Highly optimized JIT compiler
  - Fast mark-and-sweep garbage collector

Methodology (Cont.)

| Configuration | L1 instruction cache | L1 data cache |
|---|---|---|
| 1 | 16KB, 32 byte blocks, 2-way set associative | 16KB, 32 byte blocks, 4-way set associative, write through with write-no-allocate |
| 2 | 64KB, 32 byte blocks, 2-way set associative | 64KB, 32 byte blocks, 4-way set associative, write through with write-no-allocate |
| 3 | 256KB, 32 byte blocks, 2-way set associative | 256KB, 32 byte blocks, 4-way set associative, write through with write-no-allocate |

- Cache simulator:
  - Based on Cachesim5 from Sun’s Shade V6 tool suite
  - A JVM phase aware cache simulator
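To make the idea of a phase-aware simulator concrete, the following is a minimal standalone sketch in Python. It is not Cachesim5 or any part of Shade. The class name `PhaseAwareCache`, its method names, the phase labels, and the LRU replacement policy are all assumptions made for illustration; the slides do not state which replacement policy was simulated. The sketch models a set-associative cache with optional write-no-allocate behavior and keeps separate access and miss counters per phase and access kind.

```python
from collections import OrderedDict, defaultdict


class PhaseAwareCache:
    """Set-associative cache that attributes accesses and misses to a JVM phase."""

    def __init__(self, size_bytes, block_bytes, ways, write_allocate=True):
        self.block_bytes = block_bytes
        self.ways = ways
        self.num_sets = size_bytes // (block_bytes * ways)
        self.write_allocate = write_allocate
        # One OrderedDict per set; key order tracks recency (LRU is an assumption).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = defaultdict(int)  # keyed by (phase, kind)
        self.misses = defaultdict(int)

    def access(self, address, phase, kind="read"):
        """Simulate one access and return True on a hit."""
        block = address // self.block_bytes
        lines = self.sets[block % self.num_sets]
        self.accesses[(phase, kind)] += 1
        if block in lines:
            lines.move_to_end(block)  # refresh recency on a hit
            return True
        self.misses[(phase, kind)] += 1
        if kind == "read" or self.write_allocate:
            if len(lines) >= self.ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
        return False

    def miss_rate(self, phase, kind):
        """Misses divided by accesses for one phase and access kind, in percent."""
        total = self.accesses[(phase, kind)]
        return 100.0 * self.misses[(phase, kind)] / total if total else 0.0


# Configuration 1 data cache: 16KB, 32-byte blocks, 4-way,
# write-through with write-no-allocate.
dcache = PhaseAwareCache(16 * 1024, 32, 4, write_allocate=False)
dcache.access(0x1000, "execution", "read")  # cold miss, block is filled
dcache.access(0x1004, "execution", "read")  # same block, hit
dcache.access(0x2000, "gc", "write")        # write miss, block is not allocated
dcache.access(0x2000, "gc", "write")        # misses again for the same reason
print(dcache.miss_rate("execution", "read"), dcache.miss_rate("gc", "write"))
```

In a real setup the phase label would come from instrumentation that marks when the JVM enters and leaves class loading, garbage collection, and execution; here it is simply passed in by the caller.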

Observations for Both Caches

- Class loading is negligible
  - In terms of cache misses
  - Holds for both execution modes
- Garbage collection is relatively more active in the JIT mode than in the interpreted mode
  - Larger working set
  - Reduced total instruction counts and memory references

Instruction Cache Performance

- Better instruction cache locality in garbage collection than in the execution phase
- Better instruction miss rate under JIT mode than under interpreted mode
  - Execution phase: high method reuse
  - Garbage collection: low miss rate

Data Cache Performance

- Higher overall data cache miss rate under JIT mode
  - Better data locality under interpreted mode
- Data accesses reduce drastically under JIT mode
  - Bytecodes are read only once for compilation
  - Stack accesses are optimized into register-register operations
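The two observations on this slide fit together: a miss rate is misses divided by accesses, so removing accesses that mostly hit raises the rate even if the number of misses does not grow. The short calculation below illustrates this with made-up counts. The numbers and the 80% reduction are hypothetical and are not measurements from the study.

```python
def miss_rate_percent(misses, accesses):
    """Miss rate as a percentage of all accesses."""
    return 100.0 * misses / accesses


# Hypothetical counts, not measurements from the study.
interp_accesses, interp_misses = 1_000_000, 30_000
# Assume JIT compilation removes 800,000 accesses that all used to hit
# (for example bytecode fetches and operand-stack traffic) and leaves the
# missing accesses untouched.
jit_accesses = interp_accesses - 800_000
jit_misses = interp_misses

interp_rate = miss_rate_percent(interp_misses, interp_accesses)
jit_rate = miss_rate_percent(jit_misses, jit_accesses)
print(f"interpreted: {interp_rate:.1f}%  JIT: {jit_rate:.1f}%")
```

Under these assumptions the miss count is unchanged while the rate rises fivefold, which is why a higher miss rate under JIT mode does not by itself mean more time lost to data cache misses.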

Data Cache Reads

- Higher read miss rate under JIT mode
  - Interpreted mode: 0.94% (mpegaudio) to 5.1% (jess)
  - JIT mode: 5.75% (mpegaudio) to 19% (db)
- High read miss rate during garbage collection under both execution modes
  - Up to 19.4% (interpreted) and 18.8% (JIT)
  - Due to large working set and pointer chasing
- Read misses dominated by execution phase under both execution modes
  - Execution phase contributes > 90% of read misses
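A phase can have a high miss rate and still account for a small share of all misses, because the share also depends on how many accesses the phase makes. The sketch below computes both quantities from per-phase counters. The function name `phase_breakdown` and all counter values are hypothetical. Reading the appendix columns "% Abs. miss" and "%Total miss" as these two quantities is an interpretation, since the slides do not define them.

```python
def phase_breakdown(counters):
    """Return {phase: (miss_rate_percent, share_of_total_misses_percent)}.

    `counters` maps a phase name to an (accesses, misses) pair.
    """
    total_misses = sum(misses for _, misses in counters.values())
    result = {}
    for phase, (accesses, misses) in counters.items():
        rate = 100.0 * misses / accesses if accesses else 0.0
        share = 100.0 * misses / total_misses if total_misses else 0.0
        result[phase] = (rate, share)
    return result


# Hypothetical read counters, chosen only to illustrate the effect.
reads = {
    "class loading": (1_000, 10),
    "execution": (2_000_000, 80_000),
    "garbage collection": (20_000, 4_000),
}
breakdown = phase_breakdown(reads)
for phase, (rate, share) in breakdown.items():
    print(f"{phase:20s} miss rate {rate:5.1f}%  share of misses {share:5.1f}%")
```

With these counters, garbage collection has the highest miss rate, yet the execution phase still contributes more than 90% of the misses because it issues far more reads.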

Data Cache Writes

- High write miss rates in garbage collection
  - For both execution modes
  - 50% to 74%
- Garbage collection contributes 44%-78% of write misses
  - For both execution modes
  - Exceptions: compress and mpegaudio

Data Cache Performance with Increased Sizes (Execution Phase)

- Write misses are harder to remove with larger caches
  - Most write misses are compulsory misses
  - Holds for both phases

[Figure: two bar charts, data cache read misses (%) and data cache write misses (%), for compress, jess, db, mpegaudio, mtrt, and jack with 16K, 64K, and 256K caches]
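A compulsory (cold) miss is the first reference to a block, so no cache size can remove it. The sketch below counts total and compulsory misses for a small and a large cache over the same address trace. It is a simplification under stated assumptions: the cache is fully associative with LRU replacement, every miss allocates a block (unlike the write-no-allocate data caches in the study), and the trace is synthetic. The function name `simulate` is hypothetical.

```python
from collections import OrderedDict


def simulate(trace, num_blocks, block_bytes=32):
    """Run a fully associative LRU cache over `trace`.

    Returns (total_misses, compulsory_misses). A miss is compulsory when the
    block has never been referenced before.
    """
    cache = OrderedDict()
    seen = set()
    misses = compulsory = 0
    for address in trace:
        block = address // block_bytes
        if block in cache:
            cache.move_to_end(block)  # hit: refresh recency
            continue
        misses += 1
        if block not in seen:
            compulsory += 1
            seen.add(block)
        if len(cache) >= num_blocks:
            cache.popitem(last=False)  # evict the least recently used block
        cache[block] = True
    return misses, compulsory


# Synthetic trace: a streaming pass over fresh memory (as when newly
# allocated objects are initialized), then a second pass over a small region.
streaming = [i * 32 for i in range(1000)]
reuse = [i * 32 for i in range(64)]
trace = streaming + reuse

small = simulate(trace, num_blocks=16)
large = simulate(trace, num_blocks=2048)
print(small, large)
```

In this trace the larger cache removes only the misses caused by the second pass; the 1000 first-touch misses remain at any size, which is the behavior the slide attributes to most write misses.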

Data Cache Performance with Increased Sizes (Garbage Collection Phase)

- Larger caches are more effective for the execution phase than for the garbage collector
  - Working set of the garbage collector is much larger than 256KB
- Diminishing reduction in data cache misses

[Figure: two bar charts, data cache read misses (%) and data cache write misses (%), for compress, jess, db, mpegaudio, mtrt, and jack with 16K, 64K, and 256K caches]

Impact of Larger Data Sets

- JVM phases perform differently as the data set increases
  - Little change: class loading, JIT compilation
  - Big change: interpretation, garbage collection
- Garbage collection under both execution modes
  - Performance deteriorates for both data cache reads and writes
  - Performance of instruction cache accesses varies little

Impact of Larger Data Sets (Cont.)

- Execution phase under interpreted mode:
  - Cache performance varies little
- Execution phase under JIT mode:
  - Performance improves for both instruction cache accesses and data cache writes
  - Performance of data cache reads deteriorates

Conclusion

- L1 cache performance of Java programs under:
  - Three phases
  - Two execution engines
  - Three cache configurations
  - Two application data sets
- L1 instruction cache performance determined by the execution phase
- Garbage collection is more significant in the JIT mode than in the interpreted mode

Conclusion (Cont.)

- Higher data cache miss rates in the garbage collector than in the execution engine
- Higher data cache miss rates in the JIT mode than in the interpreted mode
- A larger cache is more effective at eliminating
  - Read misses than write misses
  - Misses from the execution phase than from the garbage collector
- Impact of changing data set varies depending on the JVM phase and cache access type

THANK YOU

Appendix

Instruction Cache Performance

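| Benchmark | Mode | Execution Phase: % Abs. miss | Execution Phase: % Total miss | Garbage Collection Phase: % Abs. miss | Garbage Collection Phase: % Total miss | Overall: % Abs. miss |
|---|---|---|---|---|---|---|
| Compress | int | 1.30 | 99.46 | 0.15 | 0.002 | 1.30 |
| Compress | jit | 0.07 | 98.85 | 0.16 | 1.03 | 0.07 |
| Jess | int | 1.35 | 96.48 | 1.03 | 3.51 | 1.33 |
| Jess | jit | 1.48 | 85.68 | 0.68 | 14.28 | 1.26 |
| Db | int | 0.16 | 99.35 | 0.10 | 0.62 | 0.16 |
| Db | jit | 0.12 | 97.70 | 0.03 | 2.21 | 0.11 |
| Mpegaudio | int | 0.60 | 99.99 | 0.45 | 0.003 | 0.60 |
| Mpegaudio | jit | 0.18 | 99.51 | 0.31 | 0.48 | 0.18 |
| Mtrt | int | 0.47 | 68.51 | 0.42 | 31.40 | 0.46 |
| Mtrt | jit | 1.21 | 54.45 | 0.51 | 45.52 | 0.75 |
| Jack | int | 0.72 | 97.14 | 0.89 | 2.86 | 0.72 |
| Jack | jit | 1.31 | 95.98 | 0.27 | 4.00 | 1.14 |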

Decomposition of Data Cache Misses

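| Benchmark | Mode | Class Loading % | Execution % | Garbage Collection % | Overall D-Cache Miss % |
|---|---|---|---|---|---|
| Compress | int | 0.001 | 99.84 | 0.15 | 2.98 |
| Compress | jit | 0.003 | 98.99 | 0.98 | 3.60 |
| Jess | int | 0.003 | 81.04 | 18.94 | 6.11 |
| Jess | jit | 0.004 | 58.77 | 41.20 | 24.07 |
| Db | int | 0.002 | 94.48 | 5.50 | 4.20 |
| Db | jit | 0.004 | 86.33 | 13.65 | 19.52 |
| Mpegaudio | int | 0.004 | 99.94 | 0.05 | 1.08 |
| Mpegaudio | jit | 0.005 | 99.63 | 0.34 | 11.06 |
| Mtrt | int | 0.003 | 31.14 | 68.86 | 4.09 |
| Mtrt | jit | 0.005 | 28.16 | 71.81 | 21.47 |
| Jack | int | 0.004 | 84.26 | 15.72 | 3.09 |
| Jack | jit | 0.007 | 60.82 | 39.14 | 18.87 |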

Performance of Data Cache Reads

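| Benchmark | Mode | Execution Phase: % Abs. miss | Execution Phase: % Total miss | Garbage Collection Phase: % Abs. miss | Garbage Collection Phase: % Total miss | Overall: % Abs. miss |
|---|---|---|---|---|---|---|
| Compress | int | 2.03 | 99.94 | 16.14 | 0.05 | 2.03 |
| Compress | jit | 8.78 | 99.62 | 12.99 | 0.36 | 8.79 |
| Jess | int | 4.92 | 95.26 | 14.74 | 4.73 | 5.08 |
| Jess | jit | 10.92 | 82.30 | 12.74 | 17.68 | 11.19 |
| Db | int | 3.67 | 98.06 | 19.39 | 0.02 | 3.72 |
| Db | jit | 18.83 | 95.94 | 18.76 | 4.04 | 18.82 |
| Mpegaudio | int | 0.94 | 99.98 | 3.88 | 0.01 | 0.94 |
| Mpegaudio | jit | 5.75 | 99.75 | 6.66 | 0.21 | 5.75 |
| Mtrt | int | 4.09 | 33.55 | 4.31 | 66.44 | 4.23 |
| Mtrt | jit | 13.83 | 33.18 | 17.04 | 66.79 | 15.84 |
| Jack | int | 2.68 | 97.89 | 8.24 | 2.09 | 2.72 |
| Jack | jit | 9.48 | 90.03 | 7.86 | 9.91 | 9.29 |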

Performance of Data Cache Writes

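| Benchmark | Mode | Execution Phase: % Abs. miss | Execution Phase: % Total miss | Garbage Collection Phase: % Abs. miss | Garbage Collection Phase: % Total miss | Overall: % Abs. miss |
|---|---|---|---|---|---|---|
| Compress | int | 6.18 | 99.73 | 74.56 | 0.26 | 6.19 |
| Compress | jit | 19.61 | 97.77 | 67.37 | 2.20 | 19.92 |
| Jess | int | 5.68 | 55.81 | 65.32 | 44.17 | 9.53 |
| Jess | jit | 69.86 | 45.72 | 63.88 | 54.26 | 66.48 |
| Db | int | 5.18 | 86.36 | 60.59 | 13.63 | 5.92 |
| Db | jit | 12.54 | 40.34 | 59.59 | 59.61 | 59.61 |
| Mpegaudio | int | 1.51 | 99.85 | 49.89 | 0.14 | 1.51 |
| Mpegaudio | jit | 31.91 | 99.54 | 50.04 | 0.44 | 31.95 |
| Mtrt | int | 2.44 | 22.51 | 4.27 | 77.47 | 3.66 |
| Mtrt | jit | 39.31 | 21.85 | 38.69 | 78.12 | 38.82 |
| Jack | int | 2.34 | 54.90 | 57.70 | 44.96 | 4.12 |
| Jack | jit | 39.21 | 43.56 | 58.99 | 56.42 | 48.35 |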

Cache Performance with Different Data Sets (JIT Execution)

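Miss rates (%)

| Benchmark | Mode | I-cache s1 | I-cache s100 | D-cache Read s1 | D-cache Read s100 | D-cache Write s1 | D-cache Write s100 |
|---|---|---|---|---|---|---|---|
| Compress | int | 0.01 | 1.3 | 0.24 | 2.03 | 0.75 | 6.18 |
| Compress | jit | 0.58 | 0.07 | 8.03 | 8.78 | 25.89 | 19.61 |
| Jess | int | 1.43 | 1.35 | 7.75 | 4.92 | 8.18 | 5.68 |
| Jess | jit | 1.95 | 1.48 | 9.46 | 10.92 | 61.2 | 69.86 |
| Db | int | 0.45 | 0.16 | 2.05 | 3.67 | 5.17 | 5.18 |
| Db | jit | 1.79 | 0.12 | 8.87 | 18.83 | 54.77 | 12.54 |
| Mpegaudio | int | 1.02 | 0.6 | 2.17 | 0.94 | 1.66 | 1.51 |
| Mpegaudio | jit | 1.7 | 0.18 | 9.01 | 5.75 | 47.48 | 31.91 |
| Mtrt | int | 0.59 | 0.42 | 3.18 | 4.09 | 3.77 | 2.44 |
| Mtrt | jit | 1.78 | 1.21 | 9.04 | 13.83 | 55.41 | 39.31 |
| Jack | int | 0.7 | 0.72 | 2.64 | 2.68 | 2.41 | 2.34 |
| Jack | jit | 1.78 | 1.31 | 10.01 | 9.48 | 54.59 | 39.21 |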

Cache Performance with Different Data Sets (Garbage Collection)

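Miss rates (%)

| Benchmark | Mode | I-cache s1 | I-cache s100 | D-cache Read s1 | D-cache Read s100 | D-cache Write s1 | D-cache Write s100 |
|---|---|---|---|---|---|---|---|
| Compress | int | 0.04 | 0.15 | 3.3 | 16.14 | 52.17 | 74.56 |
| Compress | jit | 0.25 | 0.16 | 6.73 | 12.99 | 51.49 | 67.37 |
| Jess | int | 0.73 | 1.03 | 4.04 | 14.74 | 59.8 | 65.32 |
| Jess | jit | 0.38 | 0.68 | 4.92 | 12.74 | 50.99 | 63.88 |
| Db | int | 0.05 | 0.1 | 0.96 | 19.39 | 47.15 | 60.59 |
| Db | jit | 0.28 | 0.03 | 5.49 | 18.76 | 47.99 | 59.59 |
| Mpegaudio | int | 0.51 | 0.45 | 4.17 | 3.88 | 49.89 | 49.89 |
| Mpegaudio | jit | 0.31 | 0.31 | 6.64 | 6.66 | 50.07 | 50.04 |
| Mtrt | int | 0.29 | 0.47 | 13.29 | 4.31 | 65.68 | 4.27 |
| Mtrt | jit | 0.17 | 0.51 | 10.52 | 17.04 | 62.62 | 38.69 |
| Jack | int | 0.69 | 0.89 | 6.37 | 8.24 | 60.35 | 57.7 |
| Jack | jit | 0.29 | 0.27 | 6.18 | 7.86 | 57.21 | 58.99 |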