efficient global object space support for distributed jvm...

35
1 Efficient Global Object Space Support for Distributed JVM on Cluster Weijian Fang, Cho-Li Wang and Francis C. M. Lau The Systems Research Group Department of Computer Science and Information Systems The University of Hong Kong

Upload: others

Post on 05-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

1

Efficient Global Object Space Support for Distributed JVM on Cluster

Weijian Fang, Cho-Li Wang and Francis C. M. LauThe Systems Research Group

Department of Computer Science and Information SystemsThe University of Hong Kong

Page 2: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

2

Outline• Introduction

– Distributed Java Virtual Machine– Global Object Space– Related Works

• Our Approach– Cache Coherence Protocol– Distributed-Shared Object– Optimizations

• Performance Evaluation

• Conclusion and Future Work

Page 3: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

3

Motivation #1 : Java

• Built-in multi-threading– A parallel programming language?

• High performance– “Java has potential to be a better environment for

Grande application development than any previous languages such as Fortran and C++. ”

– Java Grande Forum. http://www.javagrande.org/.

Page 4: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

4

Motivation #2 : Cluster Computing

• Cost effective parallel computing– Open source software– Commodity hardware

• Until June 2002 (www.top500.org)– 80 of top 500 supercomputers are clusters– The 3rd powerful supercomputer in the world is a cluster

• 750 HP AlphaServer ES45 connected by Quadrics interconnection network

Page 5: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

5

Distributed JVM

• Comply with JVM Spec.– Transparent execution of

multi-threaded Java programs

• Present a Single System Image of cluster to Java programs– Automatic distribution of

Java threads among cluster nodes

SSI Global Object Space

Multi-threaded Java program

Java thread

Page 6: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

6

Global Object Space Support

Our goal is to design and

implement an efficient GOS for distributed JVM.

• Transparency– Transparent object access disregarding

thread/object’s physical location• Virtualize a single object heap spanning

on the whole cluster– A distributed shared memory service

• Consistency– Comply with Java Memory Model to

handle the memory consistency issue

• Efficiency– Reduce the network traffic incurred by

distributed computing of Java threads

Page 7: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

7

Java Memory Model

• Define memory consistency semantics in multi-threaded Java programs– GOS must comply with JMM

• There is a lock associated with each object– Protect critical sections– Maintain memory consistency between threads

• JMM is similar to Home-based Lazy Release Consistency

Page 8: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

8

Java Memory Model (contd.)

Load variable from main memory to working memory before use.

Variable is modified in T1’s working memory.

T1 T2

Before T1 performs unlock, dirty variable is written back to main memory

Before T2 performs lock, flush variable in working memory

When T2 uses variable, it will be load from main memory

Garbage Bin

Thread working memory

Main memory

Object

Variable

Page 9: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

9

Related Works

• Method shipping– Usually no replication– Method invocation and object access will be

forwarded to the node where object resides– E.g. cJVM

• Page shipping– Leverage page-based DSM to build GOS at runtime– E.g. JESSICA, Java/DSM

• Object shipping– Leverage object-based DSM to build GOS at runtime– E.g. Hyperion, Jackal

Page 10: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

10

Method Shipping

• E.g. cJVM

• Master/proxy object model– Method invocation and field access on proxy object should

be forwarded to master object.

• Usually forbid object replication to leave out consistency problem

• More aggressive object caching is preferred

• Load distribution is determined by object distribution

Page 11: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

11

Page Shipping

• E.g. JESSICA, Java/DSM

• Leverage some page-based Distributed Shared Memory

• Sharing granularity gap between object-oriented Java and page-based DSM– False sharing problem

• Not easy to do further optimization

Page 12: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

12

Object Shipping

• Leverage some object-based DSM at run time

• Examples:– Hyperion: translate Java bytecode to C– Jackal: compile Java source code directly to native

code

• No JVM involved in execution

Page 13: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

13

Outline• Introduction

– Distributed Java Virtual Machine– Global Object Space– Related Works

• Our Approach– Cache Coherence Protocol– Distributed-Shared Object– Optimizations

• Performance Evaluation

• Conclusion and Future Work

Page 14: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

14

A Straight-forward Object-based Cache Coherence Protocol for JMM

• Home-based– A home node is selected for each object– Updates are propagated to the home on synchronizations– Clean copies are derived from the home– Home node acts as lock manager

• Twin and Diff– Support concurrent multiple writer

Page 15: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

15

Example

Up-to-date copy is fetched from home node upon access.

Diff is created and sent back to home node on unlock.

Before lock, non-home object is invalidated.

Interconnection Network

Node 0 Node 2HOMENode 1

Twin is created on the first write.

Page 16: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

16

DSO - Definition• Object connectivity and thread reachability are

available at run time

• Consider reachability– Thread-local object: reachable from only one thread– Thread-escaping object: reachable from multiple

threads

• Further consider the physical locations of thread and object in distributed JVM– Node-local object (NLO): reachable from thread(s)

at the same node– Distributed-shared object (DSO): reachable from at

least two threads located at different cluster nodes

Page 17: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

17

DSO – Benefits from DSO detection

• Only synchronizations on DSOs should trigger distributed consistency operation

• Only DSOs are involved in distributed consistency operation

• NLOs can be safely collected by a local garbage collector

Page 18: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

18

DSO – A lightweight detection scheme

• Leverage Java’s runtime reachability information

• The detection is postponed upon – The distribution of Java threads to other nodes– Sending objects to a remote node

• Identify object references transmitted to other nodes– Must be DSOs

Page 19: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

19

DSO – Detection (Ex.)

Java threadstack frame T1

b

c d

a

e f g

c d

b

T2

g

dc

b

T2

g

d

Java object

Detected DSO

Invalid DSO

Connectivitybetween objects

Object referencein thread stack

Node 0 Node 1

Cluster network

Page 20: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

20

Optimizations

• Object Home Migration

• Synchronized Method Migration

• Object Pushing

Page 21: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

21

Object Home Migration

• Access asymmetry in home-based protocol– Coherence of home copy is kept through update – Coherence of non-home copy is kept through invalidate– Home accesses are more lightweight than non-home

accesses

• Home migration– Reselect the node where most accesses happen as the

home node for the object– Adapt to object access behavior in applications– Negative impact

• Migration notices

Page 22: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

22

Object Home Migration (contd.)• Optimize object exhibiting single writer access

pattern

• Record remote writes at home node– Remote writes come as diff messages

• Count consecutive writes– Issued by the same remote node– Not interleaved by writes from other nodes

• Migrate home to the writing node – When the number of consecutive writes exceeds a

predefined threshold

Page 23: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

23

Synchronized Method Migration

1 class Counter {

2 private int i; // internal counter

3

4 public Counter() {

5 i = 0;

6 }

7

8 public synchronized void inc() {

9 i++;

10 }

11 }

lock request

lock reply

obj request

obj reply

unlock req

unlock rep

Home Node Executing Node

inc() is invoked on a non-home node

Non-home execution of synchronized method involves multiple message roundtrips

Page 24: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

24

Synchronized Method Migration (contd.)

• Non-home execution of synchronized method is usually inefficient in distributed JVM– Involves multiple message roundtrips

• Migrate synchronized method of DSO to its home node for execution– Only one message roundtrip– Aggregate synchronization and data requests

• Thanks to the detection of DSOs

Page 25: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

25

Object Pushing• Reference locality

– After an object is accessed, its reachable objects are very likely to be accessed afterwards.

– Partially determined by reachability– Prefetching

• Object pushing– Push-based prefetching– The home node pushes the objects reachable from the

requested DSO– Reachability information at home node is always valid

• Guarantee the correctness of prefetching

• Optimal message length– Represent preferred aggregation size of objects

Page 26: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

26

Outline• Introduction

– Distributed Java Virtual Machine– Global Object Space– Related Works

• Our Approach– Cache Coherence Protocol– Distributed-Shared Object– Optimizations

• Performance Evaluation

• Conclusion and future work

Page 27: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

27

Implementation

• Modify Kaffe 1.0.6

• On a cluster of 300MHz PII PCs, running Linux 2.2, connected by Fast Ethernet

• Threads are automatically distributed among cluster nodes

Page 28: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

28

Benchmark Suite

• ASP (All-pair Shortest Path)

• SOR (red-black successive over-relaxation)

• Nbody

• TSP

Page 29: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

29

Efficiency

0

0.2

0.4

0.6

0.8

1

1.2

1 2 4 8

Num be r of proce s s ors

Effi

cien

cy

ASPSORNbodyTSP

Page 30: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

30

Effect of Optimizations –Breakdown of execution time

0%

20%

40%

60%

80%

100%

120%

ASP SOR Nbo d y TSP

Co mp Syn Obj

Page 31: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

31

Effect of Optimizations –Message number

����������

����������

�����������

�����������

�����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������0%

20%

40%

60%

80%

100%

120%

AS P S OR Nbody TS P

Mes

sage

num

ber

NO HM

�����

HM +S M M HM +S M M +P us h

Page 32: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

32

Effect of Optimizations –Communication data volume

����������

����������

����������

����������

����������

�����������

�����������

�����������

�����������

�����������

�����������

�����������

�����������

�����������

�����������

�����������

�����������

�����������

�����������

�����������

�����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������

����������0%

20%

40%

60%

80%

100%

120%

ASP SOR Nbody TSP

NO HM

�������� HM +SM M HM +SM M +Pus h

Page 33: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

33

Conclusion• Global object space for distributed JVM

• Distributed-shared object– More efficient cache coherence protocol and garbage

collection in distributed JVM– Facilitate further optimizations in GOS

• Effective runtime optimizations in GOS– Object home migration

• Single writer access pattern– Synchronized method migration

• Non-home execution of synchronized method of DSOs– Object pushing

• Small size object graph

Page 34: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

34

Future work

• Incorporate DSO with distributed garbage collection

• More adaptive cache coherence protocol that automatically adjusts to various object access patterns in GOS

Page 35: Efficient Global Object Space Support for Distributed JVM ...clwang/papers/icpp2002-GOS-slide-Final.pdf · • More aggressive object caching is preferred • Load distribution is

35

Q & A