![Page 1: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,](https://reader035.vdocuments.mx/reader035/viewer/2022062516/56649e4f5503460f94b47237/html5/thumbnails/1.jpg)
Highly Parallel, Object-Oriented Computer Architecture
(also the Jikes RVM and PearColator)
Vienna University of Technology
August 2nd
Dr. Ian Rogers, Research Fellow,
The University of Manchester
Presentation Outline
The JAMAICA project
What are the problems?
Hardware design solutions
Software solutions
Where we are
The Jikes RVM – behind the scenes
PearColator – a quick overview
Problems
“The job of an engineer is to identify problems and find better ways to solve them”, James Dyson (and, I'm sure, many others)
There are many problems currently in Computer Science and more on the horizon
Problem solving is ad hoc, and many good solutions aren't successful
Let's look at the problems from the bottom up
Problems in Process
Heat and power are problems
Smaller feature sizes (<45 nm) lead to problems with process variation, degradation over time and more transient errors
3D or stacked designs have significant problems
Simulation must be repeated for all possible design and environmental variations so that, statistically, a design should work
Problems in Architecture
Speed of interconnect
Die area is very large
Knowing your market's “tricks” is key to realising performance – especially in the embedded space
A move away from general-purpose design – GPUs, physics processors
Problems in Systems Software
Systems software lags behind new hardware
How to virtualise systems with minimal cost
Lots of momentum in existing solutions
Problems with natively executable code: it needs to be run in a hardware/software sandbox
no dynamic optimisation of code together with libraries and the operating system
cost of hardware support for virtualisation
Problems in Compilers
The hardware target keeps moving
The notion of stacked languages and virtual machines isn't popular
Why aren't we better at instruction selection? Embedded designs have vast amounts of assembler code targeting exotic registers and ISAs
How to parallelize for thousands of contexts
Machine learning?
Problems for Applications
Application writers have an abundance of tools and wisdom to listen to, and the wisdom often conflicts
Application concerns: performance
maintainability (evolution?)
time to implement
elegance of solution
Problems for Consumers
Cost
Migration of systems
Legacy support
Recap
Process: lots of transistors, lots of problems
Architecture: speed of interconnect, complexity
Systems software: momentum
Compilers: stacking, using new architectures
Applications: lots of tools and wisdom, concerns
Consumers: how much does it cost? What about my current business systems and processes?
Oh dear, it's a horrible problem; what we need is a new programming language
Why?
parallel programming is done badly at the moment
we should teach CS undergraduates this language
we will then inherently get parallel software
Why not?
CSP has been here before [Hoare 1978]
clusters are already solving similar problems using existing languages
it's easy to blame the world's problems on C
Do we need another programming language?
What I think is needed are domain-specific languages, or domain-specific uses of a language:
extracting parallelism from Java implies work that isn't necessary
at a mathematical abstraction – MatlabP, Mathematica, Fortran 90
codecs, graphics pipelines, network processors – languages devised here should express just what's necessary to do the job
message passing to avoid use of shared memory
Virtual Machine Abstraction
We don't need another programming language; we need common abstractions
This abstraction will be inherently parallel but not inherently shared memory
Java is a reasonable choice and brings momentum
Architecture
Point-to-point asynchronous interconnect
Simultaneous Multi-Threading to hide latencies (e.g. Sun Niagara, Intel HyperThreading)
Object-Oriented – improve GC, simplify directories
Transactional – remove locks, enable speculation
Object-Oriented Hardware
Proposed in the Mushroom project from Manchester
Recent papers by Wright and Wolczko
Address the cache using object ID and offset
[Figure: an (object ID, offset) pair is used to look up the L1 data cache]
Object-Oriented Hardware
On a cache miss the object ID is translated to the object’s address
[Figure: the (object ID, offset) lookup misses in the L1 data cache]
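To make the lookup concrete, here is a minimal software model, assuming a word-addressed flat memory and using hash maps in place of the cache and the object table. The class and field names (`ObjectCacheSketch`, `objectTable`) are hypothetical, and nothing here reflects the actual Mushroom or JAMAICA hardware: a hit needs only the (object ID, offset) pair, and only a miss consults the table to find the object's address.

```java
import java.util.HashMap;
import java.util.Map;

class ObjectCacheSketch {
    final int[] memory = new int[1024];                        // word-addressed memory
    final Map<Integer, Integer> objectTable = new HashMap<>(); // object ID -> base address
    private final Map<Long, Integer> l1 = new HashMap<>();     // (object ID, offset) -> cached word
    int misses = 0;

    // Packs the cache tag: object ID in the high 32 bits, offset in the low 32 bits
    private static long key(int oid, int offset) {
        return ((long) oid << 32) | (offset & 0xFFFFFFFFL);
    }

    int load(int oid, int offset) {
        Integer hit = l1.get(key(oid, offset));
        if (hit != null) {
            return hit;                                 // hit: no address translation at all
        }
        misses++;
        int address = objectTable.get(oid) + offset;    // miss: translate object ID to an address
        int value = memory[address];
        l1.put(key(oid, offset), value);                // fill the cache (one word per "line" here)
        return value;
    }
}
```

Loading the same field twice costs one translation: the second load hits on the object ID and offset alone.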
Object-Oriented Hardware
We can re-use the TLB
Having a map allows objects to be moved without altering references
Only object headers will contain locks
[Figure: object ID → object-to-address map held in virtual memory, with the TLB caching translations]
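The benefit of the extra indirection can be sketched in the same style. This hypothetical model keeps the object-to-address map in a hash map, caches translations in a second map standing in for the TLB, and relocates an object by copying its words and rewriting only its map entry, so object IDs held elsewhere never change. Names are illustrative, and the locks in object headers are not modelled.

```java
import java.util.HashMap;
import java.util.Map;

class ObjectMapSketch {
    final int[] memory = new int[256];                         // word-addressed memory
    private final Map<Integer, Integer> map = new HashMap<>(); // object ID -> base address
    private final Map<Integer, Integer> tlb = new HashMap<>(); // cached translations
    int mapLookups = 0;                                        // how often the full map was consulted

    void allocate(int oid, int base) { map.put(oid, base); }

    int translate(int oid) {
        Integer base = tlb.get(oid);
        if (base == null) {          // TLB miss: consult the map and cache the result
            mapLookups++;
            base = map.get(oid);
            tlb.put(oid, base);
        }
        return base;
    }

    int read(int oid, int offset) { return memory[translate(oid) + offset]; }

    /** Relocates an object (e.g. during compaction): copy, update its map entry, drop the stale translation. */
    void move(int oid, int size, int newBase) {
        int oldBase = translate(oid);
        System.arraycopy(memory, oldBase, memory, newBase, size);
        map.put(oid, newBase);
        tlb.remove(oid);
    }
}
```

A moving garbage collector would call something like `move`; every reference to the object is still just its ID, so nothing else needs rewriting.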
Transactional Hardware
Threads reading the same object can do so regardless of their epoch
(based on [Ananian et al., 2005])
[Figure: objects indexed by transaction and object ID]
Transactional Hardware
When a later-epoch thread writes to an object, a clone is made
[Figure: objects indexed by transaction and object ID]
Transactional Hardware
If an earlier thread writes to an object, the later threads using that object roll back
[Figure: objects indexed by transaction and object ID]
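The three epoch rules can be played through in a small single-threaded model. It is a sketch under simplifying assumptions: epochs are non-negative integers ordered by age, every write goes to a clone owned by the writing epoch, a read sees the newest version written by its own or an earlier epoch, and commit is omitted. `EpochObjects` and its methods are hypothetical and are not the hardware scheme of Ananian et al.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class EpochObjects {
    private final Map<Integer, Integer> committed = new HashMap<>();           // object ID -> committed value
    private final Map<Integer, Map<Integer, Integer>> clones = new HashMap<>(); // epoch -> (object ID -> value)
    private final Map<Integer, Set<Integer>> users = new HashMap<>();          // object ID -> epochs that used it
    final Set<Integer> rolledBack = new TreeSet<>();

    void init(int oid, int value) { committed.put(oid, value); }

    /** Reads never conflict: any epoch may read, seeing the newest version from itself or an earlier epoch. */
    int read(int epoch, int oid) {
        users.computeIfAbsent(oid, k -> new HashSet<>()).add(epoch);
        for (int e = epoch; e >= 0; e--) {
            Map<Integer, Integer> version = clones.get(e);
            if (version != null && version.containsKey(oid)) return version.get(oid);
        }
        return committed.get(oid);
    }

    /** Writes go to the writer's clone; later epochs that already used the object roll back. */
    void write(int epoch, int oid, int value) {
        for (int user : users.getOrDefault(oid, Collections.emptySet())) {
            if (user > epoch) {
                rolledBack.add(user);
                clones.remove(user); // discard the later epoch's speculative state
            }
        }
        users.computeIfAbsent(oid, k -> new HashSet<>()).add(epoch);
        clones.computeIfAbsent(epoch, k -> new HashMap<>()).put(oid, value);
    }
}
```

With two epochs sharing one object, both reads succeed, epoch 2's write creates a clone that epoch 1 cannot see, and a subsequent write by epoch 1 forces epoch 2 to roll back.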
Object-Oriented and Transactional Hardware
Again the TLB can remove some of the cost
[Figure: (transaction, object ID) translated via the map in virtual memory, cached by the TLB]
Speculation
Speculative threads are created with predicted input values and are expected not to interact with other, non-speculative threads
A transaction can complete if it didn't roll back and the thread's inputs were as predicted
Can speculate at:
Method calls
Loops
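Value speculation at a method call can be sketched with ordinary Java threads, assuming the speculative work is a pure function so that "not interacting with non-speculative threads" holds by construction. The hypothetical `speculate` helper starts the consumer on a predicted input while the producer computes the real one, and keeps the speculative result only when the prediction was right; otherwise it discards it and re-executes, standing in for a rollback.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntSupplier;
import java.util.function.IntUnaryOperator;

class ValueSpeculation {
    /** Runs consumer(predicted) on another thread while producer runs; commits only if the prediction held. */
    static int speculate(IntSupplier producer, int predicted, IntUnaryOperator consumer) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Speculative thread: starts before its real input is known
            Future<Integer> speculative = pool.submit(() -> consumer.applyAsInt(predicted));
            int actual = producer.getAsInt();   // non-speculative work produces the real input
            int result = speculative.get();
            if (actual == predicted) {
                return result;                  // inputs were as predicted: commit
            }
            return consumer.applyAsInt(actual); // misprediction: discard and re-execute
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A real speculative thread would also need its memory writes buffered and discarded on rollback; restricting the consumer to a pure function sidesteps that here.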
Operating Systems
Supporting an object-based and virtual memory view of a system implies extra controls in our system
Therefore, we want the whole system software stack inside the VM
![Page 24: Operating Systems](https://reader035.vdocuments.mx/reader035/viewer/2022062516/56649e4f5503460f94b47237/html5/thumbnails/24.jpg)
Operating Systems
Where we are
Jikes RVM and JNode based Java operating systems
Open source dynamic binary translator (arguably state-of-the-art performance)
Simulated architecture
Parallelizing JVM, working for do-all loops, new work on speculation and loop pipelining
Lots of work on the other things I've talked about
The Jikes RVM
Overview of the adaptive compilation system:
Methods are recompiled based on their predicted future execution time and the time taken to compile them
Some optimisation levels are skipped
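The cost-benefit decision can be sketched as follows. The sketch assumes a method will run for as long again as it already has (a common heuristic, and my understanding of how the Jikes RVM's adaptive system predicts future time); the per-level speedups and compile costs are invented numbers for illustration only. Picking the level with the lowest total expected cost, rather than stepping up one level at a time, is what lets some optimisation levels be skipped.

```java
class RecompilationModel {
    // Hypothetical figures: index 0 = baseline, indices 1..3 = optimisation levels 0..2
    static final double[] SPEEDUP = {1.0, 4.0, 5.0, 5.5};              // relative to baseline code
    static final double[] COMPILE_MS_PER_BYTE = {0.0, 0.05, 0.2, 0.4}; // compile cost per bytecode byte

    /** Returns the level minimising compile time plus predicted future run time. */
    static int chooseLevel(int current, double pastMs, int methodBytes) {
        double futureAtCurrent = pastMs;   // assume it runs for as long again as it has so far
        int best = current;
        double bestCost = futureAtCurrent; // cost of not recompiling at all
        for (int j = current + 1; j < SPEEDUP.length; j++) {
            double future = futureAtCurrent * SPEEDUP[current] / SPEEDUP[j];
            double cost = COMPILE_MS_PER_BYTE[j] * methodBytes + future;
            if (cost < bestCost) {
                best = j;
                bestCost = cost;
            }
        }
        return best;
    }
}
```

With these numbers, a 100-byte method that has run for 1 ms stays at baseline, while one that has run for 1000 ms goes straight to index 2, skipping index 1.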
The baseline compiler
Used to compile code the first time it's invoked
Very simple code generation:
Bytecode: iload_0, iload_1, iadd, istore_0
Generated code, per bytecode:
iload_0: Load t0, [locals + 0]; Store [stack+0], t0
iload_1: Load t0, [locals + 4]; Store [stack+4], t0
iadd: Load t0, [stack+0]; Load t1, [stack+4]; Add t0, t0, t1; Store [stack+0], t0
istore_0: Load t0, [stack+0]; Store [locals + 0], t0
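The template-per-bytecode approach can be sketched as a tiny code generator. It is a hypothetical model, not the Jikes RVM baseline compiler (which emits real machine code): it handles only the four bytecodes in the example, tracks the expression-stack depth at compile time, and emits the same pseudo-instructions as the listing above.

```java
import java.util.ArrayList;
import java.util.List;

class BaselineSketch {
    /** Emits a fixed template per bytecode; every value passes through the memory-resident stack. */
    static List<String> compile(List<String> bytecodes) {
        List<String> out = new ArrayList<>();
        int sp = 0; // expression-stack depth in bytes, known at compile time
        for (String bc : bytecodes) {
            switch (bc) {
                case "iload_0":
                case "iload_1": {
                    int local = (bc.charAt(6) - '0') * 4; // 4-byte local variable slots
                    out.add("Load t0, [locals + " + local + "]");
                    out.add("Store [stack+" + sp + "], t0");
                    sp += 4;
                    break;
                }
                case "iadd":
                    out.add("Load t0, [stack+" + (sp - 8) + "]");
                    out.add("Load t1, [stack+" + (sp - 4) + "]");
                    out.add("Add t0, t0, t1");
                    out.add("Store [stack+" + (sp - 8) + "], t0");
                    sp -= 4;
                    break;
                case "istore_0":
                    sp -= 4;
                    out.add("Load t0, [stack+" + sp + "]");
                    out.add("Store [locals + 0], t0");
                    break;
                default:
                    throw new IllegalArgumentException("unsupported bytecode: " + bc);
            }
        }
        return out;
    }
}
```

Every value makes a round trip through memory, which is why the generated code is slow, and also why porting only needs a new set of templates.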
The baseline compiler
Pros:
Easy to port – just write the emit code for each bytecode
Minimal work needed to port the runtime and garbage collector
Cons:
Very slow generated code
The boot image
Hijack the view of memory (the mapping of objects to addresses)
Compile the list of primordial classes
Write the view of memory to disk (the boot image)
The boot image runner loads the disk image and branches into the code block for VM.boot
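The "view of memory" step can be sketched as assigning every object reachable from a root an address in the image and rewriting references as those addresses. In this hypothetical model an object is just a list of references, laid out as `[reference count, addresses...]`; writing the image to disk and recording the VM.boot entry point are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class BootImageSketch {
    /** Hypothetical host-JVM object: a name plus references to other objects. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Gives every object reachable from root an image address, then writes the image words. */
    static Map<Obj, Integer> layout(Obj root, List<Integer> image) {
        Map<Obj, Integer> address = new IdentityHashMap<>();
        List<Obj> order = new ArrayList<>();
        Deque<Obj> work = new ArrayDeque<>();
        work.add(root);
        int next = 0;
        // Pass 1: assign addresses breadth-first (the hijacked view of memory)
        while (!work.isEmpty()) {
            Obj o = work.poll();
            if (address.containsKey(o)) continue;
            address.put(o, next);
            next += 1 + o.refs.size(); // one header word plus one word per reference
            order.add(o);
            work.addAll(o.refs);
        }
        // Pass 2: emit each object with its references rewritten as image addresses
        for (Obj o : order) {
            image.add(o.refs.size());
            for (Obj r : o.refs) image.add(address.get(r));
        }
        return address;
    }
}
```

Two objects that point at each other end up at image addresses 0 and 2, and the image becomes `[1, 2, 1, 0]`.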
The boot image
Problems:
Differences of view between the Jikes RVM, Classpath and the bootstrap JVM
Fix by writing null to some fields
The Jikes RVM runtime needs to keep pace with Classpath
The runtime
M-of-N threading
Thread yields are GC points
Native code can deadlock the VM
JNI written in Java with knowledge of C layout
Classpath interface written in Java
The optimizing compiler
Structured as compiler phases based on the HIR, LIR and MIR phases from Muchnick
The IR object holds instructions in linked lists in a control flow graph
Each instruction is an object with:
One operator
Variable number of use operands
Variable number of def operands
Support for def/use operands
Some operands and operators are virtual
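A minimal Java rendering of that instruction structure follows, assuming string operands and a hand-picked operator set. The class and operator names are hypothetical, not the Jikes RVM's own IR classes. Instructions are doubly linked within a basic block, and blocks hold their control-flow successors.

```java
import java.util.ArrayList;
import java.util.List;

class IrSketch {
    /** One operator per instruction; virtual operators have no single machine equivalent. */
    enum Operator {
        INT_ADD(false), INT_MOVE(false), GUARD_MOVE(true);
        final boolean isVirtual;
        Operator(boolean isVirtual) { this.isVirtual = isVirtual; }
    }

    static final class Instruction {
        final Operator operator;
        final List<String> defs;   // variable number of defined operands
        final List<String> uses;   // variable number of used operands
        Instruction prev, next;    // doubly linked list within a basic block

        Instruction(Operator operator, List<String> defs, List<String> uses) {
            this.operator = operator;
            this.defs = new ArrayList<>(defs);
            this.uses = new ArrayList<>(uses);
        }

        /** A def/use operand is both read and written, e.g. the destination of a two-address add. */
        boolean isDefUse(String operand) {
            return defs.contains(operand) && uses.contains(operand);
        }
    }

    static final class BasicBlock {
        Instruction first, last;
        final List<BasicBlock> successors = new ArrayList<>(); // control flow graph edges

        void append(Instruction i) {
            if (last == null) {
                first = last = i;
            } else {
                last.next = i;
                i.prev = last;
                last = i;
            }
        }

        int size() {
            int n = 0;
            for (Instruction i = first; i != null; i = i.next) n++;
            return n;
        }
    }
}
```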
The optimizing compiler
HIR:
Infinite registers
Operators correspond to bytecodes
SSA phase performed
LIR:
Load/store operators
Java specific operators expanded
GC barrier operators
SSA phase performed
MIR:
Fixed number of registers
Machine operators
The optimizing compiler
Factored control flow graph:
Don't terminate blocks on Potentially Exceptioning Instructions (PEIs)
Bounds check
Null check
Checks define guards, which are used by: putfield, getfield, array load/store, invokevirtual
Eliminating guards requires propagation of use
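How guards let a check disappear without splitting blocks can be sketched with a toy IR in which `null_check a` defines a guard and each dereference names the guard it depends on. The pass below, a hypothetical illustration rather than the Jikes RVM's implementation, removes a repeated null check within one block of SSA code and then propagates the surviving guard to every instruction that used the eliminated one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GuardSketch {
    /** Toy instruction "def = op obj [guard]": null checks define guards, field accesses use them. */
    static final class Insn {
        final String op, def, obj, guard;
        Insn(String op, String def, String obj, String guard) {
            this.op = op; this.def = def; this.obj = obj; this.guard = guard;
        }
        @Override public String toString() {
            return def + " = " + op + " " + obj + (guard == null ? "" : " [" + guard + "]");
        }
    }

    /** Drops null checks on an already-checked SSA value and renames uses of their guards. */
    static List<Insn> removeRedundantNullChecks(List<Insn> block) {
        Map<String, String> checked = new HashMap<>(); // SSA value -> guard proving it non-null
        Map<String, String> rename = new HashMap<>();  // eliminated guard -> surviving guard
        List<Insn> out = new ArrayList<>();
        for (Insn i : block) {
            if (i.op.equals("null_check")) {
                String earlier = checked.get(i.obj);
                if (earlier != null) {
                    rename.put(i.def, earlier); // redundant check: forward its guard
                    continue;
                }
                checked.put(i.obj, i.def);
                out.add(i);
            } else {
                String g = i.guard == null ? null : rename.getOrDefault(i.guard, i.guard);
                out.add(new Insn(i.op, i.def, i.obj, g));
            }
        }
        return out;
    }
}
```

Because the getfield instructions carry their guard as an operand rather than relying on block boundaries, the check can be deleted without touching the control flow graph.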
The optimizing compiler
Java – can we capture and benefit from strong type information?
Extended Array SSA:
Single assignment
Array – Fortran style: a float and an int array can't alias
Extended – different fields and different objects can't alias
Phi operator – for registers, heaps and exceptions
Pi operator – defines points where knowledge about a variable is exposed, e.g. after A = new int[100], later uses of A know the array length is 100 (exploited by ABCD array bounds check elimination)
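The type-based disambiguation rules on this slide can be sketched as an alias query over heap accesses. This is a simplification under stated assumptions, not Extended Array SSA itself: each access names its heap (a field such as `Point.x`, or an array element type such as `int[]`) and the SSA value of the object it touches, and two results of distinct `new` instructions are taken to be different objects. `AliasSketch` is a hypothetical name.

```java
import java.util.Set;

class AliasSketch {
    /** A load or store: the heap it touches and the SSA value naming the object or array. */
    static final class Access {
        final String heap; // e.g. "Point.x" or "int[]"
        final String base; // SSA value of the object reference
        Access(String heap, String base) { this.heap = heap; this.base = base; }
    }

    /** Conservative alias query; freshAllocations holds SSA values defined by distinct `new`s. */
    static boolean mayAlias(Access a, Access b, Set<String> freshAllocations) {
        if (!a.heap.equals(b.heap)) {
            return false; // float[] vs int[], or field f vs field g: different heaps never alias
        }
        if (!a.base.equals(b.base)
                && freshAllocations.contains(a.base) && freshAllocations.contains(b.base)) {
            return false; // two different allocations are different objects
        }
        return true;      // otherwise assume they may alias
    }
}
```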
The optimizing compiler
HIR: simplification, tail recursion elimination, estimate execution frequencies, loop unrolling, branch optimizations, (simple) escape analysis, local copy and constant propagation, local common sub-expression elimination
SSA in HIR: load/store elimination, redundant branch elimination, global constant propagation, loop versioning
AOS framework
The optimizing compiler
LIR: simplification, estimate execution frequencies, basic block reordering, branch optimizations, (simple) escape analysis, local copy and constant propagation, local common sub-expression elimination
SSA in LIR: global code placement, live range splitting
AOS framework
The optimizing compiler
MIR: instruction selection, register allocation, scheduling, simplification, branch optimizations
Fix-ups for runtime
Speculative Optimisations
In a JVM there often isn't a complete picture of the program, in particular because of dynamic class loading
On-stack replacement allows optimisation to proceed with a get-out clause
On-stack replacement is a virtual Jikes RVM instruction
Applications of on-stack replacement
Safe invalidation for speculative optimisation
Class hierarchy-based inlining
Deferred compilation
Don't compile uncommon cases
Improve dataflow optimization and improve compile time
Debug optimised code via dynamic deoptimisation
At break-point, deoptimize activation to recover program state
Runtime optimization of long-running activities
Promote long-running loops to higher optimisation levels
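Real on-stack replacement rewrites a live stack frame, which plain Java cannot express, but the essential step (capturing the live variables at an OSR point and resuming in another compiled version of the method) can be modelled. In this hypothetical sketch a long-running loop hands its state (`i`, `sum`) to an "optimised" continuation once it passes a threshold; the result is the same whichever version finishes the loop.

```java
class OsrSketch {
    /** Live state captured at the OSR point: everything the rest of the loop needs. */
    static final class FrameState {
        final int i;
        final long sum;
        FrameState(int i, long sum) { this.i = i; this.sum = sum; }
    }

    /** "Baseline" version: transfers to the optimised version once the loop is hot. */
    static long run(int n, int osrThreshold) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            if (i == osrThreshold) {
                return optimisedContinuation(new FrameState(i, sum), n); // OSR point
            }
            sum += i;
        }
        return sum;
    }

    /** "Optimised" version entered mid-loop, rebuilding its frame from the captured state. */
    static long optimisedContinuation(FrameState s, int n) {
        long sum = s.sum;
        for (int i = s.i; i < n; i++) {
            sum += i;
        }
        return sum;
    }
}
```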
![Page 42: PearColator](https://reader035.vdocuments.mx/reader035/viewer/2022062516/56649e4f5503460f94b47237/html5/thumbnails/42.jpg)
PearColator
PearColator
Decoder:
Disassembler
Interpreter (Java threaded)
Translator
Generic components:
Loaders
System calls
Memory
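The split between one decoder and several clients (disassembler, interpreter, translator) can be sketched with a visitor. This is a hypothetical illustration rather than PearColator's code: it decodes only the PowerPC `addi` instruction (primary opcode 14, D-form, where `rA = 0` means the literal zero), and the translator client is omitted.

```java
/** Callbacks for each decoded instruction; each client implements this once. */
interface PpcVisitor<R> {
    R addi(int rd, int ra, int simm);
    R unknown(int insn);
}

class PpcDecoder {
    /** Shared decoder: extracts the fields once and dispatches to the client. */
    static <R> R decode(int insn, PpcVisitor<R> v) {
        int opcode = insn >>> 26;
        if (opcode == 14) { // addi rD, rA, SIMM (D-form)
            return v.addi((insn >>> 21) & 31, (insn >>> 16) & 31, (short) insn); // SIMM sign-extended
        }
        return v.unknown(insn);
    }
}

class PpcDisassembler implements PpcVisitor<String> {
    public String addi(int rd, int ra, int simm) { return "addi r" + rd + ", r" + ra + ", " + simm; }
    public String unknown(int insn) { return String.format(".long 0x%08x", insn); }
}

class PpcInterpreter implements PpcVisitor<Void> {
    final int[] gpr = new int[32]; // general-purpose registers
    public Void addi(int rd, int ra, int simm) {
        gpr[rd] = (ra == 0 ? 0 : gpr[ra]) + simm; // rA = 0 reads as zero, not r0
        return null;
    }
    public Void unknown(int insn) {
        throw new UnsupportedOperationException(String.format("0x%08x", insn));
    }
}
```

`0x38600005` is `addi r3, r0, 5` (the `li r3, 5` idiom). Adding a translator would mean one more visitor implementation, with no change to the decoder.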
![Page 44: PearColator](https://reader035.vdocuments.mx/reader035/viewer/2022062516/56649e4f5503460f94b47237/html5/thumbnails/44.jpg)
PearColator
Thanks and…
any questions?