Numeric and Performance Issues of Java
Roldan Pozo
Leader, Mathematical Software Group
National Institute of Standards and Technology
Numeric and Performance Issues of Java
for Grid Computing!
Why Java?
• Portability of the Java Virtual Machine (JVM)
• Safe: minimizes memory leaks and pointer errors
• Security model
• Network-aware environment
• Parallel and distributed computing: threads, RMI
• Standard interfaces for graphics, GUIs, databases, etc.
• Widely adopted
Why not Java?
• lack of scientific software
– computational libraries
– numerical issues/interfaces
– major effort to port from f77/C
Issues
• performance
• library/application support
• numeric concerns
• language features/limitations
Java Benchmarking Efforts
• CaffeineMark
• SPECjvm98
• Java Linpack
• Java Grande Forum Benchmarks
• SciMark
• Image/J benchmark
• BenchBeans
• VolanoMark
• Plasma benchmark
• RMI benchmark
• JMark
• JavaWorld benchmark
• ...
SciMark Benchmark
• Numerical benchmark for Java, C/C++
• composite results for five kernels:
– FFT (complex, 1D)
– Successive Over-relaxation
– Monte Carlo integration
– Sparse matrix multiply
– dense LU factorization
• results in Mflops
• two sizes: small, large
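To make the kernel list concrete, here is a minimal sketch of a Monte Carlo integration kernel and of how a run is turned into an Mflops figure. It is not the SciMark source: the class name, the choice of integrand (estimating pi from the area of a quarter circle), and the cost model of 4 floating point operations per sample are all assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical illustration, not the SciMark source code.
final class MonteCarloSketch {

    // Estimates pi by sampling points uniformly in the unit square and
    // counting how many land inside the quarter circle x^2 + y^2 <= 1.
    static double estimatePi(int samples, long seed) {
        Random rng = new Random(seed);
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return 4.0 * inside / samples;
    }

    // Converts an operation count and an elapsed time into Mflops.
    static double mflops(double flops, double seconds) {
        return flops / seconds / 1.0e6;
    }

    public static void main(String[] args) {
        int samples = 1_000_000;
        long start = System.nanoTime();
        double pi = estimatePi(samples, 42L);
        double seconds = (System.nanoTime() - start) / 1.0e9;
        // Assumed cost model: 4 floating point operations per sample.
        double rate = mflops(4.0 * samples, seconds);
        System.out.println("pi estimate: " + pi);
        System.out.println("approx. Mflops: " + rate);
    }
}
```

A single short run like this mostly measures JIT warm-up, so a real benchmark would repeat the kernel until the timing stabilizes.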
SciMark results
• over 200 platforms (OS/CPU/JVM)
• deployed from one mouse click
– numerical libraries
– instrumentation/analysis
– optional database collection
• impossible to do in Fortran/C/C++
SciMark 2.0 results
[Bar chart: SciMark 2.0 score by platform; axis 0 to 350]
• AMD 1.5 GHz, IBM JDK 1.3, OS/2
• Intel PIII (600 MHz), IBM 1.3, Linux
• AMD Athlon (750 MHz), IBM 1.1.8, OS/2
• Intel Celeron (464 MHz), MS 1.1.4, Win98
• Sun UltraSparc 60, Sun 1.1.3, Sol 2.x
• SGI MIPS (195 MHz), Sun 1.2, Unix
• Alpha EV6 (525 MHz), NE 1.1.5, Unix
JVMs have improved over time
[Bar chart: SciMark score on a 333 MHz Sun Ultra 10 for JVM versions 1.1.6, 1.1.8, 1.2.1, and 1.3; axis 0 to 35]
SciMark: Java vs. C (Sun UltraSPARC 60)
[Bar chart: C and Java results for the FFT, SOR, MC, Sparse, and LU kernels; axis 0 to 90]
* Sun JDK 1.3 (HotSpot), javac -O; Sun cc -O; SunOS 5.7
SciMark: Java vs. C (Intel PIII 500 MHz, Linux)
[Bar chart: C and Java results for the FFT, SOR, MC, Sparse, and LU kernels; axis 0 to 160]
* RH Linux 6.2, gcc (v. 2.91.66) -O6, IBM JDK 1.3, javac -O
Current JVMs aren’t so bad...
• SciMark high score: 323 Mflops*
– FFT: 271 Mflops
– Jacobi: 353 Mflops
– Monte Carlo: 60 Mflops
– Sparse matmult: 308 Mflops
– LU factorization: 621 Mflops
* 1.5 GHz AMD Athlon, IBM 1.3.1, OS/2
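The five kernel figures above are consistent with the composite score being their arithmetic mean. The sketch below checks that arithmetic; the class and method names are hypothetical, and the averaging rule is inferred from the numbers on the slide rather than stated on it.

```java
// Kernel scores are copied from the slide (Mflops); the averaging rule is an assumption.
final class CompositeScore {

    // Arithmetic mean of the per-kernel scores.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // FFT, Jacobi, Monte Carlo, sparse matmult, LU factorization
        double[] kernels = {271, 353, 60, 308, 621};
        double composite = mean(kernels);
        System.out.println("mean: " + composite);
        System.out.println("rounded: " + Math.round(composite));
    }
}
```

The mean is 1613 / 5, which rounds to the 323 Mflops quoted as the high score.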
Making Java fast(er)
• Native methods (JNI)
• stand-alone compilers (.java -> .exe)
• modified JVMs (fused multiply-adds, bypassed array bounds checking)
• aggressive bytecode optimization
– JITs, HotSpot, etc.
• bytecode transformers
• concurrency: threads, RMI
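As an illustration of the last item, here is a minimal sketch of a numeric kernel split across plain Java threads. The class name, the choice of a dot product, and the chunking scheme are assumptions made for this example, and the lambda syntax requires a Java release newer than the JDKs benchmarked in this talk.

```java
// Hypothetical sketch: a dot product split across plain Java threads.
final class ParallelDot {

    static double dot(double[] x, double[] y, int numThreads) {
        double[] partial = new double[numThreads];
        Thread[] workers = new Thread[numThreads];
        // Ceiling division so every element falls into some chunk.
        int chunk = (x.length + numThreads - 1) / numThreads;
        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            final int lo = Math.min(x.length, t * chunk);
            final int hi = Math.min(x.length, lo + chunk);
            workers[t] = new Thread(() -> {
                double sum = 0.0;
                for (int i = lo; i < hi; i++) {
                    sum += x[i] * y[i];
                }
                partial[id] = sum; // each thread writes only its own slot
            });
            workers[t].start();
        }
        double total = 0.0;
        for (int t = 0; t < numThreads; t++) {
            try {
                workers[t].join(); // join makes the worker's write visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += partial[t];
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        double[] x = new double[n];
        double[] y = new double[n];
        java.util.Arrays.fill(x, 1.0);
        java.util.Arrays.fill(y, 2.0);
        System.out.println(dot(x, y, 4));
    }
}
```

Because the partial sums are combined in a different order than in a sequential loop, results on general data may differ from the sequential result in the last bits.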
Scientific Java Libraries
• Matrix library (JAMA)
– NIST/MathWorks
– LU, QR, SVD, eigenvalue solvers
• Java Numerical Toolkit (JNT)
– special functions
– quadrature
– random numbers
– BLAS subset
• IBM
– Array class package
• Univ. of Maryland
– Linear Algebra library
• JLAPACK
– port of LAPACK
• COLT
• Visual Numerics
– LINPACK
– Complex
Java Numerics Group
• industry-wide consortium to establish tools, APIs, and libraries
– IBM, Intel, Compaq/Digital/HP, Sun, MathWorks, VNI, NAG
– NIST, Inria
– Berkeley, UCSB, Austin, MIT, Indiana
• component of Java Grande Forum
– Concurrency group
Scientific Java Resources
• Java Numerics Group
– http://math.nist.gov/javanumerics
• Java Grande Forum
– http://www.javagrande.org
• SciMark Benchmark
– http://math.nist.gov/scimark
Java Numerics Issues
• Virtual Machine
– reproducibility: IEEE floating point model
– multidimensional arrays
– complex data types
– lightweight objects
• Language
– operator overloading
– generic typing (templates)
Floating point model
• binary reproducibility hard for numerics
• floating point model based on Sun architecture...
– IEEE 754 standard, no extended formats
– on other platforms (Intel, IBM, Alpha) must round intermediate results down to the standard formats: up to 10x performance hit
– solved by relaxing JVM model: extended exponents OK, use strictfp otherwise
Floating Point Model (cont’d)
• Elementary functions
– loose definition based on Sun C lib
– e.g. on x86: hardware sin(x) = x for x > 2^64
– solved by allowing results to vary by up to 1 ulp
– use StrictMath library otherwise
• Fused Multiply-Adds (FMAs)
– available on many platforms, 50% hit
– discussed, but not yet resolved
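The two points on this slide can be sketched in a few lines. The first method measures how far Math.sin is from StrictMath.sin in ulps. The second pair shows why a fused multiply-add changes results: it rounds once instead of twice. The class and method names are hypothetical, and Math.fma is an assumption relative to this talk, since it only entered the standard library in Java 9.

```java
// Hypothetical sketch; Math.fma requires Java 9 or later.
final class FloatingPointSketch {

    // Distance between Math.sin and StrictMath.sin, in ulps of the strict result.
    static double sinGapInUlps(double x) {
        double strict = StrictMath.sin(x);
        double fast = Math.sin(x);
        return Math.abs(fast - strict) / Math.ulp(strict);
    }

    // a*b + c with two roundings: the product is rounded, then the sum.
    static double unfused(double a, double b, double c) {
        return a * b + c;
    }

    // a*b + c with a single rounding at the end.
    static double fused(double a, double b, double c) {
        return Math.fma(a, b, c);
    }

    public static void main(String[] args) {
        System.out.println("sin gap (ulps): " + sinGapInUlps(1.0e6));
        // The exact product a*b is 1 - 2^-54, which is not representable as a double.
        double a = 1.0 + 0x1p-27;
        double b = 1.0 - 0x1p-27;
        System.out.println("unfused: " + unfused(a, b, -1.0));
        System.out.println("fused:   " + fused(a, b, -1.0));
    }
}
```

In the unfused version the product rounds to 1.0 before the subtraction, so the small term is lost; the fused version keeps it. This is the reproducibility concern: the same source expression gives different bits depending on whether the hardware instruction is used.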
Java Arrays
• 1-d array size limited to 2^31 - 1 elements (int index)
• multidimensional arrays à la C: A[i][j]
• need Fortran-like arrays for best optimization
– no aliasing of subsections
– map to 1-d vector
• proposed ‘row-major’ contiguous array class
– use A.get(i,j): ugly & slow?
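A minimal sketch of the two array styles follows. The class name DenseArray2D and its methods are hypothetical and are not the proposed standard class; the sketch only shows row-major storage in one 1-d vector, next to the row aliasing that Java's arrays of arrays permit.

```java
// Hypothetical class name and API, for illustration only.
final class DenseArray2D {
    private final int rows;
    private final int cols;
    private final double[] data; // row-major: element (i, j) lives at i * cols + j

    DenseArray2D(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    double get(int i, int j) {
        return data[i * cols + j];
    }

    void set(int i, int j, double value) {
        data[i * cols + j] = value;
    }

    int rows() { return rows; }
    int cols() { return cols; }

    // With double[][], two rows can refer to the same storage.
    static double aliasDemo() {
        double[][] a = new double[2][3];
        a[1] = a[0];   // row 1 now aliases row 0
        a[0][0] = 7.0; // the write is visible through both rows
        return a[1][0];
    }

    public static void main(String[] args) {
        DenseArray2D m = new DenseArray2D(2, 3);
        m.set(1, 2, 5.0);
        System.out.println(m.get(1, 2));
        System.out.println(aliasDemo());
    }
}
```

The contiguous layout rules out aliased rows by construction, which is what an optimizer needs, at the cost of the A.get(i,j) call syntax the slide complains about.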
Java vs. Fortran Performance
[Bar chart: Fortran and Java performance in Mflops on MATMULT, BSOM, and SHALLOW; axis 0 to 250]
* IBM RS/6000 67 MHz POWER2 (266 Mflops peak), AIX Fortran, HPJC
Complex numbers / Lightweight Objects
• class has wrong semantics (x == y)
• can’t return/pass by value
• temporaries can clog garbage collection
• solutions:
– add complex native type to JVM (?)
– allow lightweight objects (C structs): general solution for intervals, multiprecision, etc.; requires new category of variables in JVM
– use preprocessor to unroll complex operations
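The first three bullets can be seen in a small sketch of a complex number written as an ordinary class. The class below is hypothetical and is not taken from any of the libraries listed earlier.

```java
// Hypothetical Complex class, written as an ordinary Java object.
final class Complex {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Every arithmetic operation allocates a new object for its result.
    Complex plus(Complex o) {
        return new Complex(re + o.re, im + o.im);
    }

    Complex times(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Complex)) {
            return false;
        }
        Complex c = (Complex) other;
        return Double.compare(re, c.re) == 0 && Double.compare(im, c.im) == 0;
    }

    @Override
    public int hashCode() {
        return java.util.Objects.hash(re, im);
    }

    // For objects, == compares references, not values.
    static boolean sameReference(Complex a, Complex b) {
        return a == b;
    }

    public static void main(String[] args) {
        Complex x = new Complex(1.0, 2.0);
        Complex y = new Complex(1.0, 2.0);
        System.out.println(sameReference(x, y)); // two distinct objects
        System.out.println(x.equals(y));         // same value
        // x*y + x: the product is a temporary object left for the garbage collector.
        Complex z = x.times(y).plus(x);
        System.out.println(z.re + " + " + z.im + "i");
    }
}
```

The method-call form x.times(y).plus(x) is also what the lack of operator overloading looks like in practice.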
Java Language Issues
• operator overloading
– numeric community makes best case
– unlikely to be introduced into the language
– can be solved by external preprocessors
• templates (generics)
– proposal for Generic Java (GJ) passed through standardization process
• Is Java a language issue at all...?
Conclusions
• Java’s biggest strength for scientific computing: binary portability, dynamic distribution
• Java numerics performance can be competitive
– can achieve efficiency of optimized C/Fortran
– whether it will depends on economics, not technology
• best Java performance on commodity platforms
• improving Java numerics:
– integrate true array and complex into Java standard
– more libraries and numerical software support