thinking in c/c++, coding in java
DESCRIPTION
foss.in 2012 talk (http://fossdotin2012.shdlr.com/conferences/talk/196)
Intent: There comes a time in every C/C++ programmer's life when he is looking at a smashed stack or a trashed heap & wishes that core dumps happened only when null pointers get dereferenced. This is the weak moment when people hang up their gdb boots & trade them for java.lang.NullPointerException. We shall be exploring how to use Java as a safer version of C without giving up too much control. A lot of big open source projects are starting to show up in Java for this very reason (eg: hadoop).
Overview: The Java programming language was considered too slow and too high level in its early days by performance junkies who believed that the only true way out was to code in C (and very reluctantly in C++). The language itself made significant strides by the time it reached v5, and JVMs have also become quite good at what they do.
TRANSCRIPT
Thinking in C/C++, coding in Java
foss.in 2012
Arvind Jayaprakash
Audience
• Surely not for you if you’ve never done *nix system programming or bare C/C++
• Maybe for you if you’ve done a reasonable amount of the above and “hello world” Java
• Prime audience if you are being pushed into/want to explore Java as an option for moderately high performance applications
whoami
finger
Home
• anomalizer
• anomalizer
• http://anomalizer.net/
Work
• anomalizer
• http://inmobi.com/
history/uname
Home
• MS-DOS in 1990
• Primarily Win98 & a little bit of RH7 in 2001
• Win7 for PPT and Gentoo for everything else in 2012 (fluxbox is my window manager, xterm is my favourite terminal)
Work
• 5 years of FreeBSD & 1 year of RHEL
• Chose the OS for current employer’s servers (Ubuntu since 2008)
• Gentoo/Win7 combo on my laptop
Primer
Survival tips
• Java language (J2SE) != J2EE
• J2SE 5 (also known as 1.5 or JLS5) is the lowest respectable version of the language
• Sun (now Oracle) JRE continues to remain the most popular free JRE+JDK
• Sun-JRE 1.6.0.22 is a good min version if you have 64 bit, x86_64, NUMA hardware running linux
• IDEs are a necessary evil; vim/emacs just don’t cut it
Why Java 1.5?
• Extensive concurrency libs
• Generics
• Annotations
• Lint checks
• Enums (typesafe too!)
• Variable arguments
• foreach
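For a C/C++ reader, a few of the listed Java 5 features can be seen together in one small sketch. All class, enum and method names here (Level, Java5Tour, sum, names) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; only the language features come from the list above.
enum Level { DEBUG, INFO, ERROR }            // typesafe enum, not an int in disguise

class Java5Tour {
    // Variable arguments: callers may pass zero or more ints.
    static int sum(int... values) {
        int total = 0;
        for (int v : values) {               // foreach over an array
            total += v;
        }
        return total;
    }

    // Generics: the element type of the list is checked at compile time.
    static List<String> names(Level... levels) {
        List<String> out = new ArrayList<String>();
        for (Level l : levels) {
            out.add(l.name());
        }
        return out;
    }

    @Override                                // annotation: compiler verifies this really overrides
    public String toString() {
        return "Java5Tour";
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 2, 3));                      // 6
        System.out.println(names(Level.INFO, Level.ERROR));    // [INFO, ERROR]
    }
}
```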
Let us get started now*
* usually means over-simplification that shall be clarified later
Classes & Objects
D’oh
Primitives v/s objects
• Primitive data types, structs & classes play by the exact same set of rules in C/C++ in almost every context
• Java fundamentally drives a wedge between the two both at a language level and runtime level
• This is why there is a primitive int and a class Integer. These 2 are not interchangeable*
* Auto boxing is a deception
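A small sketch of where the wedge shows through despite auto-boxing: an Integer can be null and an int cannot, and overload resolution treats the two as different types. The class and method names (BoxingPitfalls, kind, nullUnboxingThrows) are made up for this example.

```java
class BoxingPitfalls {
    // Returns true if unboxing a null Integer throws, something a primitive int can never do.
    static boolean nullUnboxingThrows() {
        Integer boxed = null;           // legal: Integer is a reference type
        try {
            int raw = boxed;            // the compiler inserts boxed.intValue() here
            return raw != raw;          // not reached when boxed is null
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Overload resolution prefers an exact or subtype match before it tries boxing/unboxing.
    static String kind(int v)    { return "primitive"; }
    static String kind(Object v) { return "object"; }

    public static void main(String[] args) {
        int raw = 42;
        Integer boxed = raw;                        // auto-boxing
        System.out.println(kind(raw));              // primitive
        System.out.println(kind(boxed));            // object
        System.out.println(nullUnboxingThrows());   // true
    }
}
```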
The approximate analogy
Primitives
• Think of primitives as values that can reside on the stack
• Lifespan always tied to source scope for local variables
Composites (Objects)
• Think of objects (classes) as values that always* reside on heap
• Now it becomes obvious that you are always dealing with pointers/references
• It also becomes obvious that their true lifespan is not tied to source scope
* escape analysis implementations in some JVMs
Nested structs/classes
Java:
    class Point { public int x; public int y; }
    class Rect { public Point top_left; public Point bottom_right; }
C/C++:
    struct Point { int x; int y; };
    struct InlineRect { Point top_left; Point bottom_right; };
    struct IndirectRect { Point *top_left; Point *bottom_right; };
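The Java Rect behaves like IndirectRect, not InlineRect: creating a Rect allocates no Points. A minimal sketch that reuses the slide's class and field names; NestedLayoutDemo and its methods are hypothetical helpers added for illustration.

```java
class Point {
    public int x;
    public int y;
}

class Rect {
    public Point top_left;       // a reference, initially null (like Point* in IndirectRect)
    public Point bottom_right;
}

class NestedLayoutDemo {
    // A fresh Rect holds two null references, not two embedded Points.
    static boolean fieldsStartNull() {
        Rect r = new Rect();
        return r.top_left == null && r.bottom_right == null;
    }

    static int widthOf(int x0, int x1) {
        Rect r = new Rect();
        r.top_left = new Point();        // each Point is its own heap allocation
        r.bottom_right = new Point();
        r.top_left.x = x0;
        r.bottom_right.x = x1;
        return r.bottom_right.x - r.top_left.x;
    }

    public static void main(String[] args) {
        System.out.println(fieldsStartNull());   // true
        System.out.println(widthOf(2, 10));      // 8
    }
}
```

There is no way to ask Java for the InlineRect layout; flattening it by hand (four int fields in Rect) is the closest equivalent.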
Null & void
• The notorious void* exists in Java; it is commonly referred to as the class named Object
– Any object (reference) can be directly cast to Object
– An object (reference) of type Object can be downcast to any type at compile time#
• null is not a type, however it is a language defined literal (like true & false)
# but can throw an error at runtime
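A short sketch of the footnote in action: the downcast from Object always compiles, and the JVM checks it when it runs. ObjectAsVoidPointer and downcastToStringFails are illustrative names, not anything from the talk.

```java
class ObjectAsVoidPointer {
    // Attempts the downcast and reports whether the runtime check rejected it.
    static boolean downcastToStringFails(Object o) {
        try {
            String s = (String) o;       // compiles for any Object reference
            return false;
        } catch (ClassCastException e) {
            return true;                 // unlike a void* cast, a wrong guess is caught
        }
    }

    public static void main(String[] args) {
        Object a = "hello";              // upcast to Object needs no cast syntax
        Object b = Integer.valueOf(7);
        System.out.println(downcastToStringFails(a));     // false
        System.out.println(downcastToStringFails(b));     // true
        System.out.println(downcastToStringFails(null));  // false: null casts to any reference type
    }
}
```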
What are references in java?
Why it is like a C pointer
• Think of a reference as a C pointer
• Think of the dot operator in Java as C’s arrow operator
• null is NULL, dereferencing it is a bad idea
• Think of a final ref in Java as a const ptr (not to be confused with ptr to const)
Why it is not like a C++ reference
• j-refs are nullable (d’uh)
• C++ refs cannot be made to point to something else post declaration, unlike Java refs
• == operator in Java has ptr equivalence semantics, not dereferenced object equivalence; use equals() for that
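The pointer analogy can be checked with a few lines. This is only a sketch; ReferenceSemantics and its helper methods are hypothetical names.

```java
class ReferenceSemantics {
    static boolean samePointer(Object a, Object b) { return a == b; }       // compares addresses
    static boolean sameValue(Object a, Object b)   { return a.equals(b); }  // compares contents

    // final on a reference behaves like "T* const": the pointee stays mutable.
    static String finalIsConstPointer() {
        final StringBuilder sb = new StringBuilder("a");
        sb.append("b");                    // allowed: mutates the object pointed to
        // sb = new StringBuilder();       // would not compile: the reference itself is final
        return sb.toString();
    }

    public static void main(String[] args) {
        String s1 = new String("java");    // two distinct heap objects with equal contents
        String s2 = new String("java");
        System.out.println(samePointer(s1, s2));     // false
        System.out.println(sameValue(s1, s2));       // true
        System.out.println(finalIsConstPointer());   // ab
    }
}
```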
vtables
• Every class inherits from Object class
• Every member function is virtual in Java; there is no opt-out
– Hence, internally, every class has a vtable
– And every object instance has an internal pointer/ref to the vtable of its actual type (for dynamic dispatch)
– And a fn-call is via ptr-to-fn*
• RTTI (of C++ fame) comes at no additional cost as a side-effect & guaranteed to be available
* Unless you do some class/method finalisation
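A minimal sketch of dynamic dispatch and the always-available RTTI. Shape, Circle and DispatchDemo are hypothetical names; the final method stands in for the "finalisation" mentioned in the footnote.

```java
class Shape {
    String name() { return "shape"; }                     // virtual by default, no keyword needed
    final String describe() { return "a " + name(); }     // final: subclasses cannot override this
}

class Circle extends Shape {
    @Override
    String name() { return "circle"; }
}

class DispatchDemo {
    // The static type is Shape; the call to name() inside describe() is resolved
    // against the object's actual class at runtime.
    static String viaBase(Shape s) { return s.describe(); }

    // RTTI: every object can report its actual class.
    static String runtimeType(Shape s) { return s.getClass().getSimpleName(); }

    public static void main(String[] args) {
        Shape s = new Circle();
        System.out.println(viaBase(s));            // a circle
        System.out.println(runtimeType(s));        // Circle
        System.out.println(s instanceof Circle);   // true
    }
}
```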
Other deceptive similarities
Generics & templates aren’t the same
Java generics
• No support for primitives
• Single copy of code exists regardless of the number of type arguments a generic code is used with
• Generified code gets compiled as an entity in itself
• Bounded type parameters are possible; unbounded defaults to Object
C++ templates
• Supports all types
• One copy of object code for each template instantiation
• Glorified C style macros; compilation happens once for each expansion; some compilation errors crop up here
• No inheritance family based bounding of type parameters, only explicit specialization is possible
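The "single copy of code" point is observable from inside a program, and a bounded type parameter is a one-liner. A sketch with hypothetical names (ErasureDemo, sameRuntimeClass, total):

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Both lists share one runtime class; the type argument exists only at compile time.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> ints = new ArrayList<Integer>();
        return strings.getClass() == ints.getClass();
    }

    // Bounded type parameter: T must be Number or a subclass of it,
    // so doubleValue() is known to exist when this method itself is compiled.
    static <T extends Number> double total(List<T> values) {
        double sum = 0;
        for (T v : values) {
            sum += v.doubleValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> xs = new ArrayList<Integer>();
        xs.add(1);
        xs.add(2);
        xs.add(3);
        System.out.println(sameRuntimeClass());   // true
        System.out.println(total(xs));            // 6.0
        // List<int> bad;                         // would not compile: no primitives as type arguments
    }
}
```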
casts
• Syntactically identical to C casts
• Let us speak in C++ terms for semantic clarity
– static_cast is permitted
– No const_cast as there are no consts to begin with
– dynamic_cast permitted due to implicit RTTI support (hence Object objects can be cast to anything)
– reinterpret_cast disallowed; convert & copy is the only way out
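Roughly how the three surviving kinds of cast look in practice. CastDemo and its method names are hypothetical; Float.floatToIntBits is the standard library's convert-and-copy route to a float's bit pattern.

```java
class CastDemo {
    // static_cast-like numeric conversion: narrowing keeps only the low 8 bits.
    static byte narrow(int v) { return (byte) v; }

    // dynamic_cast-like: test the runtime type, then cast.
    static int lengthIfString(Object o) {
        if (o instanceof String) {
            return ((String) o).length();
        }
        return -1;
    }

    // No reinterpret_cast: the bits of a float are obtained by a conversion call, not a pointer pun.
    static int bitsOf(float f) { return Float.floatToIntBits(f); }

    public static void main(String[] args) {
        System.out.println(narrow(200));                        // -56
        System.out.println(lengthIfString("abc"));              // 3
        System.out.println(lengthIfString(Integer.valueOf(1))); // -1
        System.out.println(Integer.toHexString(bitsOf(1.0f)));  // 3f800000
    }
}
```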
Memory issues
Auto-boxing woes
• Java 5 made it syntactically possible to use a primitive and its objectified version interchangeably (eg: Long & long)
• The costs however are very different
– Indirection (ptr de-ref) to read value
– Memory footprint is 2 ptrs (one to value, and the vptr inside object) + that of actually storing the primitive
You don’t want to see this
Integer x;
for(int i = 0 ; i < 100; i++) { x = i * i; }
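To see why, it helps to spell out the boxing call that javac inserts for the assignment. The sketch below assumes the usual javac translation to Integer.valueOf; BoxedLoop and its method names are hypothetical. Integer.valueOf is only required to cache values from -128 to 127, so most iterations of the boxed loop typically produce a fresh heap object.

```java
class BoxedLoop {
    // The loop from the slide with the hidden boxing call written out.
    static int boxedLast() {
        Integer x = null;
        for (int i = 0; i < 100; i++) {
            x = Integer.valueOf(i * i);   // what "x = i * i" turns into
        }
        return x.intValue();              // reading the value is a pointer dereference
    }

    // The primitive version: no allocation, no indirection.
    static int primitiveLast() {
        int x = 0;
        for (int i = 0; i < 100; i++) {
            x = i * i;
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(boxedLast());       // 9801
        System.out.println(primitiveLast());   // 9801
    }
}
```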
int[] v/s ArrayList<Integer>
• vector<int> & int[] have identical performance in C++, don’t carry that assumption into Java!
• Remember, generics only work with objects, so we can’t use an int with it
• And int is just not the same as an Integer
In figures
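[Figure: memory layout of the two containers]
int[]: Array header | a0 | a1 | a2 | … | an-1 (values stored inline in one block)
ArrayList<Integer>: Array header | ref | ref | ref | … | ref, where each ref points to a separate heap object (ObjectHeader | a0, ObjectHeader | a1, ObjectHeader | a2, …, ObjectHeader | an-1) that need not be adjacent or in order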
In words
• On an un-tuned 64 bit JVM, pay at least 400% memory tax (it is still 200% on a tuned JVM)
• 100% apparent memory access cost
• Completely wreck your cache lines by simply iterating through the array (real tax can exceed 100%)
• And yes, there is copying involved when you expand beyond a certain limit
• And more work for GC …
The solution
• So what about collections of primitives?
– What if you want an expandable array of ints?
– What if you want a map of short to double?
• Use primitive collection libraries
– trove4j solves the above problems
– It is open source & comes with an LGPL license too
• The larger point however is to understand the object model & memory layout
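To make the memory-layout point concrete, here is a hand-rolled expandable array of ints backed by a plain int[]. This is a minimal sketch of the idea only; IntList is a hypothetical class and is not trove4j's API.

```java
import java.util.Arrays;

// Hypothetical class for illustration; this is not trove4j's API.
class IntList {
    private int[] data = new int[4];   // values live inline, no per-element object
    private int size = 0;

    void add(int v) {
        if (size == data.length) {
            // Grow by copying into a bigger array, much like vector<int> does.
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = v;
    }

    int get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
        return data[i];
    }

    int size() { return size; }

    public static void main(String[] args) {
        IntList squares = new IntList();
        for (int i = 0; i < 10; i++) {
            squares.add(i * i);        // no boxing anywhere on this path
        }
        System.out.println(squares.size());   // 10
        System.out.println(squares.get(9));   // 81
    }
}
```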
No reinterpret cast for you!
• Imagine trying to read values from byte streams such as files & sockets
• You have 3 choices
– Bottom-up read, one primitive at a time (entire class chain must play nice for this)
– Slurp the blob, break the blob and make meaningful object by copying over the primitives in top-down fashion (a.k.a. memcpy)
– Use java serialization (disallows conditional parsing)
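The second choice (slurp the blob, then copy primitives out) might look like the sketch below, using java.nio.ByteBuffer where C would cast a char* to a struct pointer. The Header class and its 14-byte big-endian wire format (4-byte magic, 2-byte version, 8-byte timestamp) are assumptions invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical wire format: 4-byte magic, 2-byte version, 8-byte timestamp, big-endian.
class Header {
    static final int SIZE = 14;

    final int magic;
    final short version;
    final long timestamp;

    Header(int magic, short version, long timestamp) {
        this.magic = magic;
        this.version = version;
        this.timestamp = timestamp;
    }

    // Where C would do (struct header *) blob, Java converts and copies field by field.
    static Header parse(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob).order(ByteOrder.BIG_ENDIAN);
        int magic = buf.getInt();          // each get advances the buffer position
        short version = buf.getShort();
        long timestamp = buf.getLong();
        return new Header(magic, version, timestamp);
    }

    static byte[] encode(Header h) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE).order(ByteOrder.BIG_ENDIAN);
        buf.putInt(h.magic).putShort(h.version).putLong(h.timestamp);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] wire = encode(new Header(0xCAFEBABE, (short) 2, 1234567890123L));
        Header h = parse(wire);
        System.out.println(Integer.toHexString(h.magic));   // cafebabe
        System.out.println(h.version);                      // 2
        System.out.println(h.timestamp);                    // 1234567890123
    }
}
```

Because parsing is ordinary code, a later field can be read conditionally on an earlier one (for example on version), which is what built-in serialization does not allow.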
I/O ops
Dealing with slow parts (of any language)
• A common reason to fall back to “native” languages is when a large amount of I/O is involved
• I/O is dreaded as it usually translates to *nix syscalls
• A lot of syscalls exist specifically to optimize userspace/kernel space transition inefficiencies
• They also have OS idiosyncrasies
Java & I/O
*nix & C feature | Java equivalent | Available since
Allocate char* | ByteBuffer.allocate() | 1.4
sendfile() | FileChannel.transfer{To|From} | 1.4
mmap() | FileChannel.map() | 1.4
epoll() | java.nio.channels.Selector + SelectorProvider | API since 1.4, epoll as implementation since 1.6
readv()/writev() | ScatteringByteChannel.read(ByteBuffer[]) / GatheringByteChannel.write(ByteBuffer[]) | 1.4
chmod()/chown()/inotify()/stat()/copy()/symlink()/readdir/… | NIO2 file api | 1.7
SCTP | - | 1.7
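As one example from the table, the mmap() row can be exercised with a temp file. This is a sketch under the assumption that the platform allows mapping a temporary file; MmapDemo and roundTrip are hypothetical names, while FileChannel.map and MappedByteBuffer are the standard NIO calls.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MmapDemo {
    // Writes an int through one mapping of a temp file and reads it back through another.
    static int roundTrip(int value) {
        try {
            File f = File.createTempFile("mmap-demo", ".bin");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
                 FileChannel ch = raf.getChannel()) {
                // Mapping 4 bytes read-write grows the empty file to 4 bytes.
                MappedByteBuffer out = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4);
                out.putInt(0, value);      // a plain memory store, no write() syscall per access
                out.force();               // flush the mapped region to the file (like msync)

                MappedByteBuffer in = ch.map(FileChannel.MapMode.READ_ONLY, 0, 4);
                return in.getInt(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));   // 42
    }
}
```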
etc
Not covered in the talk
• Reflection
– Runtime inspection of types & dynamic code gen
• JIT
– JRE profiles applications & recompiles code with optimizations mid-flight!
– Discovers structural shortcuts possible in a given app & exploits them
• JNI
– When you have to bridge your C code
Go read about the following
• “maven” (awesome build mgmt tool)
• “Google guava” (as important as boost for cpp, historically speaking)
• “Project lombok” (uses annotations to tuck away massive boilerplate coding)
• “slf4j” (log4j is so Java 1.2, never code against it)
• “netty” (the libevent of Java)
And some more
• “testng” (unit & module testing system)
• “mockito” (helps in creating test mocks)
• “javassist” (create entire classes from strings at runtime!)
• “guice” & “Spring DI” (dependency injection)