java serialization facts and fallacies

35
Java Serialization Facts and Fallacies © 2013, Roman Elizarov, Devexperts

Upload: roman-elizarov

Post on 22-Jun-2015

2.249 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Java Serialization Facts and Fallacies

Java SerializationFacts and Fallacies

© 2013, Roman Elizarov, Devexperts

Page 2: Java Serialization Facts and Fallacies

Serialization

… is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and resurrected later in the same or another computer environment. -- WIkipedia

Object Bytes Network/Storage

Page 3: Java Serialization Facts and Fallacies

Use-cases

• Transfer data over network between cluster nodes or between servers and clients- Serialization is a key technology behind any RPC/RMI

• Serialization if often used for storage - but then data transfer is still often a part of the picture

Page 4: Java Serialization Facts and Fallacies

Java serialization facility (key facts)

• Had appeared in 1997 as a part of JDK 1.1 release

• Uses java.io.Serializable as a marker interface

• Consists of java.io.ObjectInputStream/ObjectOutputStream with related classes and interfaces

• Has “Java Object Serialization Specification” that documents both API and object serialization stream protocol

• Is part of any Java platform (even Android has it)

• Somehow has a lot of myths around it

Page 5: Java Serialization Facts and Fallacies

It is a joke … and the only JEP joke (!) … and it has a big grain of truth: There are lots of folk myths about serialization

Page 6: Java Serialization Facts and Fallacies

SERIALIZATION

Page 7: Java Serialization Facts and Fallacies

SERIALIZATION

Kryo

Google protobuf

Apache Avro

JBoss serializationProtostuff

Jackson

Coherence POF

JAXB

Gson

Json-lib

FLEXJSON

GridGain serializationfast-serialization

Hessian

Burlap

jserial

quickser

EclipseLink MOXy

java.beans.XMLEncoder/XMLDecoder

Hadoop Writable

svenson

Hazelcast Portable

XStream

msgpack woblyApache Thrift

Javolution XML

jsonij

Page 8: Java Serialization Facts and Fallacies

The obvious fact of live

Programming your own serialization [framework] is fun!

… and gets you the highest-performance and tailored solution to your particular problem.

Real hackers write everything from scratch

Every mature appserver/framework/cache/eventbus/soa/rpc/db/etcmust have its own serialization that is better than everybody else’s

Page 9: Java Serialization Facts and Fallacies

How to choose a serialization lib?

• Cross-language or Java-only

• Binary or text (XML/JSON/other) format

• Custom or automatic object-to-serialized format mapping

• Needs format description/mapping/annotation or not

• Writes object scheme (e.g. field names) with each object, once per request, once per session, never at all, or some mixed approach

• Writes object fields or bean properties

• Serializes object graph with shared objects or not

• Performance- CPU consumption to serialize/deserialize

- Network consumption (bytes per object)

Page 10: Java Serialization Facts and Fallacies

Java built-in serialization lib?

• Cross-language or Java-only

• Binary or text (XML/JSON/other) format

• Custom or automatic object-to-serialized format mapping

• Needs format description/mapping/annotation or not

• Writes object scheme (e.g. field names) with each object, once per request, once per session, never at all, or some mixed approach

• Writes object fields or bean properties

• Serializes object graph with shared objects or not

• Performance- CPU consumption to serialize/deserialize

- Network consumption (bytes per object)

Page 11: Java Serialization Facts and Fallacies

Serious serialization lib choice diagram

Cross-platform data transfer?

Java to JS?

YES

Jackson/gson/etc

do further research

Use built-in Java Serialization

Do you have problems with

it?

YES NO

YES NO

NO

Page 12: Java Serialization Facts and Fallacies

Serialization myth #1

I’ve found this framework called XYZ…

… it has really simple and fast serialization

All you have to do is to implement this interface! Easy!

I’ve tested it! It is really fast! It outperforms everything else I’ve tried!

/* This is almost actual real-life conversation. * Names changed */

Page 13: Java Serialization Facts and Fallacies

In fact this is corollary to “The obvious fact of life”

• Of course it is the fastest possible way! But this does not solve the most complex serialization problems- You can implement java.io.Externalizable if you are up to writing a

custom serialization code for each object.

• Maintenance cost/efforts are often underestimated- How are you planning to evolve your system? Add new fields? New

classes? • What will keep you from forgetting to read/write them? • Ah… yes, you write a lot of extra tests in addition to writing your

write/read methods to make sure your serialization works

- How about on-the-wire compatibility of old code and new one? • But I deploy all of my system with a new version at once!• But how do you work with it during development?

Page 14: Java Serialization Facts and Fallacies

Serialization myth #2

java.io.Serializable …

“This interface limits how its implementing classes can change in the future. By implementing Serializable you expose your flexible in-memory implementation details as a rigid binary representation. Simple code changes--like renaming private fields--are not safe when the changed class is serializable.”

http://developer.android.com/reference/java/io/Serializable.html

Page 15: Java Serialization Facts and Fallacies

So what about this code evolution?

evolve

write

read

got

Page 16: Java Serialization Facts and Fallacies

serialVersionUID

• serialVersionUID = crazy hash of your class- If you change your class -> serialVersionUID changes -> incompatible

• But you can set serialVersionUID as a “private static final long” field in your class to fix it.

• With a fixed serialVersionUID java serialization supports:- Adding/removing fields without any additional hurdles and annotations

- Moving fields around in class definition

- Extracting/inlining super-classes (but without moving serializable fields up/down class hierarchy).

• It basically means you can evolve your class at will, just a you would do with key-value map in json/other-untyped-system

Page 17: Java Serialization Facts and Fallacies

serialver

• That’s the tool to compute serialVersionUID as some crazy hash of your class- You only need it when your class is “in the wild” without explicitly

assigned serialVersionUID and you want to be backwards compatible with that version of it

• DO NOT USE serialver for freshly created classes- It is a pseudo random number that serves no purpose, but to make

your gzipped class files and gzipped serial files larger

- Just set serialVersionUID = 0• It is no worse, but better than any other value

Page 18: Java Serialization Facts and Fallacies

So what about this code evolution (2)?

evolve

wrie

readgot

symbol – as writtenquantity – as writtenprice – as writtenorderType – null

Profit! It works the other way around, too!

Page 19: Java Serialization Facts and Fallacies

What if I want to make my class incompatible?

• You have two choices- serialVersionUID++

- Rename class

• So, having serialVersionUID in a modern serialization framework is an optional feature- They just did not have refactorings back then in 1996, so renaming a

class was not an option for them

- They chose a default of “your changes break serialization” and force you to explicitly declare serialVersionUID if you want “your changes are backwards compatible”

• I would have preferred the other default, but having to manually add serialVersionUID is a small price to pay for a serialization facility that you have out-of-the box in Java.

Page 20: Java Serialization Facts and Fallacies

Complex changes made compatible

• Java serialization has a bunch of features to support radical evolution of serializable classes in backwards-compatible way:- writeReplace/readResolve

• Lets you migrate your code to a completely new class, but still use the old class for serialization.

- putFields/writeFields/readFieldsLets you completely redo the fields of the object, but still be compatible with the old set of fields for serialization.

It really helps if you are working on, evolving, improving, and refactoring really big projects

Page 21: Java Serialization Facts and Fallacies

What about “renaming private fields is unsafe”?

• It is just a matter of developer’s choice on what actually constitutes “serial form” of the class:- It’s private fields

• like in Java serialization

- It’s public API with its getter/setter methods • aka “Bean Serialization”• many serialization frameworks follow this approach• good match when you need to expose public API via serialization

- A totally separate code, file, or annotation defines serial form• you have to write it in addition to your class• more work, but the approach is the most flexible and pays of in

case of highly heterogeneous system (many platforms/languages)

Page 22: Java Serialization Facts and Fallacies

http://www.organisationscience.com/styled-6/

Page 23: Java Serialization Facts and Fallacies

Serialization myth #3

Serialization performance matters

Page 24: Java Serialization Facts and Fallacies

In practice

• It really matters when very verbose text formats like XML are used in a combination with not-so-fast serializer implementations

• Most binary formats are fast and compact enough, so that they do not constitute a bottleneck in the system- Data is usually expensive to acquire (compute, fetch from db, etc)

- Otherwise (if data is in cache) and the only action of the system is to retrieve it from cache and send, then cache serialized data in binary form, thus making serialization performance irrelevant

aka “marshalled object”

Page 25: Java Serialization Facts and Fallacies

Popular myth #4

Java's standard serialization is known to be slow, and to use a huge

amount of bytes on disk

© Stack Overflow authors, emphasis is mine

Page 26: Java Serialization Facts and Fallacies

A popular way to benchmark serialization

https://github.com/eishay/jvm-serializers/blob/master/tpc/src/serializers/JavaBuiltIn.java

Page 27: Java Serialization Facts and Fallacies

The real fact about serialization performance

• ObjectOutputStream/ObjectInputStream are expensive to create- >1KB of internal buffers allocated

- It matters if you really do a lot of serialization

• You can reuse OOS/OIS instances if you need- It is easy when you write a single stream of values

• Just keep doing writeObject(!)• That is what it was designed for• The added benefit is that class descriptors are written just once

- It is slightly tricky if you want to optimize it for one time writes• The actual gain is modest and depends on the kinds of objects

you are serializing (their size, your heap/gc load, etc)

Page 28: Java Serialization Facts and Fallacies

The real fact about serialization performance (2)

• Each ObjectOutputStream writes class descriptor on the first encounter- reset() clears internal handles table that keep both written objects (for

object graph serialization) and written class descriptors• BEWARE: All written objects are retained until you call reset()

• Writing class descriptor is a signification portion of both CPU time consumption and resulting space- That is the price you pay for all the automagic class evolution features

without having to manually define field ids, create mappings, or write custom code

- The data itself is relatively fast to write and has straightforward binary format (4 bytes per int, etc)

Page 29: Java Serialization Facts and Fallacies

Know what you pay for

Page 30: Java Serialization Facts and Fallacies

Beware of benchmarks

• A typical benchmark compares different serialization frameworks doing tasks of absolutely different complexity

• With Java serialization facility you necessary pay for- No hassle object-to-serial format mapping

• (no writing of interface descriptions, no code generation)

- Support for wide range of class evolution scenarios

- Support for object graph writes

- Support for machines with different byte-endianness

• Most significant performance/size improvements are gained by forfeiting one of these qualities- Be sure to understand what exactly you forfeit

Page 31: Java Serialization Facts and Fallacies

Real cons of Java serialization

• Just one reset() to rule them all- No way to “resetObjects()” to clear all message-level object graph info

(referenced objects), but keep session-level information (written class descriptors) to avoid repetition when keeping a network session to recipient open

• Anyone knows ins-and-outs of JEP process to improve it?

• Can use some performance improvement - Some obviously unnecessary allocations of internal objects can be

removed, code simplified, streamlined• I’m looking for help in “contributing to JDK” area

- But do remember, that it rarely matters in practice

Page 32: Java Serialization Facts and Fallacies

Summary

• Java serialization is there out-of-the box

• It handles most Java-to-Java serialization needs well- While providing a wide range of goodies

• It provides non-compact, but binary representation- It works especially good, if you have collections of repeated business

objects

• It covers a wide range of class evolution scenarios- Set serialVersionUID = 0 to be happy

• It has decent performance that does not often show on profiler’s radar in practice

• We use it in Devexperts for all our Java-to-Java non-market-data

Page 33: Java Serialization Facts and Fallacies

We create professional financial software for

Brokerage and Exchanges since 2002

46000Users on-line in one day

305400Sent orders in one day

99.99%Our software indexof trouble-free operation

365 7 24We support our productsaround-the-clock350

Peoples in our team

Page 34: Java Serialization Facts and Fallacies

Headquarters

197110, 10/1 Barochnaya st.

Saint Petersburg, Russia

+7 812 438 16 26

[email protected]

www.devexperts.com

Page 35: Java Serialization Facts and Fallacies

Thank you! Questions?

Contact me by email: elizarov at devexperts.com Read my blog: http://elizarov.livejournal.com/