Plumbr case study

TECHNICAL OBSTACLES WHEN BUILDING PLUMBR

Nikita Salnikov-Tarnovski

Monday, April 1, 13

AGENDA

Who we were and who we are

Object lifecycle with little overhead

Graph analysis in low memory

The problem of quitting

OUR BACKGROUND

2 developers

Nikita Salnikov-Tarnovski, @iNikem

Vladimir Šor, @vovencij

10+ years at the custom software house Nortal

Mostly Java EE development

Web sites, backend systems, batch processes

NEW PROBLEM

Memory leaks

130,000 monthly Google searches for OutOfMemoryError

20,000 monthly unique visitors on our site

http://plumbr.eu

400 monthly downloads

1700+ leaks discovered

PLUMBR

Automated performance consultant

Giving you the exact location of the leak with enough information to fix it

Built on machine learning

Trained on 500,000 memory snapshots

from 3,000 different applications

Finds 88% of existing leaks

Quality keeps improving with the additional data gathered each day

PLUMBR AGENT...

JVMTI agents

Both Java and native; the native one is OS-specific

Welcome, malloc and free!

JNI code for communication between them

... WATCHES YOU

We monitor object creation and disposal

On-the-fly bytecode instrumentation

Hooks into GC events
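
Disposal tracking can be sketched in pure Java with phantom references and a `ReferenceQueue`: once an object becomes unreachable, the GC enqueues its phantom reference. This is a minimal illustration under assumptions, not how Plumbr does it; a native JVMTI agent would more likely tag objects and listen for `ObjectFree` events. `DisposalTracker` and the numeric ids are hypothetical.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: learn about object disposal from GC activity in pure Java.
final class DisposalTracker {
    /** Phantom reference that remembers which tracked object it stood for. */
    private static final class Tracked extends PhantomReference<Object> {
        final long id;

        Tracked(Object referent, long id, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.id = id;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The references themselves must stay strongly reachable, or they are never enqueued.
    private final Set<Tracked> pending = ConcurrentHashMap.newKeySet();

    void track(Object o, long id) {
        pending.add(new Tracked(o, id, queue));
    }

    /** Returns ids of tracked objects the GC has found unreachable since the last call. */
    List<Long> drainDisposed() {
        List<Long> ids = new ArrayList<>();
        Reference<?> r;
        while ((r = queue.poll()) != null) { // poll() never blocks
            Tracked t = (Tracked) r;
            pending.remove(t);
            ids.add(t.id);
        }
        return ids;
    }

    int trackedCount() {
        return pending.size();
    }
}
```

A background thread would call `drainDisposed()` periodically; nothing is reported while an object is still strongly reachable.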

OBJECT MONITORING I

The Java agent registers a java.lang.instrument.ClassFileTransformer

Modifies bytecode as classes are loaded

Using the ASM library

To capture all newly created objects
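
A minimal sketch of the registration step, assuming a standard `-javaagent` setup. `AllocationAgent` and `RecordingTransformer` are hypothetical names. The transformer only records which classes it sees and returns `null`, which tells the JVM to keep the class unchanged. A comment marks where ASM-based rewriting would go.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Plumbr's code. In a real agent jar this class would be public
// and named in the manifest ("Premain-Class: ..."), and the JVM would be started with
// -javaagent:agent.jar so that premain runs before the application's main method.
class AllocationAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new RecordingTransformer());
    }
}

class RecordingTransformer implements ClassFileTransformer {
    final Set<String> seen = ConcurrentHashMap.newKeySet();

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (className == null) return null; // defensively skip unnamed JVM-generated classes
        if (className.startsWith("java/") || className.startsWith("sun/")
                || className.startsWith("jdk/")) {
            return null; // leave fragile JDK internals alone
        }
        seen.add(className); // names arrive in internal form, e.g. "com/example/Foo"
        // A real transformer would parse classfileBuffer here (e.g. with ASM's ClassReader
        // and ClassWriter), insert a call to a recording hook after each
        // NEW ... INVOKESPECIAL <init> sequence, and return the rewritten bytes.
        return null; // null means "no change"
    }
}
```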

PROBLEMS

Different compilers produce slightly different bytecode

Some classes are too fragile or already broken

new and the chain of <init> calls

Clone, deserialization, reflection
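
The `<init>` chain and the `clone()` gap can both be shown without any bytecode tooling. In this hypothetical sketch, `AllocationHook.onInit` stands in for a call that instrumentation would inject at the end of every constructor. Superclass constructors fire it too, so it only counts an object when the runtime class equals the declaring class. `Object.clone()` creates a copy without running any constructor, so the hook never sees it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: onInit plays the role of an injected end-of-<init> hook.
final class AllocationHook {
    static final AtomicInteger hookCalls = new AtomicInteger();   // every constructor that finished
    static final AtomicInteger allocations = new AtomicInteger(); // objects actually counted

    static void onInit(Object self, Class<?> declaringClass) {
        hookCalls.incrementAndGet();
        // Superclass constructors run first but already see the subclass via getClass(),
        // so only the most-derived constructor counts the object.
        if (self.getClass() == declaringClass) allocations.incrementAndGet();
    }
}

class Parent implements Cloneable {
    Parent() {
        AllocationHook.onInit(this, Parent.class);
    }

    @Override
    public Parent clone() {
        try {
            return (Parent) super.clone(); // Object.clone() copies fields, runs no constructor
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: Parent implements Cloneable
        }
    }
}

class Child extends Parent {
    Child() {
        super();
        AllocationHook.onInit(this, Child.class);
    }
}
```

One gap of this particular check: a constructor that delegates with `this(...)` fires the hook twice in the same class, so it would count one object twice.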

OBJECT MONITORING II

We keep some data about each live object

Creating and associating that data takes time

On every object creation!

OBJECT MONITORING II

If you cannot do it in-process, do it off-process

PROBLEMS

BlockingQueues are slow

Locks are slow

Atomic* classes are slow!

No existing library fit our needs

Even the Disruptor doesn’t suit

We’ve written our own no-guarantee, lock-free, many-producers-one-consumer buffer

Concurrent programming IS hard
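
One way to get many producers and a single consumer without locks or CAS operations is to give each producer thread its own single-producer/single-consumer ring and let the consumer poll them all. This is a hypothetical sketch (`LossyMpscBuffer` is not Plumbr's buffer). It is "no-guarantee" in one specific sense: a full ring drops the event instead of blocking the allocating thread.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch: many producers, one consumer, one SPSC ring per producer thread.
final class LossyMpscBuffer<E> {
    private static final class SpscRing<T> {
        private final Object[] slots;
        private final int mask;
        private volatile long head; // next index to read; written only by the consumer
        private volatile long tail; // next index to write; written only by the owning producer

        SpscRing(int capacity) {
            slots = new Object[capacity];
            mask = capacity - 1;
        }

        boolean offer(T item) {
            long t = tail;
            if (t - head >= slots.length) return false; // full: drop instead of waiting
            slots[(int) (t & mask)] = item;
            tail = t + 1; // volatile write publishes the slot to the consumer
            return true;
        }

        @SuppressWarnings("unchecked")
        T poll() {
            long h = head;
            if (h >= tail) return null; // volatile read of tail makes published slots visible
            int i = (int) (h & mask);
            T item = (T) slots[i];
            slots[i] = null; // release the element for GC
            head = h + 1; // volatile write hands the slot back to the producer
            return item;
        }
    }

    private final List<SpscRing<E>> rings = new CopyOnWriteArrayList<>();
    private final ThreadLocal<SpscRing<E>> mine;

    LossyMpscBuffer(int perThreadCapacity) {
        if (perThreadCapacity <= 0 || Integer.bitCount(perThreadCapacity) != 1)
            throw new IllegalArgumentException("capacity must be a power of two");
        mine = ThreadLocal.withInitial(() -> {
            SpscRing<E> ring = new SpscRing<>(perThreadCapacity);
            rings.add(ring); // each producer registers its ring once
            return ring;
        });
    }

    /** Any producer thread; returns false if the item was dropped. */
    boolean offer(E item) {
        Objects.requireNonNull(item, "null marks an empty slot");
        return mine.get().offer(item);
    }

    /** Single consumer thread only; returns how many items were handed to sink. */
    int drain(Consumer<? super E> sink) {
        int count = 0;
        for (SpscRing<E> ring : rings) {
            E item;
            while ((item = ring.poll()) != null) {
                sink.accept(item);
                count++;
            }
        }
        return count;
    }
}
```

Each index has exactly one writer, so plain volatile reads and writes are enough; the price is that the consumer sees events from different producers in no particular global order.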

MORE PROBLEMS

We have to store all that object-related data somewhere

Java Collections are too fat

No lock-free thread-safe reading

We use Trove to save memory

Hand-written clone with dirty check

Testing persistent immutable data structures
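
As an illustration of these points, here is a hypothetical `LongLongMap`. It stores primitive `long` keys and values in open-addressed arrays, the kind of boxing-free layout Trove provides. A `publish()` method clones the arrays only when something changed and hands readers an immutable copy through a `volatile` field. It assumes a single writer thread and reserves key `0` as the empty-slot marker.

```java
// Hypothetical sketch: boxing-free map with copy-on-publish snapshots for lock-free readers.
final class LongLongMap {
    private static final long EMPTY = 0L;   // key 0 marks a free slot, so it cannot be stored
    private long[] keys;
    private long[] values;
    private int size;
    private boolean dirty = true;           // changed since the last publish()?
    private volatile LongLongMap published; // immutable copy that readers use

    LongLongMap(int expectedSize) {
        int capacity = Integer.highestOneBit(Math.max(4, expectedSize * 2) - 1) << 1;
        keys = new long[capacity];
        values = new long[capacity];
    }

    private LongLongMap(long[] keys, long[] values, int size) {
        this.keys = keys;
        this.values = values;
        this.size = size;
    }

    private static int slot(long key, int mask) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative hashing spreads nearby ids
        return (int) (h ^ (h >>> 32)) & mask;
    }

    /** Writer thread only. */
    void put(long key, long value) {
        if (key == EMPTY) throw new IllegalArgumentException("key 0 is reserved");
        if ((size + 1) * 2 > keys.length) grow(); // keep the load factor at or below 1/2
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (keys[i] != EMPTY && keys[i] != key) i = (i + 1) & mask; // linear probing
        if (keys[i] == EMPTY) { keys[i] = key; size++; }
        values[i] = value;
        dirty = true;
    }

    long get(long key, long missing) {
        int mask = keys.length - 1;
        for (int i = slot(key, mask); keys[i] != EMPTY; i = (i + 1) & mask) {
            if (keys[i] == key) return values[i];
        }
        return missing;
    }

    int size() { return size; }

    /** Writer thread only: the "clone with dirty check"; copies only if something changed. */
    LongLongMap publish() {
        if (dirty) {
            published = new LongLongMap(keys.clone(), values.clone(), size);
            dirty = false;
        }
        return published;
    }

    /** Any thread: latest published copy (null before the first publish()); never mutated. */
    LongLongMap readView() { return published; }

    private void grow() {
        long[] oldKeys = keys, oldValues = values;
        keys = new long[oldKeys.length * 2];
        values = new long[oldValues.length * 2];
        int mask = keys.length - 1;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] == EMPTY) continue;
            int i = slot(oldKeys[j], mask);
            while (keys[i] != EMPTY) i = (i + 1) & mask;
            keys[i] = oldKeys[j];
            values[i] = oldValues[j];
        }
    }
}
```

Readers never lock. They may see slightly stale data until the writer's next `publish()`, and an unchanged map costs no copy at all.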

LEAK HUNTING

When leaks are detected, we need to find out who is holding on to them

Paths to GC roots

While the application is still running

PROBLEMS

Java objects don’t keep track of incoming references

You can walk the heap in C code

But that stops the world

A standard heap dump loses information

So we make a custom heap dump

And traverse the reference graph on it
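
Since references only point outward, the incoming side has to be built from the dump. This sketch assumes, hypothetically, that the custom dump yields object ids `0..n-1` and two parallel edge arrays in which `from[e]` references `to[e]`. A counting sort then produces a compact reverse index in compressed sparse row (CSR) form, costing one `int` per edge plus one per object instead of a map of lists.

```java
import java.util.Arrays;

// Hypothetical sketch: reverse-reference index over a dumped object graph.
final class IncomingRefs {
    final int[] offsets; // referrers of v are sources[offsets[v]] .. sources[offsets[v + 1] - 1]
    final int[] sources;

    IncomingRefs(int objectCount, int[] from, int[] to) {
        if (from.length != to.length) throw new IllegalArgumentException("edge arrays differ in length");
        offsets = new int[objectCount + 1];
        for (int target : to) offsets[target + 1]++;                         // count referrers per object
        for (int v = 0; v < objectCount; v++) offsets[v + 1] += offsets[v]; // prefix sums give start offsets
        sources = new int[from.length];
        int[] next = Arrays.copyOf(offsets, objectCount);                   // next free position per object
        for (int e = 0; e < from.length; e++) sources[next[to[e]]++] = from[e];
    }

    int[] referrersOf(int v) {
        return Arrays.copyOfRange(sources, offsets[v], offsets[v + 1]);
    }
}
```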

STILL PROBLEMS

We’ve tried many graph traversal libraries

And NoSQL solutions

All of them more or less work

If you give them gigabytes of memory

But we have to do this on-site, while the application is still running

We needed a memory-efficient solution

ONE MORE REINVENTED WHEEL

We’ve written our own specialized version of Dijkstra’s shortest-path search

Again, we had to replace many Java Collections with more memory-efficient implementations
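
With every reference counted as one step, Dijkstra's algorithm reduces to breadth-first search, so this hypothetical sketch uses BFS over plain `int` arrays. It assumes incoming references in CSR layout (the referrers of `v` are `sources[offsets[v]]` up to `sources[offsets[v + 1] - 1]`) and a boolean array marking GC roots. Starting from the leaking object, it walks referrers until it reaches a root, and returns the shortest path root-first.

```java
import java.util.Arrays;

// Hypothetical sketch: shortest chain of references from any GC root to a given object.
final class PathToRoot {
    static int[] find(int start, int[] offsets, int[] sources, boolean[] isRoot) {
        int n = offsets.length - 1;
        int[] parent = new int[n]; // parent[r] = object that r references on the way to start; -1 = unvisited
        Arrays.fill(parent, -1);
        int[] queue = new int[n];  // every object is enqueued at most once
        int headIdx = 0, tailIdx = 0;
        queue[tailIdx++] = start;
        parent[start] = start;
        while (headIdx < tailIdx) {
            int v = queue[headIdx++];
            if (isRoot[v]) return unwind(v, start, parent);
            for (int i = offsets[v]; i < offsets[v + 1]; i++) {
                int referrer = sources[i];
                if (parent[referrer] == -1) {
                    parent[referrer] = v;
                    queue[tailIdx++] = referrer;
                }
            }
        }
        return new int[0]; // no root reaches start, i.e. it is garbage
    }

    // Follows parent links from the root back down to start: root, ..., start.
    private static int[] unwind(int root, int start, int[] parent) {
        int len = 1;
        for (int v = root; v != start; v = parent[v]) len++;
        int[] path = new int[len];
        int i = 0;
        for (int v = root; ; v = parent[v]) {
            path[i++] = v;
            if (v == start) break;
        }
        return path;
    }
}
```

Apart from the inputs, memory use is two `int` arrays sized to the object count.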

TIME TO DIE

Plumbr runs inside the JVM alongside the application

It isn’t the main actor, just a supporter

So Plumbr must be ready to quit whenever the main application wants to

WHEN JVM QUITS

It turns out the JVM is quite tenacious

There is no shutdown notification or anything like it

It just quits when no non-daemon threads are left

And some threads live for far too long

PROBLEMS

Plumbr’s own threads

Threads from libraries that Plumbr uses

ExecutorService with a daemon thread factory
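
For threads the agent creates itself, the usual remedy is to mark them as daemon threads so they never hold the JVM open. Here is a minimal sketch; `DaemonThreadFactory`, `newAgentExecutor` and the name prefix are illustrative. It only helps with libraries that let you pass in a `ThreadFactory`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: every thread made here is a daemon, so it never keeps the JVM alive
// after the application's own non-daemon threads have finished.
final class DaemonThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    DaemonThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-" + counter.incrementAndGet());
        t.setDaemon(true); // must be set before start()
        return t;
    }

    static ExecutorService newAgentExecutor(String prefix, int threads) {
        return Executors.newFixedThreadPool(threads, new DaemonThreadFactory(prefix));
    }
}
```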

PROBLEMS

RMI Reaper Thread

Keeps the JVM alive as long as some JMX resources are in use

We must clean up after ourselves: MBeans, JMX connections, JMX servers

But when???

We implemented our own monitor thread with some heuristics
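
The "but when???" question can be approached with a polling heuristic. The rule below is hypothetical, not Plumbr's. The application counts as finished once every live non-daemon thread is one we recognize as infrastructure, such as `RMI Reaper`, or `DestroyJavaVM`, which HotSpot uses to wait for the last non-daemon thread after `main` returns. At that point the monitor runs a caller-supplied cleanup, which would unregister MBeans and stop JMX connector servers. `QuitMonitor` and its methods are illustrative names.

```java
import java.util.Set;

// Hypothetical heuristic: the application is done when only known infrastructure
// threads (matched by name) remain among the live non-daemon threads.
final class QuitMonitor {
    static boolean onlyInfrastructureLeft(Iterable<Thread> threads, Thread self, Set<String> ignoredNames) {
        for (Thread t : threads) {
            if (t == self || !t.isAlive() || t.isDaemon()) continue;
            if (!ignoredNames.contains(t.getName())) return false; // application work still running
        }
        return true;
    }

    /** Starts a daemon thread that polls periodically and runs cleanup exactly once. */
    static Thread start(Runnable cleanup, Set<String> ignoredNames, long periodMillis) {
        Thread monitor = new Thread(() -> {
            try {
                while (!onlyInfrastructureLeft(Thread.getAllStackTraces().keySet(),
                        Thread.currentThread(), ignoredNames)) {
                    Thread.sleep(periodMillis);
                }
                cleanup.run(); // e.g. unregister MBeans, stop connector servers, close connections
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // asked to stop early
            }
        }, "quit-monitor");
        monitor.setDaemon(true); // the monitor itself must not keep the JVM alive
        monitor.start();
        return monitor;
    }
}
```

Started from an agent's `premain`, a call such as `QuitMonitor.start(cleanup, Set.of("DestroyJavaVM", "RMI Reaper"), 500)` would check twice a second. Matching threads by name is fragile, which is why this can only ever be a heuristic.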

PROBLEMS

Earlier versions used some Swing components, e.g. a system tray icon

And the JVM will not quit while any displayable Swing components remain

We have to dispose of them before quitting

Again, when???

CONCLUSION

Don’t spend all your time writing web components, web services, or Swing UIs

There is more to Java than that

There are many Java libraries, but still not enough
