java collections the force awakens

Java Collections The Force Awakens Darth @RaoulUK Darth @RichardWarburto #javaforceawakens

Upload: richardwarburton

Post on 23-Jan-2017


TRANSCRIPT

Page 1: Java collections  the force awakens

Java Collections: The Force Awakens

Darth @RaoulUK, Darth @RichardWarburto
#javaforceawakens

Page 2: Java collections  the force awakens

Evolution can be interesting ...

Java 1.2
Java 10?

Page 3: Java collections  the force awakens

Collection API Improvements

Persistent & Immutable Collections

Performance Improvements

Page 4: Java collections  the force awakens

Collection bugs

1. Element access (off-by-one errors, ArrayIndexOutOfBoundsException)
2. Concurrent modification
3. Check-then-act

Page 5: Java collections  the force awakens

Scenario 1

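List<String> jedis = new ArrayList<>(asList("Luke", "yoda"));

// Throws ConcurrentModificationException: the list is structurally
// modified while the for-each loop is still iterating over it.
for (String jedi : jedis) {
    if (Character.isLowerCase(jedi.charAt(0))) {
        jedis.remove(jedi);
    }
}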
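One way to avoid the concurrent modification in Scenario 1 is to let the collection or its iterator perform the removal. The sketch below is illustrative only: the class and method names are made up for this example, and it assumes every name is non-empty.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class SafeRemoval {
    // Java 8+: removeIf performs the iteration and the removal internally.
    static List<String> withRemoveIf(List<String> names) {
        List<String> result = new ArrayList<>(names);
        result.removeIf(name -> Character.isLowerCase(name.charAt(0)));
        return result;
    }

    // Before Java 8: remove through the iterator that is doing the traversal.
    static List<String> withIterator(List<String> names) {
        List<String> result = new ArrayList<>(names);
        for (Iterator<String> it = result.iterator(); it.hasNext(); ) {
            if (Character.isLowerCase(it.next().charAt(0))) {
                it.remove();
            }
        }
        return result;
    }
}
```

Both methods copy the input first, so the caller's list is left untouched.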

Page 6: Java collections  the force awakens

Scenario 2

Map<String, BigDecimal> movieViews = new HashMap<>();

BigDecimal views = movieViews.get(MOVIE);                 // check
if (views != null) {
    movieViews.put(MOVIE, views.add(BigDecimal.ONE));     // then act
}

Check: movieViews.get, views != null
Then Act: movieViews.put
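The check and the act in Scenario 2 can be collapsed into a single map operation. The following is a minimal sketch, assuming a ConcurrentHashMap is acceptable as the backing map; the MovieViews class and its method names are hypothetical and not taken from the slides.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class MovieViews {
    private final Map<String, BigDecimal> views = new ConcurrentHashMap<>();

    // Mirrors the slide's null check: only increments movies already tracked.
    void incrementIfPresent(String movie) {
        views.computeIfPresent(movie, (key, current) -> current.add(BigDecimal.ONE));
    }

    // Starts the count at one if the movie is absent, otherwise adds one.
    void recordView(String movie) {
        views.merge(movie, BigDecimal.ONE, BigDecimal::add);
    }

    BigDecimal viewsOf(String movie) {
        return views.get(movie);
    }
}
```

On a ConcurrentHashMap, computeIfPresent and merge are performed atomically. The same calls compile against a plain HashMap, but there they only tidy up the code and give no thread-safety guarantee.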

Page 7: Java collections  the force awakens

Reducing scope for bugs

● ~280 bugs in 28 projects including Cassandra, Lucene

● ~80% of the check-then-act bugs discovered are put-if-absent

● Library designers can help by updating APIs as new idioms emerge

● Different data structures can provide alternatives by restricting reads & updates to reduce scope for bugs

CHECK-THEN-ACT Misuse of Java Concurrent Collections: http://dig.cs.illinois.edu/papers/checkThenAct.pdf
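To make the put-if-absent idiom concrete, here is a small sketch that contrasts a racy check-then-act with the atomic alternative on ConcurrentMap. The class and method names are hypothetical and chosen only for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

final class Counters {
    private final ConcurrentMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    // Racy: two threads can both observe "absent", and the second put
    // then replaces the counter the first thread is already using.
    AtomicInteger racyCounterFor(String name) {
        if (!counters.containsKey(name)) {
            counters.put(name, new AtomicInteger());
        }
        return counters.get(name);
    }

    // Atomic: the check and the insert happen as one map operation.
    AtomicInteger counterFor(String name) {
        return counters.computeIfAbsent(name, key -> new AtomicInteger());
    }
}
```

Each map call in the racy version is thread-safe on its own; the bug lies in the gap between the calls.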

Page 8: Java collections  the force awakens

Java 9 API updates

Collection factory methods
● Non-goal to provide persistent immutable collections
● http://openjdk.java.net/jeps/269

Live demo using jShell: http://iteratrlearning.com/java9/2016/11/09/java9-collection-factory-methods
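For readers without jShell at hand, the factory methods from JEP 269 look roughly like this. The sketch requires Java 9 or later; the class name and the sample data are invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FactoryMethods {
    static final List<String> JEDIS = List.of("Luke", "Yoda");
    static final Set<String> SITHS = Set.of("Vader", "Sidious");
    static final Map<String, Integer> EPISODES =
            Map.of("A New Hope", 4, "The Force Awakens", 7);

    // The returned collections are unmodifiable: mutators throw.
    static boolean rejectsModification() {
        try {
            JEDIS.add("Rey");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }
}
```

These collections also reject null elements, and Set.of and Map.of reject duplicate elements and keys.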

Page 9: Java collections  the force awakens

Collection API Improvements

Persistent & Immutable Collections

Performance Improvements

Page 10: Java collections  the force awakens

Categorising Collections

[Diagram: collections categorised as Mutable (Unsynchronized or Concurrent), Unmodifiable View, or Immutable (Non-Persistent or Persistent); a legend marks the categories available in the core library]

Page 11: Java collections  the force awakens

Mutable

● Popular friends include ArrayList, HashMap, TreeSet

● Memory-efficient modification operations

● State can be accidentally modified

● Can be thread-safe, but requires careful design

Page 12: Java collections  the force awakens

Unmodifiable

List<String> jedis = new ArrayList<>();
jedis.add("Luke Skywalker");

List<String> cantChangeMe = Collections.unmodifiableList(jedis);

// java.lang.UnsupportedOperationException
// cantChangeMe.add("Darth Vader");

System.out.println(cantChangeMe); // [Luke Skywalker]

jedis.add("Darth Vader");
System.out.println(cantChangeMe); // [Luke Skywalker, Darth Vader]
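Because the unmodifiable wrapper is only a view, changes to the backing list leak through. A common workaround is to wrap a private copy instead. The helper below is a hypothetical sketch of that idea, not something shown in the slides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Snapshots {
    // Copies the source first, so later changes to it are not visible
    // through the returned unmodifiable list.
    static <T> List<T> snapshotOf(List<T> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }
}
```

The price is a full copy on every snapshot, which is the cost that persistent collections try to avoid.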

Page 13: Java collections  the force awakens
Page 14: Java collections  the force awakens

Immutable & Non-persistent

● No updates

● Flexibility to convert the source into a more efficient representation

● No locking needed in the context of concurrency

● Satisfies covariant subtyping requirements

● Can be copied with modifications to create a new version (can be expensive)

Page 15: Java collections  the force awakens

Immutable vs. Mutable hierarchy

ImmutableList MutableList

+ ImmutableList<T> toImmutable()

java.util.List

+ MutableList<T> toList()

Eclipse Collections (formerly GS Collections): https://projects.eclipse.org/projects/technology.collections/

ListIterable

Page 16: Java collections  the force awakens

Immutable and Persistent

● Changing the source produces a new version of the collection

● The resulting collection shares structure with the source to avoid full copying on updates

Page 17: Java collections  the force awakens

LISP anyone?

Page 18: Java collections  the force awakens

Persistent List (aka Cons)

public final class Cons<T> implements ConsList<T> {

    private final T head;
    private final ConsList<T> tail;

    public Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // Prepends e; the existing list becomes the tail of the new one.
    @Override
    public ConsList<T> add(T e) {
        return new Cons<>(e, this);
    }
}
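The slide shows only the Cons class, so the ConsList interface and the empty list are left to the reader. The following self-contained sketch fills those in under stated assumptions: the Nil class, the head(), tail() and isEmpty() accessors, and the empty() factory are hypothetical additions, and add is moved into the interface as a default method to keep the example short.

```java
import java.util.NoSuchElementException;

interface ConsList<T> {
    T head();
    ConsList<T> tail();
    boolean isEmpty();

    // Prepends e; the receiver is reused as the tail, never copied.
    default ConsList<T> add(T e) {
        return new Cons<>(e, this);
    }

    static <T> ConsList<T> empty() {
        return new Nil<>();
    }
}

final class Nil<T> implements ConsList<T> {
    public T head() { throw new NoSuchElementException("empty list"); }
    public ConsList<T> tail() { throw new NoSuchElementException("empty list"); }
    public boolean isEmpty() { return true; }
}

final class Cons<T> implements ConsList<T> {
    private final T head;
    private final ConsList<T> tail;

    Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    public T head() { return head; }
    public ConsList<T> tail() { return tail; }
    public boolean isEmpty() { return false; }
}
```

Two lists built by adding different heads to the same base share that base, which is the structural sharing discussed on the following slides.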

Page 19: Java collections  the force awakens

Updating Persistent List

A B C X Y Z

Before

Page 20: Java collections  the force awakens

Updating Persistent List

A B C X Y Z

Before

A B D

After

Blue nodes indicate new copies
Purple nodes indicate nodes we wish to update

Page 21: Java collections  the force awakens

Concatenating Two Persistent Lists

A B C

X Y Z

Before

Page 22: Java collections  the force awakens

Concatenating Two Persistent Lists

- Poor locality due to pointer chasing
- Copying of nodes

A B C

X Y Z

Before

A B C

After
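The copying cost of concatenation can be seen in a few lines. This is a sketch under assumptions: PNode is a hypothetical minimal list node where null marks the end of the list, and the recursion is kept for clarity even though it would overflow the stack on very long lists.

```java
final class PNode<T> {
    final T value;
    final PNode<T> next; // null marks the end of the list

    PNode(T value, PNode<T> next) {
        this.value = value;
        this.next = next;
    }

    // Returns first followed by second. Every node of first is copied,
    // because its last node must point somewhere new; second is shared as-is.
    static <T> PNode<T> concat(PNode<T> first, PNode<T> second) {
        if (first == null) {
            return second;
        }
        return new PNode<>(first.value, concat(first.next, second));
    }
}
```

Concatenating A B C with X Y Z therefore allocates three new nodes and reuses the three existing X Y Z nodes.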

Page 23: Java collections  the force awakens

Persistent List

● Structural sharing: no need to copy full structure

● Poor locality due to pointer chasing

● Copying becomes more expensive with larger lists

● Poor random access and thus poor data decomposition

Page 24: Java collections  the force awakens

Updating Persistent Binary Tree

Before

Page 25: Java collections  the force awakens

Updating Persistent Binary Tree

After
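The before and after pictures correspond to path copying: only the nodes on the path from the root to the change are recreated, and every other subtree is shared. A minimal sketch, assuming an unbalanced binary search tree of int keys (PTree is a hypothetical name):

```java
final class PTree {
    final int key;
    final PTree left;
    final PTree right;

    PTree(int key, PTree left, PTree right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }

    // Returns a new root containing key. Nodes on the search path are
    // copied; subtrees off the path are reused unchanged.
    static PTree insert(PTree node, int key) {
        if (node == null) {
            return new PTree(key, null, null);
        }
        if (key < node.key) {
            return new PTree(node.key, insert(node.left, key), node.right);
        }
        if (key > node.key) {
            return new PTree(node.key, node.left, insert(node.right, key));
        }
        return node; // key already present, nothing to copy
    }

    static boolean contains(PTree node, int key) {
        while (node != null) {
            if (key == node.key) {
                return true;
            }
            node = key < node.key ? node.left : node.right;
        }
        return false;
    }
}
```

The old root stays valid after an insert, so readers holding it keep seeing the previous version.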

Page 26: Java collections  the force awakens

Persistent Array

How do we get the benefits of immutability with the performance of mutable variants?

Page 27: Java collections  the force awakens

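Trie

1. Branch factor not limited to binary

2. Leaf nodes contain actual values

3. Picking the right branch is done by using parts of the key as a lookup

[Diagram: a trie descending from the root, with branches labelled by key characters (a, b, c, e, f) and values stored at the leaves]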
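A small mutable trie keyed by strings illustrates the three points. This is a sketch only: CharTrie is a hypothetical class, each character of the key selects the branch, and unlike the slide it also allows a value at an interior node when one key is a prefix of another.

```java
import java.util.HashMap;
import java.util.Map;

final class CharTrie<V> {
    private static final class Node<V> {
        // One branch per distinct next character, so not limited to two.
        final Map<Character, Node<V>> children = new HashMap<>();
        V value; // null when no key ends at this node
    }

    private final Node<V> root = new Node<>();

    void put(String key, V value) {
        Node<V> node = root;
        for (char c : key.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node<>());
        }
        node.value = value;
    }

    V get(String key) {
        Node<V> node = root;
        for (char c : key.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null; // no branch for this part of the key
            }
        }
        return node.value;
    }
}
```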

Page 28: Java collections  the force awakens

Persistent Array (Bitmapped Vector Trie)

[Diagram: a trie whose nodes each hold slots indexed 0 to 31, running from Level 1 (root) through Level 2 down to the leaf nodes]
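With 32 slots per node, each level of the trie consumes five bits of the index. The sketch below shows that lookup and a path-copying update on plain Object arrays. It is an illustration under assumptions, not the implementation of any particular library: VectorTrie and its methods are hypothetical, the builder is fixed at two levels, and there is no tail optimisation or growth logic.

```java
final class VectorTrie {
    static final int BITS = 5;
    static final int WIDTH = 1 << BITS; // 32 slots per node
    static final int MASK = WIDTH - 1;  // 31

    // Builds a two-level trie holding the values 0..size-1 (size <= 1024).
    static Object[] build(int size) {
        Object[] root = new Object[WIDTH];
        for (int i = 0; i < size; i++) {
            int slot = (i >>> BITS) & MASK;
            if (root[slot] == null) {
                root[slot] = new Object[WIDTH];
            }
            ((Object[]) root[slot])[i & MASK] = i;
        }
        return root;
    }

    // rootShift is BITS * (levels - 1); each level uses five bits of index.
    static Object get(Object[] root, int rootShift, int index) {
        Object[] node = root;
        for (int shift = rootShift; shift > 0; shift -= BITS) {
            node = (Object[]) node[(index >>> shift) & MASK];
        }
        return node[index & MASK];
    }

    // Copies only the nodes on the path to index; siblings are shared.
    static Object[] set(Object[] node, int shift, int index, Object value) {
        Object[] copy = node.clone();
        int slot = (index >>> shift) & MASK;
        if (shift == 0) {
            copy[slot] = value;
        } else {
            copy[slot] = set((Object[]) node[slot], shift - BITS, index, value);
        }
        return copy;
    }
}
```

An update on a two-level trie copies two arrays of 32 references regardless of how many elements the vector holds.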

Page 29: Java collections  the force awakens

Trade-offs

● Large branching factor facilitates iteration but hinders updates

● Small branching factor facilitates updates but hinders traversal

Page 30: Java collections  the force awakens

Java Persistent Collections

- Not available as part of the Java core library

- Existing projects include:
- PCollections: https://github.com/hrldcpr/pcollections
- Port of Clojure DS: https://github.com/krukow/clj-ds
- Port of Scala DS: https://github.com/andrewoma/dexx
- Now also in Javaslang: http://javaslang.io

Page 31: Java collections  the force awakens

Memory usage survey

10,000,000 elements, heap < 32GB

int[]: 40MB
Integer[]: 160MB
ArrayList<Integer>: 215MB
PersistentVector<Integer>: 214MB (Clojure-DS)
Vector<Integer>: 206MB (Dexx, port of Scala-DS)

Data collected using Java Object Layout: http://openjdk.java.net/projects/code-tools/jol/

Page 32: Java collections  the force awakens

Takeaways

● Immutable collections reduce the scope for bugs

● Always a compromise between programming safety and performance

● Performance of persistent data structures is improving

Page 33: Java collections  the force awakens

Collection API Improvements

Persistent & Immutable Collections

Performance Improvements

Page 34: Java collections  the force awakens
Page 35: Java collections  the force awakens

O(N)

O(1)

O(HYPERSPACE)

Page 36: Java collections  the force awakens

Primitive specialised collections

● Collections often hold boxed representations of primitive values

● Java 8 introduced IntStream, LongStream, DoubleStream and primitive specialised functional interfaces

● Other libraries, e.g. Agrona, Koloboke and Eclipse Collections, provide primitive specialised collections today

● Valhalla investigates primitive specialised generics
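The difference between boxed and primitive specialised processing is visible with the JDK alone. A short sketch with hypothetical class and method names:

```java
import java.util.List;
import java.util.stream.IntStream;

final class PrimitiveSums {
    // Every element is an Integer object that is unboxed to be added,
    // and each intermediate sum is boxed again.
    static int boxedSum(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    // IntStream works on int values directly, with no wrapper objects.
    static int primitiveSum(int[] values) {
        return IntStream.of(values).sum();
    }
}
```

Both methods compute the same result; the boxed version pays for a wrapper object per element and the pointer indirection that comes with it.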

Page 37: Java collections  the force awakens

Java 8 Lazy Collection Initialization

Many allocated HashMaps and ArrayLists are never written to, e.g. with the Null Object pattern

Java 8 adds Lazy Initialization for the default initialization case

Typically 1-2% reduction in memory consumption

http://www.javamagazine.mozaicreader.com/MarApr2016/Twitter#&pageSet=28&page=0

Page 38: Java collections  the force awakens
Page 39: Java collections  the force awakens

HashMaps Basics

...

Han Solo: hash = 72309

Chewbacca: hash = 72309
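The hash values on the slide are illustrative, but real collisions are easy to produce: the strings "Aa" and "BB" have the same String.hashCode(). The sketch below maps a key to a bucket with a simple modulo, which is an assumption made for clarity rather than what java.util.HashMap does internally; Buckets is a hypothetical name.

```java
final class Buckets {
    // Simplified bucket selection: hash code modulo the table size.
    // floorMod keeps the result non-negative for negative hash codes.
    static int bucketIndex(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }
}
```

Equal hash codes always land in the same bucket, whatever the table size, so the map needs a strategy for storing both entries.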

Page 40: Java collections  the force awakens

HashMaps

Chaining: use a separate data structure for collision lookups

Probing: store entries inline and follow a probing sequence
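To make the probing idea concrete, here is a deliberately small linear probing map. It is a sketch under stated assumptions: ProbingMap is hypothetical, the capacity is fixed, there is no resizing or removal, and null keys are not supported.

```java
final class ProbingMap<K, V> {
    private final Object[] keys;
    private final Object[] values;

    ProbingMap(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    // Walks the probe sequence i, i+1, i+2, ... (wrapping around) until it
    // finds the key or an empty slot. Returns -1 if the table is full and
    // the key is not in it.
    private int slotFor(Object key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        int probes = 0;
        while (keys[i] != null && !keys[i].equals(key)) {
            if (++probes == keys.length) {
                return -1;
            }
            i = (i + 1) % keys.length;
        }
        return i;
    }

    void put(K key, V value) {
        int i = slotFor(key);
        if (i < 0) {
            throw new IllegalStateException("table is full");
        }
        keys[i] = key;
        values[i] = value;
    }

    @SuppressWarnings("unchecked")
    V get(Object key) {
        int i = slotFor(key);
        return (i < 0 || keys[i] == null) ? null : (V) values[i];
    }
}
```

Colliding entries sit in neighbouring array slots instead of in separate node objects, which is where the memory and locality advantages of probing come from.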

Page 41: Java collections  the force awakens

Aliases: Palpatine vs Darth Sidious

Page 42: Java collections  the force awakens

HashMaps

Chaining: aka Closed Addressing, aka Open Hashing

Probing: aka Open Addressing, aka Closed Hashing

Page 43: Java collections  the force awakens

HashMaps

Chaining: Linked List Based or Tree Based

Probing

Page 44: Java collections  the force awakens

java.util.HashMap

Chaining Based HashMap

Historically maintained a linked list in the case of a collision

Problem: with high collision rates, HashMap lookup approaches O(N)

Page 45: Java collections  the force awakens

java.util.HashMap in Java 8

Starts by using a list to store colliding entries.

Switches to a tree when a bucket holds over 8 elements

Tree based nodes use about twice the memory

Makes the heavy-collision lookup case O(log(N)) rather than O(N)

Relies on keys being Comparable

https://github.com/RichardWarburton/map-visualiser
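A key type that collides on purpose shows the situation this optimisation targets. In the sketch below, CollidingKey is a hypothetical class whose hashCode is constant, so every entry lands in the same bucket; implementing Comparable is what gives HashMap an ordering to use once that bucket is treeified.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKey implements Comparable<CollidingKey> {
    final int id;

    CollidingKey(int id) {
        this.id = id;
    }

    // Deliberately terrible: every key collides.
    @Override
    public int hashCode() {
        return 42;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    // Gives a treeified bucket a total order to search by.
    @Override
    public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);
    }

    static Map<CollidingKey, Integer> mapOf(int count) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new CollidingKey(i), i);
        }
        return map;
    }
}
```

The map behaves correctly either way; timing lookups on such a map with and without the Comparable implementation is one way to observe the difference between the tree and list cases.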

Page 46: Java collections  the force awakens

So which HashMap is best?

Page 47: Java collections  the force awakens

Example Jar-Jar Benchmark

call get() on a single value for a map of size 1

No model of the different factors that affect things!

Page 48: Java collections  the force awakens

Tree Optimization - 60% Collisions

Page 49: Java collections  the force awakens

Tree Optimization - 10% Collisions

Page 50: Java collections  the force awakens

Probing vs Chaining

Probing Maps usually have lower memory consumption

Small Maps: Probing never has long clusters, can be up to 91% faster.

In large maps with high collision rates, probing scales poorly and can be significantly slower.

Page 51: Java collections  the force awakens

Takeaways

There’s no clear-cut “winner”.

JDK implementations try to minimise the worst case.

Linear probing requires a good hashCode() distribution. Hash maps often “precondition” their hashes.

IdentityHashMap has low memory consumption and is fast. Use it where comparing keys by reference (==) rather than equals() is acceptable!

3rd Party libraries offer probing HashMaps, eg Koloboke & Eclipse-Collections.
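As an example of preconditioning, java.util.HashMap in Java 8 XORs the upper 16 bits of the hash code into the lower 16 before masking to the table size. The sketch below reproduces that step in isolation; HashSpreading and its method names are hypothetical.

```java
final class HashSpreading {
    // Folds the high bits into the low bits so they influence the index.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Index for a power-of-two table: keep only the low bits.
    static int index(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }
}
```

For a table of 16 buckets, the hash codes 0x10000 and 0x20000 both map to index 0 when used raw, because they differ only in their upper bits. After spreading they map to indices 1 and 2.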

Page 52: Java collections  the force awakens

Conclusions

Page 53: Java collections  the force awakens
Page 54: Java collections  the force awakens

Any Questions?

www.iteratrlearning.com

● Modern Development with Java 8
● Reactive and Asynchronous Java
● Java Software Development Bootcamp

#javaforceawakens

Page 55: Java collections  the force awakens

Further reading

Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays
https://infoscience.epfl.ch/record/64410/files/techlists.pdf

Smaller Footprint for Java Collections
http://www.lirmm.fr/~ducour/Doc-objets/ECOOP2012/ECOOP/ecoop/356.pdf

Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections
http://michael.steindorfer.name/publications/oopsla15.pdf

RRB-Trees: Efficient Immutable Vectors
https://infoscience.epfl.ch/record/169879/files/RMTrees.pdf

Page 56: Java collections  the force awakens

Further reading

Doug Lea’s Analysis of the HashMap implementation tradeoffs
http://www.mail-archive.com/[email protected]/msg02147.html

Java Specialists HashMap article
http://www.javaspecialists.eu/archive/Issue235.html

Sample and Benchmark Code
https://github.com/RichardWarburton/Java-Collections-The-Force-Awakens