TECHNOLOGY DIVISION
Java Eightification of Eclipse Collections
EclipseCon Europe 2016, 26 Oct
Alex Iliev
GS.com/Engineering

Program
● Introductions
● Eclipse Collections outline
● Version 8 for Java 8
  – Approach applies to other libraries too
● Parallel implementation comparison

Who am I?
● Currently developing Web frameworks and toolkits for GS client applications.
● Previously built the in-house authorization system. Lots of Eclipse Collections there.
● Previously, PhD in Computer Security

What/Why is Eclipse Collections?
● Java Collections library
● Extensive Collection types
  – Functional API style
  – Inspired by Smalltalk
● Many collection implementations
● Efficient: lean and fast
● Comes with a training Kata

A Brief History
● Started in 2004
  – To reduce memory footprint of small collections
● Support for higher-order functions inspired by Smalltalk collections protocol
● Open-sourced as GS Collections in 2012
● Became Eclipse Collections in 2015

Impls
● Array-list
● Hash-set
● Hash-map
● Derived:
  ● Bag
  ● Multimaps
  ● Bimap
Interfaces
● List, Set, Map
● Bag, Multimaps, Bimap
Functions (EC name, with the name used elsewhere in parentheses)
● collect (map)
● select (filter)
● injectInto (fold)
● reduce
● flatCollect (flatMap, concatMap)
● partition
● aggregateBy (mapReduce)
● chunk
● zip
● Mutable / Immutable
● Eager / Lazy / Parallel
● Object / Primitive
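To make the naming correspondence on this slide concrete, here is a minimal sketch using only JDK 8 streams, so it shows the right-hand names (map, filter, fold, flatMap) rather than the EC API itself. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of Eclipse Collections or the JDK.
class NameMappingSketch {
    static final List<Integer> NUMBERS = Arrays.asList(1, 2, 3, 4);

    // EC "collect" corresponds to "map".
    static List<Integer> doubled() {
        return NUMBERS.stream().map(i -> i * 2).collect(Collectors.toList());
    }

    // EC "select" corresponds to "filter".
    static List<Integer> evens() {
        return NUMBERS.stream().filter(i -> i % 2 == 0).collect(Collectors.toList());
    }

    // EC "injectInto" corresponds to a fold: reduce with an initial value.
    static int sum() {
        return NUMBERS.stream().reduce(0, Integer::sum);
    }

    // EC "flatCollect" corresponds to "flatMap".
    static List<Integer> repeated() {
        return NUMBERS.stream().flatMap(i -> Stream.of(i, i)).collect(Collectors.toList());
    }

    // EC "partition" corresponds to splitting by a predicate.
    static Map<Boolean, List<Integer>> partitioned() {
        return NUMBERS.stream().collect(Collectors.partitioningBy(i -> i % 2 == 0));
    }
}
```

The JDK versions all go through a Stream and a terminal collect step, whereas the EC methods listed on the slide are available directly on the collection types.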

Quick Code Peek
ImmutableList<Position> positionsEC = Lists.immutable.of();
MutableSet<Account> bigAccounts = positionsEC.asLazy()
    .select(p -> p.product().price() > 10_000)
    .collect(Position::account)
    .toSet();

Interplay with Java 8
● EC had a functional / higher-order function (HoF) API long before Java 8 brought that to the JDK
● EC worked with Java 8
● But did not require or embrace it
● V 8 changes that. Now:
  – Java 8 is required
  – Using Java 8 Streams and EC together is easier

Java 8 Streams in a Snippet
List<Integer> is = asList(1,2,3,4);
List<Integer> doubled = is.stream()
    .map(i -> i*2)
    .collect(Collectors.toList());   // doubled is [2, 4, 6, 8]

Java 8 Integration Outline
● Approach applies to other libraries too
● Have library’s function objects extend JDK FunctionalInterfaces
● Provide Stream Collectors which feed into the library
● Support existing Collectors operating on library’s structures

Functional interfaces extend JDK’s
package org.eclipse.collections.api.block.function;

@FunctionalInterface
public interface Function<T, V>
    extends java.util.function.Function<T, V>, Serializable

public interface Function0<R>
    extends Supplier<R>, Serializable

public interface Function2<T1, T2, R>
    extends BiFunction<T1, T2, R>, Serializable

EC users can pass their function objects to JDK APIs
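The slide shows only the interface headers. As a hedged sketch of how a library's pre-existing function type can extend the JDK one, the hypothetical LibFunction below keeps its own abstract method (named valueOf here) and bridges the JDK's apply to it with a default method. The names LibFunction and FunctionBridgeSketch are assumptions made for this example, not EC source code.

```java
import java.io.Serializable;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; not the Eclipse Collections source.
class FunctionBridgeSketch {

    // A library function type that is also a java.util.function.Function.
    // It stays a functional interface: valueOf is the only abstract method,
    // because apply is given a default implementation.
    @FunctionalInterface
    interface LibFunction<T, V> extends java.util.function.Function<T, V>, Serializable {
        V valueOf(T each);

        @Override
        default V apply(T each) {
            return this.valueOf(each);
        }
    }

    // A library-typed function object passed straight to a JDK API (Stream.map).
    static List<Integer> lengths(List<String> words) {
        LibFunction<String, Integer> length = String::length;
        return words.stream().map(length).collect(Collectors.toList());
    }
}
```

With this shape, existing library code that calls valueOf keeps working, and the same object is accepted anywhere the JDK expects a java.util.function.Function.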

Functional interfaces extend JDK’s
package org.eclipse.collections.api.block.function.primitive;

@FunctionalInterface
public interface IntFunction<T>
    extends java.util.function.ToIntFunction<T>, Serializable

package org.eclipse.collections.api.block.predicate;

@FunctionalInterface
public interface Predicate<T>
    extends java.util.function.Predicate<T>, Serializable

Collectors2
Primitive Collections
● sumToLong
● sumToDouble
● collectBoolean
● collectLong
Multimaps
● groupBy
● toListMultimap
● toImmutableBagMultimap
Most types
● toImmutableList
● toSet
● toSortedSet
● toBiMap
● toImmutableBag
● toStack
MutableList<Pair<T,U>>, MutableList<ObjectIntPair<T>>
● zip
● zipWithIndex
R extends Collection<T>
● collect
● select
● selectWith
● reject
Misc
● chunk
● partition
● makeString

Collecting2
List<Position> positionsJdk = asList(...);

Collecting2
MutableSetMultimap<String, Position> byCategory = positionsJdk
    .stream().collect(Collectors2.groupBy(
        p -> p.product().category(),
        Multimaps.mutable.set::empty));

MutableBag<Position> bigOnes = positionsJdk
    .stream().collect(Collectors2.select(
        p -> p.product().price() > 10000,
        Bags.mutable::empty));
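For readers wondering how a collector of this shape can be provided by a library, the following is a minimal JDK-only sketch of a select-style collector built with Collector.of. It is not the Collectors2 implementation; SelectCollectorSketch and its methods are hypothetical names, and a TreeSet stands in for the library container.

```java
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Hypothetical illustration class; not the Collectors2 source.
class SelectCollectorSketch {

    // Keeps only the elements matching the predicate and adds them to a
    // container obtained from the caller-supplied factory.
    static <T, R extends Collection<T>> Collector<T, ?, R> select(
            Predicate<? super T> predicate, Supplier<R> supplier) {
        return Collector.of(
                supplier,
                (R target, T each) -> {
                    if (predicate.test(each)) {
                        target.add(each);
                    }
                },
                // Combiner, used when a parallel stream merges partial containers.
                (R left, R right) -> {
                    left.addAll(right);
                    return left;
                });
    }

    // Usage in the same style as the slide: predicate plus container factory.
    static TreeSet<Integer> bigOnes(List<Integer> prices) {
        return prices.stream().collect(select(p -> p > 10_000, TreeSet::new));
    }
}
```

Taking the container as a Supplier is what lets one collector feed any target type, which is the same role Bags.mutable::empty plays in the slide's example.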

RichIterable → SummaryStatistics
● Mirroring Collectors.summarizingDouble() etc.
● RichIterable.
  – summarizeLong
  – summarizeDouble
  – …
● Currently only serial
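For reference, this is the JDK side that the slide says is being mirrored: a small sketch of Collectors.summarizingDouble producing a DoubleSummaryStatistics. Only JDK calls are used; the class name SummarySketch is hypothetical, and the EC summarize methods themselves are not shown.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class showing the JDK collector only.
class SummarySketch {

    // One pass over the data yields count, sum, min, max and average together.
    static DoubleSummaryStatistics summarize(List<Double> prices) {
        return prices.stream().collect(Collectors.summarizingDouble(Double::doubleValue));
    }
}
```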

Optional<T> Methods
● detectOptional
● reduce

RichIterable.reduceInPlace()
● Same result as Stream.collect()
● So JDK 8 Collectors can be used:
  – On EC RichIterables which don’t implement Collection
  – Directly without needing to make a Stream
● Currently only serial

RichIterable.reduceInPlace()
Very simple implementation.
– Parallel would be on a different type

public interface RichIterable<T> {
    default <R, A> R reduceInPlace(Collector<? super T, A, R> collector) {
        // Create the mutable result container
        A intermResult = collector.supplier().get();
        // Fold every element into it
        BiConsumer<A, ? super T> accumulator = collector.accumulator();
        this.each(x -> accumulator.accept(intermResult, x));
        // Convert the container into the final result
        return collector.finisher().apply(intermResult);
    }
}
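The slide's method can be tried without EC on the classpath. The sketch below copies its logic onto a hypothetical MiniIterable interface that offers only internal iteration (an each method), so it is neither a Collection nor a Stream. MiniIterable, upTo, sumUpTo and listUpTo are names assumed for this example.

```java
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical illustration class; MiniIterable is a stand-in, not RichIterable.
class ReduceInPlaceSketch {

    interface MiniIterable<T> {
        void each(Consumer<? super T> action);

        // Same three steps as on the slide: supply, accumulate, finish.
        default <R, A> R reduceInPlace(Collector<? super T, A, R> collector) {
            A container = collector.supplier().get();
            BiConsumer<A, ? super T> accumulator = collector.accumulator();
            this.each(x -> accumulator.accept(container, x));
            return collector.finisher().apply(container);
        }
    }

    // Yields 1..n on demand; no elements are stored anywhere.
    static MiniIterable<Integer> upTo(int n) {
        return action -> {
            for (int i = 1; i <= n; i++) {
                action.accept(i);
            }
        };
    }

    // Standard JDK collectors applied directly, with no Stream involved.
    static int sumUpTo(int n) {
        return upTo(n).reduceInPlace(Collectors.summingInt(Integer::intValue));
    }

    static List<Integer> listUpTo(int n) {
        return upTo(n).reduceInPlace(Collectors.toList());
    }
}
```

The combiner of the Collector is never called here, which matches the slide's note that this version is serial only.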

Parallel in EC
● EC offers complete parallel execution options
● On separate types rooted at ParallelIterable

public interface ListIterable<T> {
    ParallelListIterable<T> asParallel(
        ExecutorService executorService, int batchSize);
}

public interface SetIterable<T> extends RichIterable<T> {
    ParallelSetIterable<T> asParallel(
        ExecutorService executorService, int batchSize);
}

EC Parallel Implementation
● Uses a simple, single-level parallel decomposition strategy

AbstractParallelIterable<T, B extends Batch<T>>
    implements ParallelIterable<T> {
    public abstract LazyIterable<B> split();

  – Then use a normal Executor
  – Take care to minimise merge overhead, e.g. CompositeFastList
● Often faster than JDK parallel streams
● Flexibility with Executor and batch size, can tune for CPU and I/O tasks
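The single-level strategy described on this slide can be sketched with plain JDK classes: cut the input once into fixed-size batches, submit each batch to an ordinary ExecutorService, then merge the partial results. This is an illustration of the idea only, not EC's AbstractParallelIterable; BatchParallelSketch and its methods are hypothetical, and the merge is a simple copy rather than anything like CompositeFastList.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical illustration class; not the Eclipse Collections implementation.
class BatchParallelSketch {

    static <T> List<T> parallelSelect(List<T> input, Predicate<? super T> predicate,
                                      ExecutorService executor, int batchSize) {
        // Single-level split: one task per contiguous batch, no recursion.
        List<Future<List<T>>> futures = new ArrayList<>();
        for (int start = 0; start < input.size(); start += batchSize) {
            List<T> batch = input.subList(start, Math.min(start + batchSize, input.size()));
            Callable<List<T>> task = () -> {
                List<T> kept = new ArrayList<>();
                for (T each : batch) {
                    if (predicate.test(each)) {
                        kept.add(each);
                    }
                }
                return kept;
            };
            futures.add(executor.submit(task));
        }
        // Merge in submission order so the encounter order is preserved.
        List<T> result = new ArrayList<>();
        try {
            for (Future<List<T>> future : futures) {
                result.addAll(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return result;
    }

    // Usage: the caller chooses both the executor and the batch size.
    static List<Integer> evensUpTo(int n) {
        List<Integer> input = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            input.add(i);
        }
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            return parallelSelect(input, i -> i % 2 == 0, executor, 3);
        } finally {
            executor.shutdown();
        }
    }
}
```

Because the executor and batch size are parameters, a caller could pass a larger pool and smaller batches for I/O-bound work, which is the tuning flexibility the slide mentions.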

JDK Streams Parallel Implementation
● More general parallel decomposition strategy: recursive two-way splits

public interface Spliterator<T> {
    Spliterator<T> trySplit();

  – Then use a Fork-Join Executor
  – Very general approach, applies to any data structure
  – But higher overhead for common cases
  – Not tunable for high-latency I/O tasks
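To show what recursive two-way splitting looks like in practice, here is a small sketch that drives Spliterator.trySplit() by hand inside a fork-join RecursiveTask. It illustrates the decomposition pattern only and is not the JDK's internal stream machinery; SplitSumSketch, SumTask and the threshold value are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration class; not the JDK stream implementation.
class SplitSumSketch {

    static final class SumTask extends RecursiveTask<Long> {
        private final Spliterator<Integer> source;
        private final long threshold;

        SumTask(Spliterator<Integer> source, long threshold) {
            this.source = source;
            this.threshold = threshold;
        }

        @Override
        protected Long compute() {
            // Try a two-way split only while the remaining work is large.
            Spliterator<Integer> prefix =
                    source.estimateSize() > threshold ? source.trySplit() : null;
            if (prefix == null) {
                // Leaf: sum the remaining elements sequentially.
                long[] sum = {0};
                source.forEachRemaining(i -> sum[0] += i);
                return sum[0];
            }
            // After trySplit, prefix covers the first part and source the rest.
            SumTask left = new SumTask(prefix, threshold);
            left.fork();
            long right = new SumTask(source, threshold).compute();
            return left.join() + right;
        }
    }

    static long sumUpTo(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            values.add(i);
        }
        return ForkJoinPool.commonPool().invoke(new SumTask(values.spliterator(), 1_000));
    }
}
```

Each level of recursion creates and joins a task, which gives a feel for the overhead the slide refers to when the data is a simple array-backed list that could have been cut into batches in one step.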

EC Parallel reduceInPlace
● ParallelIterable does not include reduceInPlace yet
● Implementation would follow the current ParallelIterable approach:
  – Split the input into roughly equal batches
  – Run each batch as a standard Executor task
  – Merge the resulting accumulations
● Optimize based on Collector Characteristics
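Since the slide describes a method that did not exist yet, the following is only a speculative sketch of the three steps it lists, written against a plain List and the JDK Collector interface. ParallelReduceSketch and its methods are hypothetical, and nothing here is taken from EC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical illustration class; a sketch of the steps on the slide.
class ParallelReduceSketch {

    static <T, A, R> R reduceInPlace(List<T> input, Collector<? super T, A, R> collector,
                                     ExecutorService executor, int batchSize) {
        // Step 1 and 2: split into batches and accumulate each one as a task.
        List<Future<A>> partials = new ArrayList<>();
        for (int start = 0; start < input.size(); start += batchSize) {
            List<T> batch = input.subList(start, Math.min(start + batchSize, input.size()));
            Callable<A> task = () -> {
                A container = collector.supplier().get();
                BiConsumer<A, ? super T> accumulator = collector.accumulator();
                for (T each : batch) {
                    accumulator.accept(container, each);
                }
                return container;
            };
            partials.add(executor.submit(task));
        }
        // Step 3: merge the partial containers, in order, with the combiner.
        A merged = collector.supplier().get();
        try {
            for (Future<A> partial : partials) {
                merged = collector.combiner().apply(merged, partial.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return collector.finisher().apply(merged);
    }

    // Usage with an unmodified JDK collector.
    static String joinedUpTo(int n) {
        List<String> input = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            input.add(Integer.toString(i));
        }
        ExecutorService executor = Executors.newFixedThreadPool(3);
        try {
            return reduceInPlace(input, Collectors.joining(","), executor, 4);
        } finally {
            executor.shutdown();
        }
    }
}
```

The sketch ignores collector.characteristics(). One possible optimization along the lines of the last bullet would be to share a single container across tasks when the collector reports CONCURRENT, or to skip the finisher when it reports IDENTITY_FINISH.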

More Information
http://www.eclipse.org/collections/
github.com/eclipse/eclipse-collections
github.com/eclipse/eclipse-collections/wiki
github.com/eclipse/eclipse-collections-kata
Parallel-lazy Performance: Java 8 vs Scala vs GS Collections
infoq.com/presentations/java-streams-scala-parallel-collections
GS Collections write-optimized concurrent map
infoq.com/presentations/Fine-Grained-Parallelism
GS Collections Memory Benchmark
goldmansachs.com/gs-collections/presentations/GSC_Memory_Tests.pdf

Framework Comparisons
Columns: Features | Eclipse Collections | Java 8 | Guava | Trove | Scala
[Per-library check marks were graphical and are not preserved in this transcript; text annotations are kept in their original order.]
Rich API
Interfaces
  Eclipse Collections: Readable, Mutable, Immutable, FixedSize, Lazy
  Java 8: Mutable, Stream
  Guava: Mutable, Fluent
  Trove: Mutable
  Scala: Readable, Mutable, Immutable, Lazy
Optimized Set & Map: (+Bag)
Immutable Collections
Primitive Collections: (+Bag, +Immutable)
Multimaps: (+Bag, +SortedBag), (+Linked), (Multimap trait)
Bags (Multisets)
BiMaps
Iteration Styles
  Eclipse Collections: Eager/Lazy, Serial/Parallel
  Java 8: Lazy, Serial/Parallel
  Guava: Lazy, Serial
  Trove: Eager, Serial
  Scala: Eager/Lazy, Serial/Parallel (Lazy Only)
Learn more at GS.com/Engineering
© 2016 Goldman Sachs. This presentation reflects information available to the Technology Division of Goldman Sachs only and not any other part of Goldman Sachs. It should not be relied upon or considered investment advice. Opinions expressed may not be those of Goldman Sachs unless otherwise expressly noted. Goldman, Sachs & Co. (“GS”) does not warrant or guarantee the accuracy, completeness or efficacy of this presentation, and recipients should not rely on it except at their own risk. This presentation may not be forwarded or otherwise disclosed except with this disclaimer intact.