overview of java 8 parallel streams (part 2)schmidt/cs891f/2018-pdfs/... · 6 •a java 8 parallel...

24
Overview of Java 8 Parallel Streams (Part 2) Douglas C. Schmidt [email protected] www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA

Upload: others

Post on 16-Jan-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

Overview of Java 8 Parallel Streams

(Part 2)

Douglas C. [email protected]

www.dre.vanderbilt.edu/~schmidt

Professor of Computer Science

Institute for Software

Integrated Systems

Vanderbilt University

Nashville, Tennessee, USA

Page 2: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

2

Learning Objectives in this Part of the Lesson• Know how aggregate operations & functional programming features are

applied in the parallel streams framework

• Be aware of how a parallel stream works “under the hood”

join joinjoin

Processsequentially

Processsequentially

Processsequentially

Processsequentially

DataSource1.1 DataSource1.2 DataSource2.1 DataSource2.2

DataSource1 DataSource2

DataSource

trySplit()

trySplit() trySplit()

See www.ibm.com/developerworks/library/j-java-streams-3-brian-goetz

Page 3: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

3

• Know how aggregate operations & functional programming features are applied in the parallel streams framework

• Be aware of how a parallel stream works “under the hood”

• Recognize now to avoid concurrencyhazards in parallel streams

Learning Objectives in this Part of the Lesson

Aggregate operation (Function f)

Input x

Output f(x)

Output g(f(x))

Output h(g(f(x)))

Aggregate operation (Function g)

Aggregate operation (Function h)

Page 4: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

4

Overview of How a Parallel Stream Works

Page 5: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

5

• A Java 8 parallel stream implementsa “map/reduce” variant optimized for multi-core processors

See en.wikipedia.org/wiki/MapReduce

Overview of How a Parallel Stream Works

Reduce

Map

Partition

Page 6: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

6

• A Java 8 parallel stream implementsa “map/reduce” variant optimized for multi-core processors

• It’s actually a three phase“split-apply-combine” data processing strategy

See www.jstatsoft.org/article/view/v040i01

join joinjoin

Processsequentially

Processsequentially

Processsequentially

Processsequentially

DataSource1.1 DataSource1.2 DataSource2.1 DataSource2.2

DataSource1 DataSource2

DataSource

trySplit()

trySplit() trySplit()

Overview of How a Parallel Stream Works

Page 7: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

7

• The split-apply-combine phases are:

1. Split – Recursively partition a data source into independent“chunks”

CollectionData1.1 CollectionData1.2 CollectionData2.1 CollectionData2.2

CollectionData1 CollectionData2

CollectionData

See en.wikipedia.org/wiki/Divide_and_conquer_algorithm

trySplit()

trySplit() trySplit()

Overview of How a Parallel Stream Works

Page 8: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

8

• The split-apply-combine phases are:

1. Split – Recursively partition a data source into independent“chunks”

• Spliterators are definedto partition collections in Java 8

CollectionData1.1 CollectionData1.2 CollectionData2.1 CollectionData2.2

CollectionData1 CollectionData2

CollectionData

See docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html

trySplit()

trySplit() trySplit()

Overview of How a Parallel Stream Works

Page 9: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

9

• The split-apply-combine phases are:

1. Split – Recursively partition a data source into independent“chunks”

• Spliterators are definedto partition collections in Java 8

• You can also define custom spliterators

InputString1.1 InputString1.2 InputString2.1 InputString2.2

InputString1 InputString2

InputString

See github.com/douglascraigschmidt/LiveLessons/tree/master/SearchStreamSpliterator

45,000+ phrases

Search Phrases

Input Strings to Search

trySplit()

trySplit() trySplit()

Overview of How a Parallel Stream Works

Page 10: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

10

• The split-apply-combine phases are:

1. Split – Recursively partition a data source into independent“chunks”

• Spliterators are definedto partition collections in Java 8

• You can also define custom spliterators

• Parallel streams perform better on data sources that can be split efficiently & evenly

InputString1.1 InputString1.2 InputString2.1 InputString2.2

InputString1 InputString2

InputString

See www.airpair.com/java/posts/parallel-processing-of-io-based-data-with-java-streams

trySplit()

trySplit() trySplit()

Overview of How a Parallel Stream Works

Page 11: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

11

Processsequentially

Processsequentially

Processsequentially

Processsequentially

• The split-apply-combine phases are:

1. Split – Recursively partition a data source into independent“chunks”

2. Apply – Process chunksindependently in one(common) thread pool

InputString1.1 InputString1.2 InputString2.1 InputString2.2

InputString1 InputString2

InputString

Splitting & applying run simultaneously (after certain limit met), not sequentially

Overview of How a Parallel Stream Works

Page 12: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

12

Processsequentially

Processsequentially

Processsequentially

Processsequentially

• The split-apply-combine phases are:

1. Split – Recursively partition a data source into independent“chunks”

2. Apply – Process chunksindependently in one(common) thread pool

• Programmers have somecontrol over how manythreads are in the pool

InputString1.1 InputString1.2 InputString2.1 InputString2.2

InputString1 InputString2

InputString

Overview of How a Parallel Stream Works

Page 13: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

13

• The split-apply-combine phases are:

1. Split – Recursively partition a data source into independent“chunks”

2. Apply – Process chunksindependently in one(common) thread pool

3. Combine – Join partial results into a single result

join join

join

Processsequentially

Processsequentially

Processsequentially

Processsequentially

InputString1.1 InputString1.2 InputString2.1 InputString2.2

InputString1 InputString2

InputString

Overview of How a Parallel Stream Works

Page 14: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

14

• The split-apply-combine phases are:

1. Split – Recursively partition a data source into independent“chunks”

2. Apply – Process chunksindependently in one(common) thread pool

3. Combine – Join partial results into a single result

• Performed by terminal operations like collect() & reduce()

join join

join

Processsequentially

Processsequentially

Processsequentially

Processsequentially

See www.codejava.net/java-core/collections/java-8-stream-terminal-operations-examples

InputString1.1 InputString1.2 InputString2.1 InputString2.2

InputString1 InputString2

InputString

Overview of How a Parallel Stream Works

Page 15: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

15

Avoiding Concurrency Hazards in Java 8 Parallel Streams

Page 16: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

16

• The Java 8 parallel streams framework assumes behaviors don’t incur race conditions

See en.wikipedia.org/wiki/Race_condition#Software

Avoiding Concurrency Hazards in Java 8 Parallel Streams

Race conditions arise when an app depends on the sequence or timing of threads for it to operate properly

Aggregate operation (Function f)

Input x

Output f(x)

Output g(f(x))

Output h(g(f(x)))

Aggregate operation (Function g)

Aggregate operation (Function h)

Page 17: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

17See docs.oracle.com/javase/tutorial/collections/streams/parallelism.html#side_effects

Avoiding Concurrency Hazards in Java 8 Parallel Streams

map(phrase -> searchForPhrase(…))

filter(not(SearchResults::isEmpty))

collect(toList())

Input Strings to Search

45,000+ phrases

Search Phrases

parallelStream()

• Parallel streams should therefore avoid behaviors with side-effects

Page 18: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

18

• Parallel streams should therefore avoid behaviors with side-effects, e.g.

• Stateful lambda expressions

• Where results depend on sharedmutable state

Avoiding Concurrency Hazards in Java 8 Parallel Streamsclass BuggyFactorial {

static class Total {

long mTotal = 1;

void multiply(long n)

{ mTotal *= n; }

}

static long factorial(long n){

Total t = new Total();

LongStream

.rangeClosed(1, n)

.parallel()

.forEach(t::multiply);

return t.mTotal;

} ...

See docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#Statelessness

Page 19: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

19

• Parallel streams should therefore avoid behaviors with side-effects, e.g.

• Stateful lambda expressions

• Where results depend on sharedmutable state

• i.e., state that may change inparallel execution of a pipeline

Avoiding Concurrency Hazards in Java 8 Parallel Streamsclass BuggyFactorial {

static class Total {

long mTotal = 1;

void multiply(long n)

{ mTotal *= n; }

}

static long factorial(long n){

Total t = new Total();

LongStream

.rangeClosed(1, n)

.parallel()

.forEach(t::multiply);

return t.mTotal;

} ...

Page 20: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

20

• Parallel streams should therefore avoid behaviors with side-effects, e.g.

• Stateful lambda expressions

• Where results depend on sharedmutable state

• i.e., state that may change inparallel execution of a pipeline

Avoiding Concurrency Hazards in Java 8 Parallel Streams

Race conditions can arise due to the unsynchronized access to mTotal field

See github.com/douglascraigschmidt/LiveLessons/tree/master/Java8/ex16

class BuggyFactorial {

static class Total {

long mTotal = 1;

void multiply(long n)

{ mTotal *= n; }

}

static long factorial(long n){

Total t = new Total();

LongStream

.rangeClosed(1, n)

.parallel()

.forEach(t::multiply);

return t.mTotal;

} ...

Page 21: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

21

• Parallel streams should therefore avoid behaviors with side-effects, e.g.

• Stateful lambda expressions

• Interference w/the data source

• Occurs when source of stream is modified within the pipeline

See docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#NonInterference

Avoiding Concurrency Hazards in Java 8 Parallel StreamsList<Integer> list = IntStream

.range(0, 10)

.boxed()

.collect(toList());

list

.parallelStream()

.peek(list::remove)

.forEach(System.out::println);

Page 22: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

22

• Parallel streams should therefore avoid behaviors with side-effects, e.g.

• Stateful lambda expressions

• Interference w/the data source

• Occurs when source of stream is modified within the pipeline

Avoiding Concurrency Hazards in Java 8 Parallel StreamsList<Integer> list = IntStream

.range(0, 10)

.boxed()

.collect(toList());

list

.parallelStream()

.peek(list::remove)

.forEach(System.out::println);

Aggregate operations enable parallelism with non-thread-safe collections provided the collection is not modified while it’s being operated on..

See github.com/douglascraigschmidt/LiveLessons/tree/master/Java8/ex11

Page 23: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

23

• Java 8 lambda expressions & method references containing no shared state are useful for parallel streams since they needn’t be explicitly synchronized

See henrikeichenhardt.blogspot.com/2013/06/why-shared-mutable-state-is-root-of-all.html

return mList.size() == 0;

return new SearchResults

(Thread.currentThread().getId(),

currentCycle(), phrase, title,

StreamSupport

.stream(new PhraseMatchSpliterator

(input, phrase),

parallel)

.collect(toList()));

Avoiding Concurrency Hazards in Java 8 Parallel Streams

map(phrase -> searchForPhrase(…))

filter(not(SearchResults::isEmpty))

collect(toList())

45,000+ phrases

Search Phrases

parallelStream()

Page 24: Overview of Java 8 Parallel Streams (Part 2)schmidt/cs891f/2018-PDFs/... · 6 •A Java 8 parallel stream implements a “map/reduce” variant optimized for multi-core processors

24

End of Overview of Java 8 Parallel Streams

(Part 2)