Java 8 Parallel Stream Internals (Part 3) Douglas C. Schmidt [email protected] www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA




Learning Objectives in this Part of the Lesson

• Understand parallel stream internals, e.g.

• Know what can change & what can’t

• Partition a data source into “chunks”

• Process chunks in parallel via the common fork-join pool

[Figure: trySplit() partitions InputString into InputString1 & InputString2, which are split again into InputString1.1, InputString1.2, InputString2.1, & InputString2.2; each chunk is processed sequentially & the results are joined.]

See www.ibm.com/developerworks/library/j-java-streams-3-brian-goetz
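The partitioning objective above can be sketched with the standard Spliterator API. This is a minimal illustration, not the streams framework's own code: the class name TrySplitDemo, the helper names, and the maxChunk threshold are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class TrySplitDemo {
    /** Recursively split a spliterator until each chunk holds at most maxChunk elements. */
    static <T> void partition(Spliterator<T> s, long maxChunk, List<List<T>> chunks) {
        // trySplit() hands back the first part of the data & leaves the rest in s.
        Spliterator<T> left = s.estimateSize() > maxChunk ? s.trySplit() : null;
        if (left == null) {
            // Small enough (or unsplittable): process this chunk sequentially.
            List<T> chunk = new ArrayList<>();
            s.forEachRemaining(chunk::add);
            chunks.add(chunk);
        } else {
            partition(left, maxChunk, chunks);
            partition(s, maxChunk, chunks);
        }
    }

    /** Convenience wrapper (hypothetical) that returns the chunks of a list. */
    static List<List<String>> chunksOf(List<String> input, long maxChunk) {
        List<List<String>> chunks = new ArrayList<>();
        partition(input.spliterator(), maxChunk, chunks);
        return chunks;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h");
        System.out.println(chunksOf(input, 2));
    }
}
```

The sketch processes the chunks one after another in a single thread; a parallel stream would instead hand each chunk to the common fork-join pool.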


• Recognize how the common fork-join pool is implemented


See gee.cs.oswego.edu/dl/papers/fj.pdf

Processing Chunks in Parallel via the Common ForkJoinPool


• Chunks created by a spliterator are processed in the common fork-join pool


See gee.cs.oswego.edu/dl/papers/fj.pdf


• A fork-join pool provides a high performance, fine-grained task execution framework for Java data parallelism

See docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html


• It provides a parallel computing engine for many higher-level frameworks

See www.infoq.com/interviews/doug-lea-fork-join
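As a rough illustration of two higher-level frameworks sharing this engine, the sketch below computes the same squares with a parallel stream and with CompletableFuture.supplyAsync(), which by default also runs its work in the common fork-join pool (unless that pool's parallelism is below two). The class and method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class TwoFrameworksDemo {
    /** Parallel stream version: chunks are processed in the common fork-join pool. */
    static List<Integer> squaresViaParallelStream(int n) {
        return IntStream.rangeClosed(1, n)
                        .parallel()
                        .map(i -> i * i)
                        .boxed()
                        .collect(Collectors.toList());
    }

    /** Completable futures version: supplyAsync() with no executor argument. */
    static List<Integer> squaresViaFutures(int n) {
        List<CompletableFuture<Integer>> futures =
            IntStream.rangeClosed(1, n)
                     .mapToObj(i -> CompletableFuture.supplyAsync(() -> i * i))
                     .collect(Collectors.toList());
        // join() waits for each future & returns its result.
        return futures.stream()
                      .map(CompletableFuture::join)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squaresViaParallelStream(5));
        System.out.println(squaresViaFutures(5));
    }
}
```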

[Figure: two frameworks layered on the ForkJoinPool. Parallel Streams: filter(not(this::urlCached)), map(this::downloadImage), flatMap(this::applyFilters), collect(toList()). Completable Futures: filter(not(this::urlCached)), map(this::downloadImageAsync), flatMap(this::applyFiltersAsync), collect(toFuture()).]


• ForkJoinPool implements the ExecutorService interface

See docs.oracle.com/javase/tutorial/essential/concurrency/executors.html


• A ForkJoinPool executes ForkJoinTasks

See docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinTask.html


• ForkJoinTask associates a chunk of data with a computation on that data to enable fine-grained parallelism

See www.dre.vanderbilt.edu/~schmidt/PDF/DataParallelismInJava.pdf
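The three points above can be sketched together. SumTask is a hypothetical ForkJoinTask (a RecursiveTask) that pairs a chunk of an array with the computation on it, and the same pool is also driven through the plain ExecutorService interface. The threshold and all names are assumptions for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.RecursiveTask;

class RangeSumDemo {
    /** Associates a chunk of data, [lo, hi), with a computation on it (summing). */
    static class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000; // hypothetical leaf size
        private final long[] data;
        private final int lo, hi;

        SumTask(long[] data, int lo, int hi) {
            this.data = data; this.lo = lo; this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += data[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            left.fork();                                       // run the left half asynchronously
            long right = new SumTask(data, mid, hi).compute(); // keep the right half here
            return left.join() + right;
        }
    }

    /** A ForkJoinPool executes ForkJoinTasks directly. */
    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    /** The same pool used only through the ExecutorService interface. */
    static long sumViaExecutorService(long[] data) {
        ExecutorService executor = ForkJoinPool.commonPool();
        Callable<Long> job = () -> new SumTask(data, 0, data.length).invoke();
        Future<Long> future = executor.submit(job);
        try {
            return future.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data));
        System.out.println(sumViaExecutorService(data));
    }
}
```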


• A ForkJoinTask is lighter weight than a Java thread


e.g., it omits its own run-time stack, registers, thread-local storage, etc.


• A large # of ForkJoinTasks thus run in a small # of Java threads in a ForkJoinPool

See www.infoq.com/interviews/doug-lea-fork-join
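A small experiment can make the tasks-to-threads ratio visible. The sketch below builds a binary tree of trivial tasks and records how many tasks ran and how many distinct threads ran them. The names are hypothetical, and the exact thread count depends on the machine.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

class ManyTasksFewThreads {
    /** A task that records itself & then spawns two children until depth reaches 0. */
    static class Node extends RecursiveAction {
        private final int depth;
        private final AtomicInteger tasks;
        private final Set<Thread> threads;

        Node(int depth, AtomicInteger tasks, Set<Thread> threads) {
            this.depth = depth; this.tasks = tasks; this.threads = threads;
        }

        @Override
        protected void compute() {
            tasks.incrementAndGet();
            threads.add(Thread.currentThread());
            if (depth > 0)
                invokeAll(new Node(depth - 1, tasks, threads),
                          new Node(depth - 1, tasks, threads));
        }
    }

    /** Returns { number of tasks run, number of distinct threads that ran them }. */
    static int[] run(int depth) {
        AtomicInteger tasks = new AtomicInteger();
        Set<Thread> threads = ConcurrentHashMap.newKeySet();
        ForkJoinPool.commonPool().invoke(new Node(depth, tasks, threads));
        return new int[] { tasks.get(), threads.size() };
    }

    public static void main(String[] args) {
        int[] result = run(12);
        System.out.println(result[0] + " tasks ran in " + result[1] + " threads");
    }
}
```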


• Parallel streams are a “user friendly” ForkJoinPool façade

See espressoprogrammer.com/fork-join-vs-parallel-stream-java-8

[Figure: a parallel stream pipeline, map(phrase -> searchForPhrase(…)), filter(not(SearchResults::isEmpty)), collect(toList()), applied to 45,000+ search phrases; its sub-tasks are spread across the deques of the fork-join pool's worker threads.]


• You can program directly to the ForkJoinPool API, though it can be somewhat painful!

List<List<SearchResults>> listOfListOfSearchResults =
  ForkJoinPool.commonPool()
              .invoke(new SearchWithForkJoinTask(inputList,
                                                 mPhrasesToFind, ...));

I gave you the chance of programming Java 8 streams. But you have elected the way of pain!


See SearchStreamGang/src/main/java/livelessons/streamgangs/SearchWithForkJoin.java

Use the common fork-join pool to search input strings for phrases that match
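To make the contrast between the façade and the underlying API concrete, here is a minimal side-by-side sketch. It is not the SearchWithForkJoinTask class referenced above: the class PhraseSearchDemo, its nested SearchTask, and the leaf size of two phrases are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.Collectors;

class PhraseSearchDemo {
    /** Parallel streams: the framework does the splitting, forking, & joining. */
    static List<String> searchWithStreams(String input, List<String> phrases) {
        return phrases.parallelStream()
                      .filter(input::contains)
                      .collect(Collectors.toList());
    }

    /** Direct fork-join programming: splitting, forking, & joining are explicit. */
    static class SearchTask extends RecursiveTask<List<String>> {
        private final String input;
        private final List<String> phrases;

        SearchTask(String input, List<String> phrases) {
            this.input = input; this.phrases = phrases;
        }

        @Override
        protected List<String> compute() {
            if (phrases.size() <= 2) {           // hypothetical leaf size
                List<String> found = new ArrayList<>();
                for (String phrase : phrases)
                    if (input.contains(phrase)) found.add(phrase);
                return found;
            }
            int mid = phrases.size() / 2;
            SearchTask left = new SearchTask(input, phrases.subList(0, mid));
            SearchTask right = new SearchTask(input, phrases.subList(mid, phrases.size()));
            left.fork();
            List<String> rightResult = right.compute();
            // Combine in encounter order: left results first, then right.
            List<String> result = new ArrayList<>(left.join());
            result.addAll(rightResult);
            return result;
        }
    }

    static List<String> searchWithForkJoin(String input, List<String> phrases) {
        return ForkJoinPool.commonPool().invoke(new SearchTask(input, phrases));
    }

    public static void main(String[] args) {
        String input = "to be or not to be that is the question";
        List<String> phrases = Arrays.asList("to be", "not", "xyz", "question", "hamlet");
        System.out.println(searchWithStreams(input, phrases));
        System.out.println(searchWithForkJoin(input, phrases));
    }
}
```

Both methods return the same matches; the stream version is a single expression, while the fork-join version spells out the decomposition by hand.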


• Best used for algorithms that don’t match Java 8’s parallel streams programming model

See www.oracle.com/technetwork/articles/java/fork-join-422606.html

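@Override
protected Long compute() {
  long count = 0L;
  // Track every forked sub-task so its result can be joined later.
  List<RecursiveTask<Long>> forks = new LinkedList<>();
  // Fork a task to search each sub-folder.
  for (Folder sub : mFolder.getSubs()) {
    FolderSearchTask task = new FolderSearchTask(sub, mWord);
    forks.add(task); task.fork();
  }
  // Fork a task to search each document in this folder.
  for (Doc doc : mFolder.getDocs()) {
    DocSearchTask task = new DocSearchTask(doc, mWord);
    forks.add(task); task.fork();
  }
  // Join the forked tasks & add up their counts.
  for (RecursiveTask<Long> task : forks)
    count = count + task.join();
  return count;
}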


• All parallel streams in a process share the common fork-join pool

See dzone.com/articles/common-fork-join-pool-and-streams


• Helps optimize resource utilization by knowing what cores are being used globally within a process


• There are (intentionally) few “knobs” to control this (or any) fork-join pool


See www.youtube.com/watch?v=sq0MX3fHkro


• Contrast with the ThreadPoolExecutor framework


• You can configure the pool size


See Part 4 of this lesson for details

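System.setProperty
  ("java.util.concurrent"
   + ".ForkJoinPool.common"
   + ".parallelism",
   "10"); // Desired # of threads (the value must be a String)
          // Takes effect only if set before the common pool is first used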
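The points above about sharing and sizing can be checked with a short experiment. The sketch below requests a pool size and then verifies that every thread that ran part of a parallel stream was either the calling thread or a worker of the common pool. The class and method names are hypothetical, and the check assumes the caller is not itself a worker thread of some other fork-join pool.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.stream.IntStream;

class CommonPoolDemo {
    static final String PARALLELISM_PROPERTY =
        "java.util.concurrent.ForkJoinPool.common.parallelism";

    /** Request a pool size; effective only before the common pool is first used. */
    static void requestParallelism(int threads) {
        System.setProperty(PARALLELISM_PROPERTY, String.valueOf(threads));
    }

    /** True if a parallel stream ran only in the caller or in common-pool workers. */
    static boolean runsInCommonPool(int n) {
        Thread caller = Thread.currentThread();
        Set<Thread> threads = ConcurrentHashMap.newKeySet();
        IntStream.range(0, n)
                 .parallel()
                 .forEach(i -> threads.add(Thread.currentThread()));
        return threads.stream().allMatch(t ->
            t == caller
            || (t instanceof ForkJoinWorkerThread
                && ((ForkJoinWorkerThread) t).getPool() == ForkJoinPool.commonPool()));
    }

    public static void main(String[] args) {
        requestParallelism(10);
        System.out.println("parallelism = " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("shared pool = " + runsInCommonPool(10_000));
    }
}
```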


Mapping Parallel Streams onto the Java Fork-Join Pool


• The Java 8 parallel streams framework automatically creates tasks that are run by worker threads in the common fork-join pool

abstract class AbstractTask ... { ...
  public void compute() {
    Spliterator<P_IN> rs = spliterator, ls;
    boolean forkRight = false; ...
    while (... (ls = rs.trySplit()) != null) {
      K taskToFork;
      if (forkRight)
        { forkRight = false; ... taskToFork = ...makeChild(rs); }
      else
        { forkRight = true; ... taskToFork = ...makeChild(ls); }
      taskToFork.fork();
    }
  } ...

See openjdk/8-b132/java/util/stream/AbstractTask.java

Abstract base class for most fork-join tasks used to implement stream ops


Decides whether to split a task further or compute it directly

See docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html#trySplit

Try to partition the input source until trySplit() returns null


Alternate which child is forked to avoid biased spliterators


Fork off a new sub-task & continue processing the other in the loop

See docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinTask.html#fork

abstract class AbstractTask ... { ...
  public void compute() {
    Spliterator<P_IN> rs = spliterator, ls;
    boolean forkRight = false; ...
    while (... (ls = rs.trySplit()) != null) {
      ...
    }
    task.setLocalResult(task.doLeaf());
  } ...

doLeaf() typically calls forEachRemaining() to process the elements in its chunk of the stream sequentially

See docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html#forEachRemaining
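The split-fork-leaf structure walked through above can be approximated in user code. The sketch below is a simplification under stated assumptions, not the JDK's AbstractTask: it extends RecursiveTask rather than CountedCompleter, always forks the left split instead of alternating, and uses a hypothetical LEAF_SIZE threshold to decide when to stop splitting.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SplitAndForkSketch {
    /** Counts the elements covered by a spliterator by splitting & forking. */
    static class CountTask<T> extends RecursiveTask<Long> {
        private static final long LEAF_SIZE = 1_000; // hypothetical threshold
        private final Spliterator<T> spliterator;

        CountTask(Spliterator<T> spliterator) { this.spliterator = spliterator; }

        @Override
        protected Long compute() {
            Spliterator<T> rs = spliterator, ls;
            List<CountTask<T>> forked = new ArrayList<>();
            // Keep partitioning until the chunk is small or trySplit() returns null.
            while (rs.estimateSize() > LEAF_SIZE && (ls = rs.trySplit()) != null) {
                CountTask<T> child = new CountTask<>(ls);
                forked.add(child);
                child.fork();      // fork one half & continue with the other in the loop
            }
            // Leaf step: process the remaining elements sequentially.
            long[] count = {0};
            rs.forEachRemaining(e -> count[0]++);
            long total = count[0];
            // Join the most recently forked tasks first, mirroring LIFO processing.
            for (int i = forked.size() - 1; i >= 0; i--)
                total += forked.get(i).join();
            return total;
        }
    }

    static <T> long count(Collection<T> items) {
        CountTask<T> root = new CountTask<>(items.spliterator());
        return ForkJoinPool.commonPool().invoke(root);
    }

    public static void main(String[] args) {
        List<Integer> items =
            IntStream.range(0, 10_000).boxed().collect(Collectors.toList());
        System.out.println(count(items));
    }
}
```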


• Each worker thread in the common fork-join pool runs a loop scanning for parallel stream tasks to run


• Goal is to keep worker threads & cores as busy as possible!


• A worker thread has a “double-ended queue” (aka “deque”) that serves as its main source of tasks

[Figure: three worker threads, each with its own WorkQueue deque holding sub-tasks such as Sub-Task1.1 through Sub-Task1.4 & Sub-Task3.3 through Sub-Task3.4.]

See en.wikipedia.org/wiki/Double-ended_queue


• Implemented by WorkQueue


See java8/util/concurrent/ForkJoinPool.java


• When the AbstractTask.compute() method calls fork() on a task, this task is pushed onto the head of its worker thread’s deque


See gee.cs.oswego.edu/dl/papers/fj.pdf

[Figure: step 1, fork(); step 2, push() onto the head of the worker thread's WorkQueue.]

• Each worker thread processes its deque in LIFO order

[Figure: step 1, join(); step 2, pop() from the head of the worker thread's WorkQueue.]

See en.wikipedia.org/wiki/Stack_(abstract_data_type)


• A task popped from the head of a deque is run to completion


See en.wikipedia.org/wiki/Run_to_completion_scheduling


• join() “pitches in” to pop & execute (sub-)tasks


“Collaborative Jiffy Lube” model of processing!


• LIFO order improves locality of reference & cache performance


See en.wikipedia.org/wiki/Locality_of_reference
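One way to observe the fork(), pop, & join() behavior described above is to run a recursive computation in a pool that has only one worker thread. If join() merely blocked, that lone worker could not make progress; the sketch below still completes because the joined sub-task is taken from the worker's own deque and run. The Fib task and the one-thread pool are illustrative assumptions, not part of the lesson's code.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class JoinPitchesIn {
    /** Naive Fibonacci, used only to generate many small fork/join steps. */
    static class Fib extends RecursiveTask<Integer> {
        private final int n;

        Fib(int n) { this.n = n; }

        @Override
        protected Integer compute() {
            if (n <= 1) return n;
            Fib f1 = new Fib(n - 1);
            f1.fork();                          // pushed onto this worker's deque
            int f2 = new Fib(n - 2).compute();  // keep working in the current thread
            return f2 + f1.join();              // join() runs f1 if nobody stole it
        }
    }

    /** Runs the computation in a pool whose parallelism is 1. */
    static int fibWithOneWorker(int n) {
        ForkJoinPool pool = new ForkJoinPool(1);
        try {
            return pool.invoke(new Fib(n));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fibWithOneWorker(20));
    }
}
```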


• Worker threads only block if no parallel stream tasks are available for them to run

See Doug Lea’s talk at www.youtube.com/watch?v=sq0MX3fHkro

• Blocking worker threads & cores are costly on modern processors


• Each worker thread therefore checks other deques in the pool to find other tasks to run


• To maximize core utilization, idle worker threads “steal” work from the tail of busy threads’ deques

See docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html

[Figure: an idle worker thread calls poll() to steal a sub-task from the tail of a busy worker thread's WorkQueue.]


A worker thread deque to steal from is selected randomly to lower contention
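Work stealing can be observed from the outside via ForkJoinPool.getStealCount(), which returns an estimate of the number of tasks taken from one thread's queue by another. The sketch below sums a range in a private pool and reports that estimate. The RangeSum task, its threshold, and the pool size are assumptions, and the steal count varies from run to run.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class StealCountDemo {
    /** Sums the integers in [lo, hi) by recursive decomposition. */
    static class RangeSum extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000; // hypothetical leaf size
        private final int lo, hi;

        RangeSum(int lo, int hi) { this.lo = lo; this.hi = hi; }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += i;
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            RangeSum left = new RangeSum(lo, mid);
            left.fork();                        // available for idle workers to steal
            long right = new RangeSum(mid, hi).compute();
            return right + left.join();
        }
    }

    /** Returns { sum of 0..n-1, estimated number of steals }. */
    static long[] sumAndSteals(int n, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            long sum = pool.invoke(new RangeSum(0, n));
            return new long[] { sum, pool.getStealCount() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] result = sumAndSteals(1_000_000, 4);
        System.out.println("sum = " + result[0] + ", steals = " + result[1]);
    }
}
```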


• Tasks are stolen in FIFO order


See en.wikipedia.org/wiki/FIFO_(computing_and_electronics)


• Minimizes contention with the thread owning the deque


See www.ibm.com/support/knowledgecenter/en/SS3KLZ/com.ibm.java.diagnostics.healthcenter.doc/topics/resolving.html


• An older stolen task may yield a larger unit of work due to the way in which spliterators work

[Figure: trySplit() partitions List<String> into List<String>1 & List<String>2, which are split in turn into List<String>1.1, List<String>1.2, List<String>2.1, & List<String>2.2.]

• Enables further recursive decompositions by the stealing thread
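The reasoning in the bullets above can be modeled with an ordinary java.util.ArrayDeque. This is a single-threaded toy model, not the real WorkQueue: the owner repeatedly halves a range and pushes the forked half at the head, so the newest (smallest) chunk sits at the head and the oldest (largest) at the tail. All names and sizes are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StealOrderDemo {
    /**
     * Simulate an owner that halves a range of 'size' elements until it is no
     * larger than 'leafSize', pushing the size of each forked half at the head.
     */
    static Deque<Integer> forkHalves(int size, int leafSize) {
        Deque<Integer> deque = new ArrayDeque<>();
        int remaining = size;
        while (remaining > leafSize) {
            int forked = remaining / 2;
            deque.push(forked);      // push() adds at the head
            remaining -= forked;
        }
        return deque;
    }

    /** The owner takes from the head: LIFO, the newest & smallest chunk. */
    static int ownerPop(Deque<Integer> deque) {
        return deque.pop();
    }

    /** A thief takes from the tail: FIFO, the oldest & largest chunk. */
    static int thiefPoll(Deque<Integer> deque) {
        return deque.pollLast();
    }

    public static void main(String[] args) {
        Deque<Integer> deque = forkHalves(64, 4);
        System.out.println("deque (head to tail) = " + deque);
        System.out.println("thief steals chunk of size " + thiefPoll(deque));
        System.out.println("owner pops chunk of size " + ownerPop(deque));
    }
}
```

In this model the stolen chunk is the largest one available, which leaves the thief plenty of room to decompose it further in its own deque.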

[Figure: the stealing thread decomposes Sub-Task1.1 into Sub-Task1.1.1 through Sub-Task1.1.4 in its own WorkQueue, while Sub-Task1.2 through Sub-Task1.4 & Sub-Task3.3 through Sub-Task3.4 remain in the other WorkQueues.]

See www.dre.vanderbilt.edu/~schmidt/PDF/work-stealing-deque.pdf

• The WorkQueue deque that implements work-stealing minimizes locking contention

[Figure: a WorkQueue deque with push() & pop() at the head & poll() at the tail.]


• push() & pop() are only called by the owning worker thread


• These methods use wait-free “compare-and-swap” (CAS) operations


See en.wikipedia.org/wiki/Compare-and-swap


• poll() may be called from another worker thread to “steal” a task


• May not always be wait-free


See gee.cs.oswego.edu/dl/papers/fj.pdf


• See “Implementation Overview” comments in the ForkJoinPool source code for details


See java8/util/concurrent/ForkJoinPool.java
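For readers unfamiliar with compare-and-swap, the sketch below shows the general idea using AtomicInteger.compareAndSet(). It is a generic illustration rather than the WorkQueue code, and the retry loop shown here is lock-free rather than wait-free, since a thread may need to retry when another thread wins the race. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasDemo {
    /** Increment built from a compare-and-swap retry loop (no locks). */
    static int casIncrement(AtomicInteger counter) {
        while (true) {
            int current = counter.get();
            int next = current + 1;
            // Succeeds only if no other thread changed the value in between.
            if (counter.compareAndSet(current, next))
                return next;
        }
    }

    /** Several threads increment one counter concurrently; no update is lost. */
    static int countWithThreads(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++)
                    casIncrement(counter);
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000));
    }
}
```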


End of Java 8 Parallel Stream Internals (Part 3)