dataflow: the concurrency/parallelism architecture you need

Dataflow: The Concurrency/Parallelism Architecture You Need. Russel Winder, @russel_winder, http://www.russel.org.uk. #devoxxuk #dataflowrules. Copyright © 2014 Russel Winder.

Upload: russel-winder

Post on 15-Jan-2015


DESCRIPTION

An informal investigation/tutorial on the dataflow architecture for Java and Groovy as presented at DevoxxUK 2014. Code presented is on GitHub: https://github.com/russel/MeanStdDev.git

TRANSCRIPT

Page 1: Dataflow: the concurrency/parallelism architecture you need

@russel_winder #devoxxuk #dataflowrules Copyright © 2014 Russel Winder

Dataflow: The Concurrency/Parallelism Architecture You Need

Russel Winder, @russel_winder, http://www.russel.org.uk

Page 2

What is Dataflow?

Page 3

What are (in computing†):

Concurrency:

Structuring a solution and its code so that multiple parts may execute independently, and possibly even at the same time.

Parallelism:

Executing multiple parts of a system at the same time on different processors so as to get results faster.

† In natural language these words have very different meanings.

Page 4

What is Dataflow?

An architecture comprising channels allowing data to flow from one operator to another, where each operator has multiple input channels and multiple output channels, and executes code only in response to the arrival of data on the inputs.
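To make the definition concrete, here is a minimal sketch in plain Java, assuming `BlockingQueue` as the channel and one thread per operator. The names `OperatorSketch`, `adder` and `runAdder` are hypothetical and are not taken from the talk's code. The operator does nothing until a value has arrived on each of its two inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch only: all names here are hypothetical.
final class OperatorSketch {

    // An operator with two input channels and one output channel.
    // Each firing waits for one value on each input, then emits their sum.
    static Thread adder(BlockingQueue<Integer> left, BlockingQueue<Integer> right,
                        BlockingQueue<Integer> out, int firings) {
        Thread operator = new Thread(() -> {
            try {
                for (int i = 0; i < firings; i++) {
                    int a = left.take();   // blocks until data arrives
                    int b = right.take();  // blocks until data arrives
                    out.put(a + b);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        operator.start();
        return operator;
    }

    // Wires up the channels, feeds the inputs and collects the outputs.
    static List<Integer> runAdder(List<Integer> xs, List<Integer> ys) {
        BlockingQueue<Integer> left = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> right = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> out = new LinkedBlockingQueue<>();
        int n = Math.min(xs.size(), ys.size());
        Thread operator = adder(left, right, out, n);
        List<Integer> results = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) {
                left.put(xs.get(i));
                right.put(ys.get(i));
            }
            for (int i = 0; i < n; i++) {
                results.add(out.take());
            }
            operator.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the network", e);
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [11, 22, 33]
        System.out.println(runAdder(Arrays.asList(1, 2, 3), Arrays.asList(10, 20, 30)));
    }
}
```

The only things the operator and its caller share are the channels; all data moves by being put on one queue and taken from another.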

Page 5

Historically

Dataflow computers:
– Values flowing between…
– …operators that calculate…
– …new values to pass to…
– …other operators.

Dataflow hardware didn't take off, but the architecture works at various scales.

J R Gurd, C C Kirkham, I Watson, The Manchester Prototype Dataflow Computer, CACM 28(1), 1985-01.

Page 6

Dataflow diagrams have been an integral part of analysis and design of information systems since the 1970s.

T de Marco, Structured Analysis and Systems Specification, Yourdon Press, NY, 1978.


Page 8

Dataflow and Functional

Operators seem like they might be pure functions, but…

…they are not necessarily: operators may have internal state.

Operators may be referentially transparent, but they may not be.

Operators may even have side effects.
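The difference can be sketched in a few lines of Java. This is an illustration under assumed names (`StatefulOperatorSketch`, `doubler`, `runningSum`), not code from the talk. The first operator is a pure function; the second keeps a running total, so the same input can produce different outputs.

```java
import java.util.function.IntUnaryOperator;

// Illustrative sketch only: all names here are hypothetical.
final class StatefulOperatorSketch {

    // Pure operator: the output depends only on the current input.
    static IntUnaryOperator doubler() {
        return x -> 2 * x;
    }

    // Stateful operator: keeps a running total across inputs,
    // so it is not referentially transparent.
    static IntUnaryOperator runningSum() {
        int[] total = {0};  // internal state, private to this operator
        return x -> {
            total[0] += x;
            return total[0];
        };
    }

    public static void main(String[] args) {
        IntUnaryOperator pure = doubler();
        IntUnaryOperator stateful = runningSum();
        System.out.println(pure.applyAsInt(3) + " " + pure.applyAsInt(3));         // prints 6 6
        System.out.println(stateful.applyAsInt(3) + " " + stateful.applyAsInt(3)); // prints 3 6
    }
}
```

The internal state needs no locking as long as only one thread ever runs the operator.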

Page 9

Dataflow is an event-based architecture.

Page 10

Dataflow systems are (possibly) reactive systems.

Which would make them exceedingly trendy, even if the idea is very old.

Page 11

Dataflow systems have no† shared memory.

† Or at least should have none.

Page 12

[Diagram with the labels: operator, channel]

Page 13

Dataflow systems are message passing systems.

Page 14

Each operator must† be single threaded.

† Or at least should be.

Page 15

Dataflow Frameworks

Scala:
– Future

Akka:
– Dataflow variables, aka Promise
– Deprecated in favour of Async

Java:
– Pre-8, Future
– 8+, CompletableFuture, aka Promise

Page 16

Architectural Issue

Each of the aforementioned frameworks assumes that each operator creates a single value. Communication is by dataflow variables: each dataflow variable is a thread-safe single assignment variable.
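Single assignment can be seen directly with Java's `CompletableFuture`. This is a minimal sketch; the class and method names (`SingleAssignmentSketch`, `bindTwice`) are hypothetical. The first `complete` call binds the value, and any later attempt is rejected and returns `false`.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative sketch only: all names here are hypothetical.
final class SingleAssignmentSketch {

    // Tries to bind the variable twice and reports whether each attempt succeeded.
    static boolean[] bindTwice(CompletableFuture<Integer> variable) {
        boolean first = variable.complete(42);   // binds the value if still unbound
        boolean second = variable.complete(99);  // rejected: the value is already bound
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> variable = new CompletableFuture<>();
        // A dependent computation registered before the value exists;
        // it runs once the variable is bound.
        CompletableFuture<Integer> doubled = variable.thenApply(x -> x * 2);

        boolean[] bound = bindTwice(variable);
        System.out.println(bound[0] + " " + bound[1]);               // prints true false
        System.out.println(variable.join() + " " + doubled.join());  // prints 42 84
    }
}
```

Because each variable carries exactly one value, a stream of values needs either a new variable per value or a different kind of channel.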


Page 19

GPars…

Has dataflow variables (promises) and tasks and so can do everything Akka and Java can offer.

Has DataflowQueue, and so can create real dataflow networks.

Page 20

One does like to code…

…doesn't one?

Page 21

We need a problem…

Page 22

A Problem

Calculate mean and standard deviation of a data sample.

$$\bar{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i$$

$$s = \sqrt{\frac{1}{n-1} \sum_{i=0}^{n-1} (x_i - \bar{x})^2}$$
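Read literally, these formulas need two passes over the data: one to find the mean, and a second to sum the squared deviations from it. A minimal sequential sketch in Java follows. The class name `MeanStdDevTwoPass` is hypothetical, and this is not the code from the talk's repository.

```java
// Illustrative sketch only: all names here are hypothetical.
final class MeanStdDevTwoPass {

    // First pass: the arithmetic mean of the sample.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Second pass: the sample standard deviation, dividing by n - 1.
    // Needs at least two values.
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double sumOfSquaredDeviations = 0.0;
        for (double x : xs) {
            sumOfSquaredDeviations += (x - m) * (x - m);
        }
        return Math.sqrt(sumOfSquaredDeviations / (xs.length - 1));
    }

    public static void main(String[] args) {
        double[] sample = {1, 2, 3, 4, 5};
        System.out.println(mean(sample));    // prints 3.0
        System.out.println(stdDev(sample));  // the square root of 2.5
    }
}
```

The second pass cannot start until the first has finished, which leaves little room for concurrency.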

Page 23

Amend the Problem

$$s = \sqrt{\frac{1}{n-1} \left( \left( \sum_{i=0}^{n-1} x_i^2 \right) - n \bar{x}^2 \right)}$$

$$\bar{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i$$
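Expanding the square shows that the sum of squared deviations equals the sum of squares minus n times the squared mean, so the sum and the sum of squares no longer depend on each other. The sketch below expresses that as a small dataflow graph using `CompletableFuture`. It is one possible arrangement, assumed for illustration, and not the talk's own code; `MeanStdDevDataflow` and `meanStdDev` are hypothetical names.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

// Illustrative sketch only: all names here are hypothetical.
final class MeanStdDevDataflow {

    // Returns {mean, sample standard deviation}. Needs at least two values.
    static double[] meanStdDev(double[] xs) {
        int n = xs.length;

        // Two independent source operators that may run at the same time.
        CompletableFuture<Double> sum =
            CompletableFuture.supplyAsync(() -> Arrays.stream(xs).sum());
        CompletableFuture<Double> sumOfSquares =
            CompletableFuture.supplyAsync(() -> Arrays.stream(xs).map(x -> x * x).sum());

        // The mean fires when the sum arrives.
        CompletableFuture<Double> mean = sum.thenApply(s -> s / n);

        // The standard deviation fires when both of its inputs have arrived.
        CompletableFuture<Double> stdDev = mean.thenCombine(sumOfSquares,
            (m, ss) -> Math.sqrt((ss - n * m * m) / (n - 1)));

        return new double[] {mean.join(), stdDev.join()};
    }

    public static void main(String[] args) {
        double[] result = meanStdDev(new double[] {1, 2, 3, 4, 5});
        System.out.println(result[0]);  // prints 3.0
        System.out.println(result[1]);  // the square root of 2.5
    }
}
```

One caveat with this form: subtracting two large, nearly equal sums can lose precision in floating point, so it suits data whose spread is not tiny relative to its mean.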

Page 24

Code

Page 25

Switch to using an IDE for this.

Code Example

Page 26

Summary

Page 27

Summary

Dataflow is an architecture:

Event-driven, single-threaded operators communicating by message passing using channels.

Dataflow is an easement:

Synchronization is inherent in the model, and there is no shared memory, so all deadlocks are trivial.

Page 28

Dataflow is a way of harnessing concurrency and parallelism in easy-to-program ways.

Page 29

GPars is usable from Java as well as Groovy.

Page 30

Testing is really Groovy with Spock.

Page 31

Dataflow is an architecture of code you need to know.

Page 32

Q & A
