Reactive Streams: Handling Data-Flows the Reactive Way

Dr. Roland Kuhn, Akka Tech Lead, @rolandkuhn

Upload: roland-kuhn

Post on 27-Aug-2014

DESCRIPTION

Building on the success of Reactive Extensions (first in Rx.NET and now in RxJava), we are taking Observers and Observables to the next level: by adding the capability of handling back-pressure between asynchronous execution stages, we enable the distribution of stream processing across a cluster of potentially thousands of nodes. The project defines the common interfaces for interoperable stream implementations on the JVM and is the result of a collaboration between Twitter, Netflix, Pivotal, Red Hat and Typesafe. In this presentation I introduce the guiding principles behind its design and show examples using the actor-based implementation in Akka.

TRANSCRIPT

Page 1: Reactive Streams: Handling Data-Flow the Reactive Way

Dr. Roland Kuhn Akka Tech Lead @rolandkuhn

Reactive Streams: Handling Data-Flows the Reactive Way

Page 2: Reactive Streams: Handling Data-Flow the Reactive Way

Introduction: Streams

Page 3: Reactive Streams: Handling Data-Flow the Reactive Way

What is a Stream?

• ephemeral flow of data
• focused on describing transformation
• possibly unbounded in size

Page 4: Reactive Streams: Handling Data-Flow the Reactive Way

Common uses of Streams

• bulk data transfer
• real-time data sources
• batch processing of large data sets
• monitoring and analytics

Page 5: Reactive Streams: Handling Data-Flow the Reactive Way

What is special about Reactive Streams?

Page 6: Reactive Streams: Handling Data-Flow the Reactive Way

Reactive Applications

The Four Reactive Traits

http://reactivemanifesto.org/

Page 7: Reactive Streams: Handling Data-Flow the Reactive Way

Needed: Asynchrony

• Resilience demands it:
    • encapsulation
    • isolation
• Scalability demands it:
    • distribution across nodes
    • distribution across cores

Page 13: Reactive Streams: Handling Data-Flow the Reactive Way

Many Kinds of Async Boundaries

• between different applications
• between network nodes
• between CPUs
• between threads
• between actors

Page 14: Reactive Streams: Handling Data-Flow the Reactive Way

The Problem: Getting Data across an Async Boundary

Page 16: Reactive Streams: Handling Data-Flow the Reactive Way

Possible Solutions

• the Traditional way: blocking calls

Page 18: Reactive Streams: Handling Data-Flow the Reactive Way

Possible Solutions

• the Push way: buffering and/or dropping

Page 20: Reactive Streams: Handling Data-Flow the Reactive Way

Possible Solutions

• the Reactive way: non-blocking & non-dropping & bounded
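
The trade-offs between these options can be made concrete with plain JDK queues. The following is only a sketch under stated assumptions: the class and method names are hypothetical, the hand-off between producer and consumer is modelled by a queue that nobody is reading yet, and the demand-driven case is reduced to simple counting.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration: a producer hands elements to a consumer
// that has not started reading yet.
class BoundaryOptions {

    // Push with dropping: offer() never blocks, but elements that do not fit are lost.
    // (Using put() instead would be the blocking variant: it waits for free space.)
    static int droppedByBoundedPush(int elements, int capacity) {
        BlockingQueue<Integer> handOff = new ArrayBlockingQueue<>(capacity);
        int dropped = 0;
        for (int i = 0; i < elements; i++) {
            if (!handOff.offer(i)) {
                dropped++;
            }
        }
        return dropped;
    }

    // Push with buffering: nothing is lost, but the buffer grows with the backlog.
    static int bufferedByUnboundedPush(int elements) {
        Queue<Integer> handOff = new ArrayDeque<>();
        for (int i = 0; i < elements; i++) {
            handOff.add(i);
        }
        return handOff.size();
    }

    // Demand-driven: the producer emits only while the consumer has outstanding
    // demand, so nothing is dropped and the elements in flight stay bounded.
    static int emittedOnDemand(int elements, long demand) {
        int emitted = 0;
        while (emitted < elements && demand > 0) {
            emitted++;
            demand--;
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println("dropped:  " + droppedByBoundedPush(5, 2));  // 3
        System.out.println("buffered: " + bufferedByUnboundedPush(5)); // 5
        System.out.println("emitted:  " + emittedOnDemand(5, 2));      // 2
    }
}
```

In the demand-driven case the remaining three elements are neither lost nor buffered at the boundary; they simply stay with the producer until more demand arrives.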

Page 21: Reactive Streams: Handling Data-Flow the Reactive Way

How do we achieve that?

Page 22: Reactive Streams: Handling Data-Flow the Reactive Way

Supply and Demand

• data items flow downstream
• demand flows upstream
• data items flow only when there is demand
• recipient is in control of incoming data rate
• data in flight is bounded by signaled demand

[Diagram: Publisher and Subscriber, with data flowing downstream and demand flowing upstream]
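
A minimal sketch of this supply-and-demand exchange, assuming JDK 9 or later, where java.util.concurrent.Flow provides interfaces corresponding to Reactive Streams (its Subscription uses request(long) where the early preview shown in this talk uses requestMore). SubmissionPublisher is used here only as a convenient stand-in publisher, it is not the Akka implementation, and the class names OneAtATime and SlowSubscriber are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

class OneAtATime {

    // Subscriber that signals demand for exactly one element at a time.
    static final class SlowSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);                 // initial demand flows upstream
        }
        @Override public void onNext(Integer elem) {
            received.add(elem);           // data flows downstream
            subscription.request(1);      // ask for the next element only when ready
        }
        @Override public void onError(Throwable thr) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    static List<Integer> run() {
        SlowSubscriber subscriber = new SlowSubscriber();
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (int i = 1; i <= 5; i++) {
                publisher.submit(i);      // held back until the subscriber has demand
            }
        }                                 // close() completes the stream
        try {
            subscriber.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return subscriber.received;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [1, 2, 3, 4, 5]
    }
}
```

The subscriber never has more than one element in flight, because the publisher may only deliver what has been requested.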

Page 23: Reactive Streams: Handling Data-Flow the Reactive Way

Dynamic Push–Pull

• “push” behavior when consumer is faster
• “pull” behavior when producer is faster
• switches automatically between these
• batching demand allows batching data

[Diagram: Publisher and Subscriber, with data flowing downstream and demand flowing upstream]
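
Batching demand can be sketched with the JDK 9+ java.util.concurrent.Flow interfaces, used here as a stand-in for the interfaces discussed in this talk. The subscriber below requests elements in batches and counts how many demand signals it had to send; BatchedDemand and BatchingSubscriber are hypothetical names, and the batch size is an arbitrary choice.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

class BatchedDemand {

    // Requests a new batch only once the previous batch has fully arrived.
    static final class BatchingSubscriber implements Flow.Subscriber<Integer> {
        private final int batchSize;
        private Flow.Subscription subscription;
        private int outstanding;          // requested but not yet received
        int received;
        int requestSignals;
        final CountDownLatch done = new CountDownLatch(1);

        BatchingSubscriber(int batchSize) { this.batchSize = batchSize; }

        private void requestBatch() {
            outstanding = batchSize;
            requestSignals++;
            subscription.request(batchSize);
        }
        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            requestBatch();
        }
        @Override public void onNext(Integer elem) {
            received++;
            if (--outstanding == 0) {
                requestBatch();
            }
        }
        @Override public void onError(Throwable thr) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    // Returns { elements received, demand signals sent }.
    static int[] run(int elements, int batchSize) {
        BatchingSubscriber subscriber = new BatchingSubscriber(batchSize);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (int i = 0; i < elements; i++) {
                publisher.submit(i);
            }
        }
        try {
            subscriber.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { subscriber.received, subscriber.requestSignals };
    }

    public static void main(String[] args) {
        int[] result = run(10, 4);
        System.out.println(result[0] + " elements, " + result[1] + " demand signals");
    }
}
```

With 10 elements and a batch size of 4, three demand signals are enough, whereas requesting one element at a time would need a signal per element.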

Page 24: Reactive Streams: Handling Data-Flow the Reactive Way

Explicit Demand: Tailored Flow Control

[Diagram with labels: demand, data]

splitting the data means merging the demand

Page 25: Reactive Streams: Handling Data-Flow the Reactive Way

Explicit Demand: Tailored Flow Control

merging the data means splitting the demand
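
One possible reading of these two rules, as plain arithmetic. This is an assumption made for illustration, not the policy of any particular implementation: a broadcast-style fan-out is assumed to forward the minimum of its consumers' demand, and a fan-in is assumed to divide the downstream demand evenly among its sources. DemandArithmetic and its methods are hypothetical names.

```java
import java.util.Arrays;

class DemandArithmetic {

    // Fan-out (broadcast): every element goes to all consumers, so the stage can
    // only request as many elements as the least ready consumer has asked for.
    static long mergedDemand(long... downstreamDemands) {
        if (downstreamDemands.length == 0) {
            return 0;
        }
        long merged = Long.MAX_VALUE;
        for (long demand : downstreamDemands) {
            merged = Math.min(merged, demand);
        }
        return merged;
    }

    // Fan-in (merge): the downstream demand is divided among the sources so that
    // their combined output never exceeds what was requested.
    static long[] splitDemand(long downstreamDemand, int sources) {
        long[] shares = new long[sources];
        for (int i = 0; i < sources; i++) {
            long extra = i < downstreamDemand % sources ? 1 : 0;
            shares[i] = downstreamDemand / sources + extra;
        }
        return shares;
    }

    public static void main(String[] args) {
        System.out.println(mergedDemand(5, 2, 8));                // 2
        System.out.println(Arrays.toString(splitDemand(7, 3)));   // [3, 2, 2]
    }
}
```

Other policies are conceivable, for example a balancing fan-out that sums its consumers' demand; the point is only that explicit demand lets each stage choose one.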

Page 26: Reactive Streams: Handling Data-Flow the Reactive Way

Reactive Streams

• asynchronous non-blocking data flow
• asynchronous non-blocking demand flow
• minimal coordination and contention
• message passing allows for distribution
    • across applications
    • across nodes
    • across CPUs
    • across threads
    • across actors

Page 27: Reactive Streams: Handling Data-Flow the Reactive Way

Are Streams Collections?

Page 32: Reactive Streams: Handling Data-Flow the Reactive Way

What is a Collection?

• Oxford Dictionary:
    • “a group of things or people”
• wikipedia:
    • “a grouping of some variable number of data items”
• backbone.js:
    • “collections are simply an ordered set of models”
• java.util.Collection:
    • definite size, provides an iterator, query membership

Page 33: Reactive Streams: Handling Data-Flow the Reactive Way

User Expectations

• an Iterator is expected to visit all elements (especially with immutable collections)
• x.head + x.tail == x
• the contents do not depend on who is processing the collection
• the contents do not depend on when the processing happens (especially with immutable collections)

Page 34: Reactive Streams: Handling Data-Flow the Reactive Way

Streams have Unexpected Properties

• the observed sequence depends on
    • … when the observer subscribed to the stream
    • … whether the observer can process fast enough
    • … whether the stream flows fast enough
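
The first of these dependencies can be reproduced in a few lines, assuming JDK 9 or later. SubmissionPublisher from java.util.concurrent is used as a stand-in stream source; it does not retain elements submitted while nobody is subscribed. LateSubscriber and Collecting are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

class LateSubscriber {

    // Collects everything it is given, with effectively unbounded demand.
    static final class Collecting implements Flow.Subscriber<String> {
        final List<String> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);

        @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
        @Override public void onNext(String elem) { received.add(elem); }
        @Override public void onError(Throwable thr) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    static List<String> run() {
        Collecting observer = new Collecting();
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.submit("early");    // no subscriber yet: this element is gone
            publisher.subscribe(observer);
            publisher.submit("late");     // only this one is observed
        }
        try {
            observer.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observer.received;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [late]
    }
}
```

An Iterator over a collection would have visited both elements regardless of when it was created.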

Page 35: Reactive Streams: Handling Data-Flow the Reactive Way

Streams are not Collections!

• java.util.stream.Stream is not derived from Collection: “Streams differ from collections in several ways”
    • no storage
    • functional in nature
    • laziness-seeking
    • possibly unbounded
    • consumable

Page 36: Reactive Streams: Handling Data-Flow the Reactive Way

Streams are not Collections!

• a collection can be streamed
• a stream observer can create a collection
• … but saying that a Stream is just a lazy Collection evokes the wrong associations

Page 37: Reactive Streams: Handling Data-Flow the Reactive Way

So, Reactive Streams: why not just java.util.stream.Stream?

Page 38: Reactive Streams: Handling Data-Flow the Reactive Way

Java 8 Stream

    import java.util.stream.*;

    // get some stream
    final Stream<Integer> s = Stream.of(1, 2, 3);
    // describe transformation
    final Stream<String> s2 = s.map(i -> "a" + i);
    // make a pull collection
    s2.iterator();
    // or alternatively push it somewhere
    s2.forEach(i -> System.out.println(i));
    // (need to pick one, Stream is consumable)

Page 39: Reactive Streams: Handling Data-Flow the Reactive Way

Java 8 Stream

• provides a DSL for describing transformation
• introduces staged computation (but does not allow reuse)
• prescribes an eager model of execution
• offers either push or pull, chosen statically
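
The "does not allow reuse" point can be checked directly. A small sketch, with the hypothetical class name ConsumableStream, showing that a second terminal operation on the same java.util.stream.Stream is rejected:

```java
import java.util.stream.Stream;

class ConsumableStream {

    // Returns true if running a second terminal operation on the same Stream fails.
    static boolean secondTerminalOperationFails() {
        Stream<String> s2 = Stream.of(1, 2, 3).map(i -> "a" + i);
        s2.forEach(elem -> { });          // first terminal operation consumes the stream
        try {
            s2.forEach(elem -> { });      // second attempt on the consumed stream
            return false;
        } catch (IllegalStateException e) {
            return true;                  // the stream has already been operated upon
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalOperationFails()); // true
    }
}
```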

Page 40: Reactive Streams: Handling Data-Flow the Reactive Way

What about RxJava?

Page 41: Reactive Streams: Handling Data-Flow the Reactive Way

RxJava

    import rx.Observable;
    import static rx.Observable.*;

    // get some stream source
    final Observable<Integer> obs = range(1, 3);
    // describe transformation
    final Observable<String> obs2 = obs.map(i -> "b" + i);
    // and use it twice
    obs2.subscribe(i -> System.out.println(i));
    obs2.filter(i -> i.equals("b2"))
        .subscribe(i -> System.out.println(i));

Page 42: Reactive Streams: Handling Data-Flow the Reactive Way

RxJava

• implements pure “push” model
• includes extensive DSL for transformations
• only allows blocking for back pressure
• currently uses unbounded buffering for crossing an async boundary
• work on distributed Observables sparked participation in Reactive Streams

Page 43: Reactive Streams: Handling Data-Flow the Reactive Way

The Reactive Streams Project

Page 44: Reactive Streams: Handling Data-Flow the Reactive Way

Participants

• Engineers from
    • Netflix
    • Oracle
    • Pivotal
    • Red Hat
    • Twitter
    • Typesafe
• Individuals like Doug Lea and Todd Montgomery

Page 45: Reactive Streams: Handling Data-Flow the Reactive Way

The Motivation

• all participants had the same basic problem
• all are building tools for their community
• a common solution benefits everybody
• interoperability to make best use of efforts
    • e.g. use Reactor data store driver with Akka transformation pipeline and Rx monitoring to drive a vert.x REST API (purely made up, at this point)

see also Jon Brisbin’s post on “Tribalism as a Force for Good”

Page 46: Reactive Streams: Handling Data-Flow the Reactive Way

Recipe for Success

• minimal interfaces
• rigorous specification of semantics
• full TCK for verification of implementation
• complete freedom for many idiomatic APIs

Page 47: Reactive Streams: Handling Data-Flow the Reactive Way

The Meat

    trait Publisher[T] {
      def subscribe(sub: Subscriber[T]): Unit
    }
    trait Subscription {
      def requestMore(n: Int): Unit
      def cancel(): Unit
    }
    trait Subscriber[T] {
      def onSubscribe(s: Subscription): Unit
      def onNext(elem: T): Unit
      def onError(thr: Throwable): Unit
      def onComplete(): Unit
    }

Page 48: Reactive Streams: Handling Data-Flow the Reactive Way

The Sauce

• all calls on Subscriber must dispatch async
• all calls on Subscription must not block
• Publisher is just there to create Subscriptions

Page 49: Reactive Streams: Handling Data-Flow the Reactive Way

How does it Connect?

[Sequence diagram: Subscriber, Publisher and Subscription, connected by the calls subscribe, onSubscribe, requestMore and onNext]
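
The order of these signals can be observed with the java.util.concurrent.Flow interfaces that JDK 9 later added for Reactive Streams. This is a sketch under assumptions: Flow.Subscription names the demand call request(long) rather than requestMore, SubmissionPublisher stands in for a real publisher, and Handshake and Recording are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

class Handshake {

    // Records every signal it receives, in order.
    static final class Recording implements Flow.Subscriber<String> {
        final List<String> signals = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);

        @Override public void onSubscribe(Flow.Subscription subscription) {
            signals.add("onSubscribe");
            subscription.request(2);      // called requestMore(n) on the slide
        }
        @Override public void onNext(String elem) {
            signals.add("onNext(" + elem + ")");
        }
        @Override public void onError(Throwable thr) {
            signals.add("onError");
            done.countDown();
        }
        @Override public void onComplete() {
            signals.add("onComplete");
            done.countDown();
        }
    }

    static List<String> run() {
        Recording subscriber = new Recording();
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);   // publisher answers with onSubscribe
            publisher.submit("a");
            publisher.submit("b");
        }                                      // close() leads to onComplete
        try {
            subscriber.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return subscriber.signals;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [onSubscribe, onNext(a), onNext(b), onComplete]
    }
}
```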

Page 50: Reactive Streams: Handling Data-Flow the Reactive Way

Akka Streams

Page 51: Reactive Streams: Handling Data-Flow the Reactive Way

Akka Streams

• powered by Akka Actors
    • execution
    • distribution
    • resilience
• type-safe streaming through Actors with bounded buffering

Page 52: Reactive Streams: Handling Data-Flow the Reactive Way

Basic Akka Example

    implicit val system = ActorSystem("Sys")
    val mat = FlowMaterializer(...)

    Flow(text.split("\\s").toVector).
      map(word => word.toUpperCase).
      foreach(transformed => println(transformed)).
      onComplete(mat) {
        case Success(_) => system.shutdown()
        case Failure(e) =>
          println("Failure: " + e.getMessage)
          system.shutdown()
      }

Page 53: Reactive Streams: Handling Data-Flow the Reactive Way

More Advanced Akka Example

    val s = Source.fromFile("log.txt", "utf-8")
    Flow(s.getLines()).
      groupBy {
        case LogLevelPattern(level) => level
        case other => "OTHER"
      }.
      foreach { case (level, producer) =>
        val out = new PrintWriter(level+".txt")
        Flow(producer).
          foreach(line => out.println(line)).
          onComplete(mat)(_ => Try(out.close()))
      }.
      onComplete(mat)(_ => Try(s.close()))

Page 54: Reactive Streams: Handling Data-Flow the Reactive Way

Java 8 Example

    final ActorSystem system = ActorSystem.create("Sys");
    final MaterializerSettings settings = MaterializerSettings.create();
    final FlowMaterializer materializer = FlowMaterializer.create(settings, system);

    final String[] lookup = { "a", "b", "c", "d", "e", "f" };
    final Iterable<Integer> input = Arrays.asList(0, 1, 2, 3, 4, 5);

    Flow.create(input).drop(2).take(3).          // leave 2, 3, 4
      map(elem -> lookup[elem]).                 // translate to "c","d","e"
      filter(elem -> !elem.equals("c")).         // filter out the "c"
      grouped(2).                                // make into a list
      mapConcat(list -> list).                   // flatten the list
      fold("", (acc, elem) -> acc + elem).       // accumulate into "de"
      foreach(elem -> System.out.println(elem)). // print it
      consume(materializer);
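
The step-by-step comments in this pipeline can be traced without Akka. The sketch below is an analogue using plain java.util.stream, not the Akka Streams API: skip and limit stand in for drop and take, reduce stands in for fold, and the grouped/mapConcat pair is left out because the two steps cancel each other. PipelineTrace is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

class PipelineTrace {

    static String run() {
        final String[] lookup = { "a", "b", "c", "d", "e", "f" };
        final List<Integer> input = Arrays.asList(0, 1, 2, 3, 4, 5);

        return input.stream()
            .skip(2).limit(3)                        // leave 2, 3, 4
            .map(elem -> lookup[elem])               // translate to "c", "d", "e"
            .filter(elem -> !elem.equals("c"))       // filter out the "c"
            .reduce("", (acc, elem) -> acc + elem);  // accumulate into "de"
    }

    public static void main(String[] args) {
        System.out.println(run()); // de
    }
}
```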

Page 55: Reactive Streams: Handling Data-Flow the Reactive Way

Akka TCP Echo Server

    val future = IO(StreamTcp) ? Bind(addr, ...)
    future.onComplete {
      case Success(server: TcpServerBinding) =>
        println("listen at "+server.localAddress)

        Flow(server.connectionStream).foreach { c =>
          println("client from "+c.remoteAddress)
          c.inputStream.produceTo(c.outputStream)
        }.consume(mat)

      case Failure(ex) =>
        println("cannot bind: "+ex)
        system.shutdown()
    }

Page 56: Reactive Streams: Handling Data-Flow the Reactive Way

Akka HTTP Sketch (NOT REAL CODE)

    val future = IO(Http) ? RequestChannelSetup(...)
    future.onSuccess {
      case ch: HttpRequestChannel =>
        val p = ch.processor[Int]
        val body = Flow(new File(...)).toProducer(mat)
        Flow(HttpRequest(method = POST,
                         uri = "http://ex.amp.le/42",
                         entity = body) -> 42)
          .produceTo(mat, p)
        Flow(p).map { case (resp, token) =>
          val out = FileSink(new File(...))
          Flow(resp.entity.data).produceTo(mat, out)
        }.consume(mat)
    }

Page 57: Reactive Streams: Handling Data-Flow the Reactive Way

Closing Remarks

Page 58: Reactive Streams: Handling Data-Flow the Reactive Way

Current State

• Early Preview is available:
    "org.reactivestreams" % "reactive-streams-spi" % "0.2"
    "com.typesafe.akka" %% "akka-stream-experimental" % "0.3"
• check out the Activator template "Akka Streams with Scala!" (https://github.com/typesafehub/activator-akka-stream-scala)

Page 59: Reactive Streams: Handling Data-Flow the Reactive Way

Next Steps

• we are working towards inclusion in a future JDK
• try it out and give feedback!
    • http://reactive-streams.org/
    • https://github.com/reactive-streams

Page 60: Reactive Streams: Handling Data-Flow the Reactive Way

©Typesafe 2014 – All Rights Reserved