Stream Processing In Go

1 Stream Processing In Go Khosrow Afroozeh Sunil Sayyaparaju

Upload: kafroozeh

Post on 15-Jan-2017


TRANSCRIPT

1

Stream Processing In Go

Khosrow Afroozeh
Sunil Sayyaparaju

2

Streams are the Norm

● Need for Business Analytics generates endless streams of data

● Horizontal Scaling adds to the number of streams

● Stream variety is on the rise

● Streams need to be composed and co-processed

3

Stream

● Arrays
● Slices
● Channels
● Buffers
● Files
● Database Queries
● ...

4

Stream Elements

No Generics In Go, so stream elements are boxed objects:

interface{}

● There is no type-safety for generic stream processing.

● Not a big deal really; schemaless data sources return interfaces anyway.

● It can be easily managed by runtime type-checking in the first step of the pipeline.
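
As a rough illustration of the last bullet, the sketch below puts a runtime type check at the front of a pipeline so that later steps can work with concrete values. The `Record` type and the `onlyRecords` helper are hypothetical names invented for this example; they do not come from the slides.

```go
package main

import "fmt"

// Record is a hypothetical element type used only for this illustration.
type Record struct {
	ID   int
	Name string
}

// onlyRecords acts as the first pipeline step: it checks the dynamic type
// of each boxed element and drops anything that is not a Record.
func onlyRecords(in []interface{}) []Record {
	out := make([]Record, 0, len(in))
	for _, v := range in {
		if r, ok := v.(Record); ok {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	stream := []interface{}{
		Record{ID: 1, Name: "a"},
		"not a record",
		42,
		Record{ID: 2, Name: "b"},
	}
	for _, r := range onlyRecords(stream) {
		fmt.Println(r.ID, r.Name)
	}
}
```

Whether mismatched elements should be dropped, logged, or treated as errors is a design choice; dropping them here only keeps the sketch short.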

5

Classic Collections

6

Traditional Compositions 1

stream 1 <Record>

stream 2 <Cloud>

stream1.Join(stream2).Filter(...)

API Interface Problem

7

Traditional Compositions 2

stream 1 <Record>

stream 2 <Cloud>

Join(stream1, stream2)

stream3

Filter(stream3, ...)

Lots of Gophers Needed for Pipelining, Signature Problem Still Unsolved

8

Problem

● Don’t want to code¹ unless absolutely necessary

● Don’t want to repeat ourselves

● More code leads to more maintenance and testing

¹ Not on company hours at least! YMMV.

9

Abstraction Goals

● Data processing should be decoupled from data structures.

● Compositions should happen on data, not data structures.

Note: <T> denotes type. This is not valid Go code.

Note: f and m are functions, e.g.:

f(value interface{}) bool
m(value interface{}) interface{}
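
To make the two signatures concrete, here is a minimal sketch with one possible `f` (a predicate that keeps even integers) and one possible `m` (a mapping that multiplies by ten). Both bodies, and the `filterMap` helper that applies them to a slice, are assumptions made for illustration only.

```go
package main

import "fmt"

// f is a predicate: it reports whether a value should stay in the stream.
// This body (keep even ints) is only an example.
func f(value interface{}) bool {
	n, ok := value.(int)
	return ok && n%2 == 0
}

// m is a mapping: it turns one value into another.
// Values that are not ints are passed through unchanged.
func m(value interface{}) interface{} {
	if n, ok := value.(int); ok {
		return n * 10
	}
	return value
}

// filterMap is a hypothetical helper that applies f, then m, to each element.
func filterMap(in []interface{}) []interface{} {
	var out []interface{}
	for _, v := range in {
		if f(v) {
			out = append(out, m(v))
		}
	}
	return out
}

func main() {
	fmt.Println(filterMap([]interface{}{1, 2, "x", 4})) // prints [20 40]
}
```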

10

Abstraction Goals Cont’d

● Data should not be transported during transformation, unless necessary.

11

Transducers¹

¹ Idea inspired by Clojure. Fair enough, they got inspired by channels ;)

12

Transducers Impl.

13

Reducer

● Responsible for chaining of the pipeline:

stream → t1 → t2 → … → tn → reducer → result
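
The chain above can be sketched in a few lines of Go. The code from the original slides did not survive in this transcript, so the following is not the authors' implementation: the `Reducer` and `Transducer` types and the `Filter`, `Map`, `Compose`, and `Transduce` functions are assumed names, loosely modeled on Clojure-style transducers.

```go
package main

import "fmt"

// Reducer folds one stream value into an accumulator.
type Reducer func(acc, value interface{}) interface{}

// Transducer wraps a Reducer and returns a new Reducer.
type Transducer func(next Reducer) Reducer

// Filter forwards a value to the next step only if f accepts it.
func Filter(f func(interface{}) bool) Transducer {
	return func(next Reducer) Reducer {
		return func(acc, v interface{}) interface{} {
			if f(v) {
				return next(acc, v)
			}
			return acc
		}
	}
}

// Map forwards m(v) to the next step.
func Map(m func(interface{}) interface{}) Transducer {
	return func(next Reducer) Reducer {
		return func(acc, v interface{}) interface{} {
			return next(acc, m(v))
		}
	}
}

// Compose chains transducers so that values flow t1 -> t2 -> ... -> tn.
func Compose(ts ...Transducer) Transducer {
	return func(r Reducer) Reducer {
		for i := len(ts) - 1; i >= 0; i-- {
			r = ts[i](r)
		}
		return r
	}
}

// Transduce runs every element through the chain and the final reducer.
func Transduce(stream []interface{}, t Transducer, r Reducer, init interface{}) interface{} {
	step := t(r)
	acc := init
	for _, v := range stream {
		acc = step(acc, v)
	}
	return acc
}

func isEven(v interface{}) bool        { n, ok := v.(int); return ok && n%2 == 0 }
func square(v interface{}) interface{} { n := v.(int); return n * n }
func sum(acc, v interface{}) interface{} { return acc.(int) + v.(int) }

func main() {
	stream := []interface{}{1, 2, 3, 4, 5, 6}
	result := Transduce(stream, Compose(Filter(isEven), Map(square)), sum, 0)
	fmt.Println(result) // prints 56 (4 + 16 + 36)
}
```

Each value passes through the whole chain in a single function call, so no intermediate slice or channel is created between the steps.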

14

Transducers Impl. Example

15

Transduction

● Flush is used when some function in the chain would like to end the operation early.

● When all the data in the stream has been processed or a flush has been requested, method Complete() is called to capture the states in the stateful reducers.

Functions in the chain call each other:

f, m => m(f(val))
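
A hedged sketch of how flush and Complete() might fit together follows. Only the name `Complete()` comes from the slides; the `Reducer` interface, its `Step` method, the boolean flush signal, and the `takeN` and `sumReducer` types are assumptions made for this example.

```go
package main

import "fmt"

// Reducer is an assumed interface: Step consumes one value and returns true
// to request a flush; Complete returns the state captured so far.
type Reducer interface {
	Step(value interface{}) (flush bool)
	Complete() interface{}
}

// sumReducer is a stateful reducer that keeps a running total of ints.
type sumReducer struct{ total int }

func (s *sumReducer) Step(v interface{}) bool {
	if n, ok := v.(int); ok {
		s.total += n
	}
	return false // a plain sum never needs to stop early
}

func (s *sumReducer) Complete() interface{} { return s.total }

// takeN forwards at most n values and then requests a flush.
type takeN struct {
	n    int
	next Reducer
}

func (t *takeN) Step(v interface{}) bool {
	if t.n <= 0 {
		return true
	}
	t.n--
	flush := t.next.Step(v)
	return flush || t.n == 0
}

func (t *takeN) Complete() interface{} { return t.next.Complete() }

// Transduce feeds the stream into the chain until the data runs out or a
// flush is requested, then calls Complete to collect the result.
func Transduce(stream []interface{}, r Reducer) interface{} {
	for _, v := range stream {
		if r.Step(v) {
			break
		}
	}
	return r.Complete()
}

func main() {
	stream := []interface{}{1, 2, 3, 4, 5}
	result := Transduce(stream, &takeN{n: 3, next: &sumReducer{}})
	fmt.Println(result) // prints 6 (1 + 2 + 3); 4 and 5 are never visited
}
```

Returning a boolean from `Step` is only one way to signal a flush; a sentinel error or a dedicated method would serve the same purpose.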

16

Example

17

Observations

● Cons
– No compile-time type safety
– Tricky to parallelize

● Pros
– Fewer goroutines for long pipelines
– Fewer synchronizations for channels
– Potentially uses less memory
– Decoupled processing logic from data structures
– Better compositions
– More readable

18

Thank You

Khosrow Afroozeh:

● @parshua

[email protected]

Sunil Sayyaparaju:

[email protected]