Flink London meetup 3 March 2016 - Flink basics


Upload: cyrus-new

Post on 14-Apr-2017


TRANSCRIPT

Page 1: Flink London meetup 3 March 2016 - Flink basics

Motivation

Page 2: Flink London meetup 3 March 2016 - Flink basics

The Evolution of Massive-Scale Data Processing
Tyler Akidau, Staff Software Engineer @ Google
https://goo.gl/5k0xaL

Page 3: Flink London meetup 3 March 2016 - Flink basics

The Evolution of Massive-Scale Data Processing
Tyler Akidau, Staff Software Engineer @ Google
https://goo.gl/5k0xaL

Page 4: Flink London meetup 3 March 2016 - Flink basics

We’re not the newest!

Page 5: Flink London meetup 3 March 2016 - Flink basics

APACHE FLINK LONDON MEETUP
3rd March 2016 | Bonhill House, London

Page 6: Flink London meetup 3 March 2016 - Flink basics

What we’ll cover today

- Hand-wavey bit
- Practical bit
- Textbook bit

Page 7: Flink London meetup 3 March 2016 - Flink basics

Part 1: The hand-wavey bit

- Aim:
  - Make sure we all have the same basic understanding of what Flink is
  - Introduce key concepts
    - Not exhaustive
    - Not explaining much!

Page 8: Flink London meetup 3 March 2016 - Flink basics

WTF is Flink?

Page 9: Flink London meetup 3 March 2016 - Flink basics

Flink basics…

- Apache Foundation top-level open source project…
- …for distributed data processing…
- …with a “streaming first” architecture…
- …running on the JVM.

Or: A ‘free’ way to process a lot of data (especially streaming data) on ‘commodity’ hardware, with a code base that is continually improving. Useful for reporting, analytics, log processing, machine learning, etc.

Page 10: Flink London meetup 3 March 2016 - Flink basics

Some key terms

- DataStream: A possibly unbounded immutable collection of data items of the same type
- DataSet: An abstract representation of a finite immutable collection of data of the same type that may contain duplicates
- Source: Can be file-based, socket-based, collection-based, or custom (e.g. Kafka)
- Sink: Consumes DataSets/DataStreams and forwards them to files, sockets, external systems, or prints them
- Operator: Represents an operation (or a data processing step) in the ‘JobGraph’ – includes properties like the actual code and desired parallelism.

Page 11: Flink London meetup 3 March 2016 - Flink basics

Application architecture

Page 12: Flink London meetup 3 March 2016 - Flink basics

Flink ‘skeleton’ program structure

DataStream
1. Obtain a StreamExecutionEnvironment
2. Connect to data stream sources
3. Specify transformations on the data streams
4. Specify output for the processed data
5. Execute the program [env.execute()]

DataSet
1. Obtain an ExecutionEnvironment
2. Load/create the initial data
3. Specify transformations on the data
4. Specify where to put results
5. Execute the program [env.execute(), print(), collect()]
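The five steps of the skeleton can be sketched in plain Python to show the one point that tends to surprise newcomers: steps 2 to 4 only describe the job, and nothing runs until the final execute call. `ToyEnvironment` and its methods are hypothetical names invented for this illustration; this is not Flink's API, only a rough model of the same shape.

```python
class ToyEnvironment:
    """Hypothetical stand-in for an execution environment (NOT Flink's API)."""

    def __init__(self):
        self._source = []
        self._steps = []       # recorded transformations, not yet run
        self._sink = None

    def from_collection(self, items):   # step 2: connect to / load a source
        self._source = list(items)
        return self

    def map(self, fn):                  # step 3: specify a transformation
        self._steps.append(("map", fn))
        return self

    def filter(self, fn):               # step 3: another transformation
        self._steps.append(("filter", fn))
        return self

    def add_sink(self, sink):           # step 4: specify where results go
        self._sink = sink
        return self

    def execute(self):                  # step 5: only now does anything run
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):      # a filter rejected the item
                    keep = False
                    break
            if keep:
                self._sink(item)


results = []
env = ToyEnvironment()                  # step 1: obtain an environment
(env.from_collection(["to be", "or not", "to be"])
    .map(str.upper)
    .filter(lambda s: s.startswith("TO"))
    .add_sink(results.append))

seen_before_execute = list(results)     # still empty: the job is only described
env.execute()
print(seen_before_execute, results)
```

The design choice being illustrated is lazy evaluation: building the whole job description first is what gives a real engine the chance to plan and parallelise it before any data moves.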

Page 13: Flink London meetup 3 March 2016 - Flink basics

(in future meetups Guest Speakers will give us the juicy details…)

Key Flink features

Page 14: Flink London meetup 3 March 2016 - Flink basics

High Performance

Support for out-of-order events

Low latency

Exactly-once semantics

Flexible streaming windows

One runtime for stream & batch / ecosystem

Back pressure

Delta iterate operators

Page 15: Flink London meetup 3 March 2016 - Flink basics

One runtime for stream & batch / ecosystem

Delta iterate operators

High Performance

Support for out-of-order events

Low latency

Exactly-once semantics

Flexible streaming windows

Back pressure

According to the Apache Flink site (http://flink.apache.org/)

Page 16: Flink London meetup 3 March 2016 - Flink basics

High performance / Low latency

High throughput

Low latency

Page 17: Flink London meetup 3 March 2016 - Flink basics

Flow control and back pressure

- Back pressure bottleneck: ‘pressure’ building up because data is arriving faster than it can be processed.
  - Temporary process slow-down (e.g. GC on JVM)
  - Temporary traffic spike
- “Flink achieves the maximum throughput allowed by the slowest part of the pipeline”
  - Not a configurable ‘feature’
  - Inherent in architecture (buffer-based)
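To make the buffer-based idea concrete, here is a minimal plain-Python simulation, assuming a single producer, a single consumer and one bounded buffer between them. The function name, rates and capacity are made up for the illustration and say nothing about Flink's actual network stack.

```python
from collections import deque


def simulate(producer_rate, consumer_rate, capacity, ticks):
    """Return (produced, consumed, max_buffered) after `ticks` time steps.

    Rates are items per tick. The producer may only emit while the bounded
    buffer has free slots, which is how the slow consumer throttles it.
    """
    buffer = deque()
    produced = consumed = max_buffered = 0
    for _ in range(ticks):
        # The consumer drains up to consumer_rate items first.
        for _ in range(min(consumer_rate, len(buffer))):
            buffer.popleft()
            consumed += 1
        # The producer fills only the free slots; the rest waits upstream.
        free = capacity - len(buffer)
        for _ in range(min(producer_rate, free)):
            buffer.append(produced)
            produced += 1
        max_buffered = max(max_buffered, len(buffer))
    return produced, consumed, max_buffered


produced, consumed, max_buffered = simulate(
    producer_rate=10, consumer_rate=3, capacity=8, ticks=100
)
print(produced, consumed, max_buffered)
```

Once the buffer is full, the producer that could emit 10 items per tick is held to the consumer's 3 per tick, and the backlog never exceeds the buffer capacity. Nothing is dropped and nothing had to be configured, which is the sense in which the behaviour is inherent rather than a feature.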

Page 18: Flink London meetup 3 March 2016 - Flink basics

Exactly-once semantics for state

- In the event of failure “Pick up where you left off”.
  - Means you need to remember where you left off (data and state)
- 3 levels of risk appetite:
  - L1 – Accept misses (“At most once”)
  - L2 – Accept duplicates (“At least once”)
  - L3 – Don’t accept either (“Exactly once”)
- Checkpointing / snapshots
  - Dependent on stream source – e.g. Kafka
  - Orchestration is tricky (see next slide)
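The three levels can be told apart with a toy model, assuming a replayable source (a list with an offset), a running sum as the operator state, and a single crash. The `run` function and its parameters are hypothetical, and the way each mode recovers is a deliberate simplification rather than a description of Flink's checkpointing algorithm.

```python
def run(events, mode, crash_after=3, checkpoint_every=2):
    """Sum `events`, crashing once after `crash_after` events were processed."""
    total = 0              # operator state: a running sum
    offset = 0             # position in the replayable source
    snapshot = (0, 0)      # (offset, total) captured at the last checkpoint
    crashed = False
    while offset < len(events):
        total += events[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            snapshot = (offset, total)       # remember data position AND state
        if not crashed and offset == crash_after:
            crashed = True
            saved_offset, saved_total = snapshot
            if mode == "at-most-once":
                total = saved_total          # state restored, source NOT rewound
            elif mode == "at-least-once":
                offset = saved_offset        # source rewound, state NOT rolled back
            elif mode == "exactly-once":
                offset, total = saved_offset, saved_total   # both restored together
            else:
                raise ValueError(mode)
    return total


events = [1, 2, 3, 4, 5, 6]                  # the true sum is 21
totals = {m: run(events, m) for m in ("at-most-once", "at-least-once", "exactly-once")}
print(totals)
```

In this model the crash happens after event 3, one event past the last checkpoint. "At most once" loses that event (sum too low), "at least once" counts it twice (sum too high), and only restoring position and state together gives the true sum. That pairing is also why the guarantee depends on a source that can be rewound.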

Page 19: Flink London meetup 3 March 2016 - Flink basics

Checkpointing orchestration

Page 20: Flink London meetup 3 March 2016 - Flink basics

Support for Out-of-Order events

- Real life: messages will be delayed
- Every event is time-stamped
- It’s harder than it sounds (‘kinds’ of time, windows, watermarks, etc)

t1 t2 t3 t5 t6 t7 t4 t8
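The arrival order on this slide (t4 turning up after t7) can be used to sketch what a watermark does. This is a minimal plain-Python illustration, assuming the watermark simply trails the largest timestamp seen so far by a fixed `max_delay`; both the `reorder` function and that rule are assumptions made for the example, not Flink's implementation.

```python
import heapq


def reorder(timestamps, max_delay):
    """Yield timestamps in event-time order, tolerating delays up to max_delay."""
    pending = []                       # min-heap of events not yet released
    max_seen = float("-inf")
    for ts in timestamps:
        heapq.heappush(pending, ts)
        max_seen = max(max_seen, ts)
        # Watermark: "no event with a timestamp at or below this is still expected".
        watermark = max_seen - max_delay
        while pending and pending[0] <= watermark:
            yield heapq.heappop(pending)
    while pending:                     # end of input: flush what is left
        yield heapq.heappop(pending)


arrival = [1, 2, 3, 5, 6, 7, 4, 8]     # t4 arrives after t7, as on the slide
patient = list(reorder(arrival, max_delay=3))
too_eager = list(reorder(arrival, max_delay=1))
print(patient)
print(too_eager)
```

With enough slack the late t4 is slotted back into place; with too little, t5 and t6 have already been released by the time t4 shows up. The trade-off is latency against completeness, which is part of why this is harder than it sounds.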

Page 21: Flink London meetup 3 March 2016 - Flink basics

Highly flexible streaming windows

The start and end of the data stream that is being processed.
- Different ways to define the window, including:
  - Time (from 9:00:00 to 9:00:04)
  - Count (from item 12 to item 18)
  - Session (from first ‘keyed event’ until we don’t see the same key for X time – analogous to cookie session)
  - More complex logic driven by the data, and more complex windows depending on what is needed
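The first three window types can be sketched as plain-Python grouping functions, assuming events are `(timestamp, value)` pairs that arrive in time order and, for the session case, all belong to one key. The function names and the sample data are invented for the illustration and are not Flink's window API.

```python
def tumbling_time_windows(events, size):
    """Group (timestamp, value) pairs into fixed, non-overlapping time windows."""
    windows = {}
    for ts, value in events:
        start = ts - ts % size                     # window the timestamp falls in
        windows.setdefault((start, start + size), []).append(value)
    return windows


def count_windows(values, size):
    """Group values into consecutive windows of `size` items (last may be short)."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def session_windows(events, gap):
    """Group time-ordered events for ONE key into sessions.

    A new session starts when more than `gap` passes without an event.
    """
    sessions = []
    last_ts = None
    for ts, value in events:
        if last_ts is None or ts - last_ts > gap:
            sessions.append([])                    # inactivity gap: open a new session
        sessions[-1].append(value)
        last_ts = ts
    return sessions


events = [(0, "a"), (1, "b"), (4, "c"), (5, "d"), (12, "e"), (13, "f")]
by_time = tumbling_time_windows(events, size=5)
by_count = count_windows([v for _, v in events], size=4)
by_session = session_windows(events, gap=2)
print(by_time, by_count, by_session, sep="\n")
```

The same six events end up grouped three different ways, which is the point of the slide: the window definition, not the data, decides what counts as "together".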

Page 22: Flink London meetup 3 March 2016 - Flink basics

(Delta) iterate operators

Iterate operator

Delta iterate operator

Work on ‘hot’
Don’t touch ‘cold’
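The 'hot' versus 'cold' idea can be sketched with connected components by label propagation, a common textbook example for this kind of iteration. This is a plain-Python illustration, assuming an undirected graph given as an edge list; the function name and the `touched_per_round` counter are invented here and this is not Flink's delta iterate operator.

```python
def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex id in its component.

    Delta-style: only vertices whose label changed in the previous round
    (the 'hot' workset) are looked at again; settled 'cold' ones are skipped.
    """
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    label = {v: v for v in vertices}       # solution set
    workset = set(vertices)                # everything is 'hot' at the start
    touched_per_round = []
    while workset:
        touched_per_round.append(len(workset))
        updates = {}
        for v in workset:
            for n in neighbours[v]:
                # Offer v's label to each neighbour; keep the smallest offer.
                if label[v] < updates.get(n, label[n]):
                    updates[n] = label[v]
        label.update(updates)
        workset = set(updates)             # only changed vertices stay 'hot'
    return label, touched_per_round


labels, touched = connected_components(
    vertices=[1, 2, 3, 4, 5, 6],
    edges=[(1, 2), (2, 3), (3, 4), (5, 6)],
)
print(labels, touched)
```

A plain iterate operator would revisit all six vertices every round. Here the workset shrinks each round until it is empty, so the small component {5, 6} stops costing anything as soon as it has settled.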

Page 23: Flink London meetup 3 March 2016 - Flink basics

(Delta) iterate operators

Page 24: Flink London meetup 3 March 2016 - Flink basics

One runtime / library ecosystem

NB:
- Libraries in beta
- APIs in Java, Scala, [Python]
- Flink CEP too?