[nyjavasig] riding the distributed streams - feb 2nd, 2017

42
This slide blank on purpose

Upload: viktor-gamov

Post on 12-Apr-2017

88 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

This slide blank on purpose

Page 2: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Riding the Distributed Streams

#nyjavasig #hazelcastjet#java8

Page 3: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

> whoami• Solutions Architect

@Hazelcast

• Hang out with awesome people

• @gamussa in internetz

Please, follow me in TwitterI’m very interesting ©

Page 4: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Agenda

• Refreshing knowledge on Java 8 Streams

• Distribute and Conquer

• Distributed Data

• Distributed Streams

• How we did all this

Page 5: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Java 8 Streams

Page 6: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Java 8 Streams…• An abstraction represents a sequence of

elements

• Is not a data structure

• Convey elements from a source through a pipeline of operations

• Operation doesn’t modify a source

Page 7: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Why I should care about Stream API?

• You’re Java developer

Page 8: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

What does regular Java developer think about Scala?advanced

Page 9: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Why I should care about Stream API?

• You’re Java developer

• Many Java developers know Java

• It’s all about data processing

Page 10: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

java.util.stream operations

• map(), flatMap(), filter()

• reduce(), collect()

• sorted()

Intermediate operation

Terminal operation

Blocking operation

Page 11: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017
Page 12: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017
Page 13: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017
Page 14: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Problem

• One does not simply put all Big Data in one machine

Page 15: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Problem

• Data doesn’t fit just one machine

ONE DOES NOT SIMPLY

FIT BIG DATA IN ONE MACHINE

Page 16: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Problem

• One does not simply put all Big Data in one machine

• Data is too important to have it only one machine

Page 17: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

EXCUSE ME,

COULD YOU SPARE A MOMENT TO TALK ABOUT

DATA DISTRIBUTION

Page 18: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

CACHES

REPLICATION

SHARDING

Page 19: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Replication on Sharding?

http://book.mixu.net/distsys/single-page.html

Page 20: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Solution

• Use Distributed Map aka IMap

Page 21: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

What’s Hazelcast IMDG?• In-memory Data Grid• Apache v2 Licensed• Distributed

• Caches (IMap, JCache)• Java Collections (IList, ISet, IQueue)• Messaging (Topic, RingBuffer)• Computation (ExecutorService, M-R)

Page 22: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Scale-Out Computing

Page 23: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Scale-Up Computing

Page 24: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

I DON’T ALWAYS BACKUP THE DATA

BUT WHEN I DO I BACKUP IT IN-MEMORY

Page 25: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

GreenPrimary

GreenBackup

GreenShard Dat

a

Page 26: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Can I haz some code?

Page 27: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

27

Problem

• Lambda serialization

Page 28: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

28

Page 29: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

29

Solution

• serializable version of the interfaces

• Introducing DistributedStream

Page 30: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

30

Page 31: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Can I haz some code?

Page 32: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

32

Jet Streams

Page 33: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017
Page 34: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

34

What’s Hazelcast Jet?• General purpose distributed data

processing framework

• Based on Direct Acyclic Graph to model data flow

• Built on top of Hazelcast IMDG

• Comparable to Apache Spark or Apache Flink

Page 35: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017
Page 36: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

36

DAG

Page 37: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

37

Job Execution

Page 38: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017
Page 39: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Future (It’s bright!)• Memory module for processing big data

• Higher level streaming and batching APIs

• Reactive Streams

• Distributed Classloading

• Integrations (HDFS/Yarn/Mesos)

Page 40: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Your fuel, our Jet Engine• Public release – Feb 7th.

• Developer Preview today - yay!

• http://hazelcast.org/jet-signup

• Send me a note [email protected]

• Follow @hazelcast and @gamussa (duh!!)

• Your questions #hazelcast #hazelcastjet

Page 41: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

Conclusion• Java Stream API provides very white range of

data processing tools

• War And Piece – is a Big (a lot of data) Book!

• Now we’re pretty sure that Andrew and Pierre are the main characters

Page 42: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017

#nyjavasig #hazelcastjet#java8

@gamussa

http://bit.ly/jet-streams-code

http://hazelcast.org/jet-signup