This slide is blank on purpose
Riding the Distributed Streams
#nyjavasig #hazelcastjet #java8
> whoami
• Solutions Architect @Hazelcast
• Hang out with awesome people
• @gamussa in internetz
Please follow me on Twitter
I’m very interesting ©
Agenda
• Refreshing knowledge on Java 8 Streams
• Distribute and Conquer
• Distributed Data
• Distributed Streams
• How we did all this
Java 8 Streams
Java 8 Streams…
• An abstraction that represents a sequence of elements
• Not a data structure
• Conveys elements from a source through a pipeline of operations
• Operations don’t modify the source
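A minimal sketch of these properties using only java.util.stream. The class name `StreamBasics` and the sample words are made up for illustration; the point is that the pipeline produces a new result and leaves the source list untouched.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class StreamBasics {
    // Returns the upper-cased words that are longer than three characters.
    static List<String> longWordsUpper(List<String> source) {
        return source.stream()                   // the list is only the source
                .filter(w -> w.length() > 3)     // pipeline stage 1
                .map(String::toUpperCase)        // pipeline stage 2
                .collect(Collectors.toList());   // result goes into a new list
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("war", "and", "peace", "pierre");
        System.out.println(longWordsUpper(words)); // [PEACE, PIERRE]
        System.out.println(words);                 // [war, and, peace, pierre]
    }
}
```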
Why should I care about the Stream API?
• You’re a Java developer
What does a regular Java developer think about Scala?
advanced
Why should I care about the Stream API?
• You’re a Java developer
• Many Java developers know Java
• It’s all about data processing
java.util.stream operations
• map(), flatMap(), filter()
• reduce(), collect()
• sorted()
Intermediate operation
Terminal operation
Blocking operation
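The three kinds of operation can be told apart by when they run. A small sketch, assuming a sequential stream (the class name `OperationKinds` and the log messages are illustrative): nothing executes until the terminal `collect()`, and `sorted()` has to see every element before it passes any of them on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class OperationKinds {
    // Records the order in which elements move through the pipeline.
    static List<String> trace(List<Integer> input) {
        List<String> log = new ArrayList<>();
        Stream<Integer> pipeline = input.stream()
                .filter(n -> n % 2 == 1)              // intermediate: lazy
                .peek(n -> log.add("filtered " + n))
                .sorted()                             // intermediate, but blocking
                .peek(n -> log.add("sorted " + n));
        log.add("pipeline built");                    // no element has been touched yet
        pipeline.collect(Collectors.toList());        // terminal: starts the processing
        return log;
    }

    public static void main(String[] args) {
        // "pipeline built" comes first, then all "filtered" entries in input
        // order, and only then the "sorted" entries in ascending order.
        trace(Arrays.asList(5, 2, 3, 1)).forEach(System.out::println);
    }
}
```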
Problem
• One does not simply put all Big Data on one machine
Problem
• Data doesn’t fit on just one machine
ONE DOES NOT SIMPLY
FIT BIG DATA IN ONE MACHINE
Problem
• One does not simply put all Big Data on one machine
• Data is too important to keep on only one machine
EXCUSE ME,
COULD YOU SPARE A MOMENT TO TALK ABOUT
DATA DISTRIBUTION
CACHES
REPLICATION
SHARDING
Replication or Sharding?
http://book.mixu.net/distsys/single-page.html
Solution
• Use a distributed map, aka IMap
What’s Hazelcast IMDG?
• In-memory Data Grid
• Apache v2 Licensed
• Distributed
• Caches (IMap, JCache)
• Java Collections (IList, ISet, IQueue)
• Messaging (Topic, RingBuffer)
• Computation (ExecutorService, M-R)
Scale-Out Computing
Scale-Up Computing
I DON’T ALWAYS BACK UP THE DATA
BUT WHEN I DO I BACK IT UP IN-MEMORY
Green Primary
Green Backup
Green Shard Data
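A toy, single-JVM sketch of the primary/backup idea: every key has an owner shard and one backup copy on a different shard. This is an assumption-laden illustration, not Hazelcast's actual partitioning scheme; `ShardedMap` and all of its methods are hypothetical names, and it assumes at least two nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ShardedMap {
    private final List<Map<String, String>> primaries = new ArrayList<>();
    private final List<Map<String, String>> backups = new ArrayList<>();

    // Assumes nodes >= 2, otherwise the backup lands on the owner itself.
    ShardedMap(int nodes) {
        for (int i = 0; i < nodes; i++) {
            primaries.add(new HashMap<>());
            backups.add(new HashMap<>());
        }
    }

    // Sharding: the key's hash decides which node owns it.
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), primaries.size());
    }

    // Replication: the backup copy lives on the next node.
    int backupOf(String key) {
        return (ownerOf(key) + 1) % primaries.size();
    }

    void put(String key, String value) {
        primaries.get(ownerOf(key)).put(key, value);
        backups.get(backupOf(key)).put(key, value);
    }

    String get(String key) {
        return primaries.get(ownerOf(key)).get(key);
    }

    // Simulates a read after one node is gone: fall back to the backup copy.
    String getAfterLosing(int lostNode, String key) {
        int owner = ownerOf(key);
        return owner == lostNode
                ? backups.get(backupOf(key)).get(key)
                : primaries.get(owner).get(key);
    }

    public static void main(String[] args) {
        ShardedMap map = new ShardedMap(3);
        map.put("green", "data");
        int owner = map.ownerOf("green");
        System.out.println(map.getAfterLosing(owner, "green")); // data
    }
}
```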
Can I haz some code?
Problem
• Lambda serialization
Solution
• Serializable versions of the functional interfaces
• Introducing DistributedStream
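Why serialization is a problem, and how a serializable interface helps, can be shown with the JDK alone. In this sketch `SerializableFunction` is a hypothetical name, not necessarily the interface the library ships: a lambda can be written to a byte stream only when its target type is `Serializable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

final class LambdaSerialization {
    // Hypothetical name: a Function whose lambdas can travel over the wire.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    static byte[] toBytes(Object o) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);   // throws NotSerializableException for plain lambdas
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (T) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean canSerialize(Object o) {
        try {
            toBytes(o);
            return true;
        } catch (UncheckedIOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Function<String, Integer> plain = String::length;
        SerializableFunction<String, Integer> portable = String::length;
        System.out.println(canSerialize(plain));     // false
        System.out.println(canSerialize(portable));  // true
        Function<String, Integer> copy = fromBytes(toBytes(portable));
        System.out.println(copy.apply("peace"));     // 5
    }
}
```

The round trip here stays inside one JVM; sending the bytes to another machine also requires the class that defines the lambda to be available there.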
Can I haz some code?
Jet Streams
What’s Hazelcast Jet?
• General purpose distributed data processing framework
• Based on a Directed Acyclic Graph (DAG) to model data flow
• Built on top of Hazelcast IMDG
• Comparable to Apache Spark or Apache Flink
DAG
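To make the DAG idea concrete, here is a plain-Java sketch that models a data flow as named vertices and directed edges and derives an order in which the steps could run. It does not use the Jet API; `TinyDag` and the vertex names are hypothetical. A real engine would run vertices concurrently and move data along the edges, so this only illustrates the graph structure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TinyDag {
    // vertex name -> names of the vertices it feeds
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    TinyDag vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    TinyDag edge(String from, String to) {
        vertex(from);
        vertex(to);
        edges.get(from).add(to);
        return this;
    }

    // Topological order (Kahn's algorithm); fails if the graph has a cycle.
    List<String> executionOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            inDegree.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                inDegree.merge(t, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((v, d) -> { if (d == 0) ready.add(v); });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);
            for (String t : edges.get(v)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph has a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        TinyDag dag = new TinyDag()
                .edge("source", "tokenize")
                .edge("tokenize", "count")
                .edge("count", "sink");
        System.out.println(dag.executionOrder()); // [source, tokenize, count, sink]
    }
}
```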
Job Execution
Future (It’s bright!)
• Memory module for processing big data
• Higher level streaming and batching APIs
• Reactive Streams
• Distributed Classloading
• Integrations (HDFS/Yarn/Mesos)
Your fuel, our Jet Engine
• Public release: Feb 7th.
• Developer Preview today - yay!
• http://hazelcast.org/jet-signup
• Send me a note [email protected]
• Follow @hazelcast and @gamussa (duh!!)
• Your questions #hazelcast #hazelcastjet
Conclusion
• The Java Stream API provides a very wide range of data processing tools
• War and Peace is a big book (a lot of data)!
• Now we’re pretty sure that Andrew and Pierre are the main characters
#nyjavasig #hazelcastjet #java8
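The demo code itself is not part of this text. As a rough single-JVM sketch of the kind of word-frequency count that could lead to that conclusion, using only java.util.stream (the class name `WordCount` and the sample lines are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class WordCount {
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Counts how often each lower-cased word occurs in the given lines.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> NON_WORD.splitAsStream(line.toLowerCase()))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Returns the n most frequent words; ties are broken alphabetically.
    static List<String> top(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Pierre met Andrew", "Andrew and Pierre", "Pierre");
        System.out.println(top(count(lines), 2)); // [pierre, andrew]
    }
}
```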
@gamussa
http://bit.ly/jet-streams-code
http://hazelcast.org/jet-signup