Distributed Computing in Hazelcast - GeekOut 2014 Edition
DESCRIPTION
The amount of collected data is growing almost exponentially. More than 75% of all data has been collected in the past 5 years. To store this data and process it in a reasonable time, you need to partition the data and parallelize the processing of reports and analytics. This talk demonstrates how to parallelize data processing using Hazelcast and its underlying distributed data structures. After a quick introduction to the different terms, some short live coding examples take us on a journey into distributed computing. Source code of the demonstrations is available here:
1. https://github.com/noctarius/hazelcast-mapreduce-presentation
2. https://github.com/noctarius/hazelcast-distributed-computing
TRANSCRIPT
DISTRIBUTED COMPUTING IN HAZELCAST
Source: http://www.newscientist.com/gallery/dn17805-computer-museums-of-the-world/11
www.hazelcast.com
THAT'S ME
Christoph Engelbert
Twitter: @noctarius2k
8+ years of Java Weirdoness
Performance, GC, traffic topics
Apache Committer
Gaming, Travel Management, ...
OUR SPACE TRIP ROADMAP
Hazelcast
Distributed Computing
Distributed ExecutorService
EntryProcessor
Map & Reduce
Questions
HAZELCAST
PICKIN' DIAMONDS
WHAT IS HAZELCAST?
In-Memory Data-Grid
Data Partitioning (Sharding)
Java Collections Implementation
Distributed Computing Platform
WHY HAZELCAST?
Automatic Partitioning
Fault Tolerance
Sync / Async Backups
Fully Distributed
In-Memory for Highest Speed
WHY HAZELCAST?
WHY IN-MEMORY COMPUTING?
TREND OF PRICES
Data Source: http://www.jcmit.com/memoryprice.htm
SPEED DIFFERENCE
Data Source: http://i.imgur.com/ykOjTVw.png
DISTRIBUTED COMPUTING
OR
MULTICORE CPU ON STEROIDS
THE IDEA OF DISTRIBUTED COMPUTING
Source: https://www.flickr.com/photos/stefan_ledwina/1853508040
THE BEGINNING
Source: http://en.wikipedia.org/wiki/File:KL_Advanced_Micro_Devices_AM9080.jpg
MULTICORE IS NOT NEW
Source: http://en.wikipedia.org/wiki/File:80386with387.JPG
CLUSTER IT
Source: http://rarecpus.com/images2/cpu_cluster.jpg
SUPER COMPUTER
Source: http://www.dkrz.de/about/aufgaben/dkrz-geschichte/rechnerhistorie-1
CLOUD COMPUTING
Source: https://farm6.staticflickr.com/5523/11407118963_e0e0870846_b_d.jpg
DISTRIBUTED EXECUTORSERVICE
THE WHOLE CLUSTER IN YOUR HANDS
WHY A DISTRIBUTED EXECUTORSERVICE?
j.l.Runnable / j.u.c.Callable
Only needs to be serializable
Same task on all / multiple nodes
Should not work on data
QUICK EXAMPLE
Print node name on all nodes
// Slide shorthand: println and member.node are placeholders, not real API calls
Runnable runnable = () -> println("Running on Node: " + member.node);
IExecutorService executorService = hazelcastInstance.getExecutorService("default");
executorService.executeOnAllMembers(runnable);
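The "only needs to be serializable" requirement can be tried out without a cluster. The following is a minimal plain-JDK sketch, not Hazelcast code: it turns a task into bytes, restores it, and runs the copy, which is roughly what has to happen when a task travels to another node. NodeNameTask and SerializableTaskSketch are hypothetical names made up for this illustration, and the node name is just a string passed in by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.Callable;

class SerializableTaskSketch {

    // Hypothetical task type for illustration; it is not part of Hazelcast.
    static class NodeNameTask implements Callable<String>, Serializable {
        private static final long serialVersionUID = 1L;
        private final String nodeName;

        NodeNameTask(String nodeName) {
            this.nodeName = nodeName;
        }

        @Override
        public String call() {
            return "Running on Node: " + nodeName;
        }
    }

    // Sender side: serialize the task into a byte array.
    static byte[] toBytes(Serializable task) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("Task could not be serialized", e);
        }
    }

    // Receiver side: rebuild the task from the byte array.
    static NodeNameTask fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (NodeNameTask) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Task could not be deserialized", e);
        }
    }

    // Simulates shipping the task and running the restored copy.
    static String shipAndRun(NodeNameTask task) {
        return fromBytes(toBytes(task)).call();
    }

    public static void main(String[] args) {
        System.out.println(shipAndRun(new NodeNameTask("node-1")));
    }
}
```

A task that holds a non-serializable field would fail in toBytes, which is the kind of error to expect when such a task is submitted to a remote node.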
DEMONSTRATION
ENTRYPROCESSOR
LOCK-FREE DATA OPERATIONS
WHY ENTRYPROCESSOR?
Prevents external Locking
Guarantees Atomicity
Kinda "Cluster-wide Thread-Safe"
QUICK EXAMPLE
Incrementing a counter atomically
private int increment(Map.Entry<String, Integer> entry) {
  int newValue = entry.getValue() + 1;
  entry.setValue(newValue);
  return newValue;
}
IMap<String, Integer> map = hazelcastInstance.getMap("default");
int newId = map.executeOnKey("idgen", this::increment);
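The idea of running the update where the data lives, instead of locking from the outside, has a single-JVM analogy in ConcurrentHashMap. The sketch below is plain JDK code and only an analogy, not the Hazelcast EntryProcessor API: merge applies the increment atomically per key, so several threads can bump the same counter without an explicit lock. LocalCounterSketch and its method names are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LocalCounterSketch {
    private final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

    // Atomically adds 1 to the value under the key (starting at 1 if absent)
    // and returns the new value.
    int increment(String key) {
        return map.merge(key, 1, Integer::sum);
    }

    // Returns the current value, or 0 if the key was never incremented.
    int current(String key) {
        return map.getOrDefault(key, 0);
    }

    // Lets several threads increment the same key and returns the final value.
    static int runConcurrently(int threads, int incrementsPerThread) {
        LocalCounterSketch counter = new LocalCounterSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment("idgen");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.current("idgen");
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(8, 1000));
    }
}
```

No increment is lost here because each update is a single atomic operation on the entry. A separate get followed by put would not give that guarantee.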
DEMONSTRATION
MAP & REDUCE
THE BLACK MAGIC FROM PLANET GOOGLE
USE CASES
Log Analysis
Data Querying
Aggregation and summing
Distributed Sort
ETL (Extract Transform Load)
and more...
SIMPLE STEPS
Read
Map / Transform
Reduce
FULL STEPS
Read
Map / Transform
Combining
Grouping / Shuffling
Reduce
Collating
MAPREDUCE WORKFLOW
SOME PSEUDO CODE (1/3)
MAPPING
Data are mapped / transformed into a set of key-value pairs
map( key:String, document:String ):Void ->
  for each w:Word in document:
    emit( w, 1 )
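The mapping pseudo code can be sketched in plain Java. This is an illustration with JDK types only, not the Hazelcast Mapper API: emit is modelled by adding a (word, 1) pair to a list, and splitting on non-word characters is an assumed tokenization rule. MapPhaseSketch is a hypothetical name.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MapPhaseSketch {

    // Emits one (word, 1) pair per word found in the document.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        // Assumption: words are separated by runs of non-word characters.
        for (String word : document.split("\\W+")) {
            if (!word.isEmpty()) {
                emitted.add(new SimpleEntry<>(word, 1));
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(map("to be or not to be"));
    }
}
```

The mapper does not count anything itself. Repeated words simply produce repeated pairs, and the later phases add them up.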
SOME PSEUDO CODE (2/3)
COMBINING
Multiple values are combined into an intermediate result to reduce network traffic
combine( word:Word, counts:List[Int] ):Void ->
  emit( word, sum( counts ) )
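A combiner can be pictured as a local pre-aggregation on the node that ran the mapper. The plain-JDK sketch below is an assumption-laden illustration, not the Hazelcast Combiner API: it takes the (word, 1) pairs emitted on one node and collapses them into one partial sum per word, so fewer pairs have to cross the network. CombinePhaseSketch and pair are hypothetical names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinePhaseSketch {

    // Small helper to build a (word, count) pair.
    static Map.Entry<String, Integer> pair(String word, int count) {
        return new SimpleEntry<>(word, count);
    }

    // Sums the counts per word for the pairs emitted on a single node.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> p : emitted) {
            partial.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return partial;
    }

    public static void main(String[] args) {
        System.out.println(combine(Arrays.asList(
                pair("to", 1), pair("be", 1), pair("to", 1))));
    }
}
```

In this sketch three emitted pairs shrink to two partial results. On real documents with many repeated words the saving is correspondingly larger.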
SOME PSEUDO CODE (3/3)
REDUCING
Values are reduced / aggregated to the requested result
reduce( word:Word, counts:List[Int] ):Int ->
  return sum( counts )
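The reducing step, together with the grouping that precedes it, can be sketched with JDK collections. Again this is an illustration under assumptions and not the Hazelcast Reducer API: each input map stands for the partial word counts of one node, group collects all partial counts per word, and reduce sums them. ReducePhaseSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReducePhaseSketch {

    // Reduces all partial counts of one word to a single total.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Grouping / shuffling: collects the partial counts of every word
    // from all per-node results.
    static Map<String, List<Integer>> group(List<Map<String, Integer>> partials) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return grouped;
    }

    // Runs group and reduce to produce the final word counts.
    static Map<String, Integer> reduceAll(List<Map<String, Integer>> partials) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : group(partials).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

Called with the partial maps of two nodes, reduceAll returns one total per word, which corresponds to the "requested result" on the slide.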
FOR MATHEMATICIANS
Process: (K x V)* → (L x W)* ⇒ [(l1, w1), …, (lm, wm)]
Mapping: (K x V) → (L x W)* ⇒ (k, v) → [(l1, w1), …, (ln, wn)]
Reducing: L x W* → X* ⇒ (l, [w1, …, wn]) → [x1, …, xn]
MAPREDUCE PROGRAMS IN GOOGLE SOURCE TREE
Source: http://research.google.com/archive/mapreduce-osdi04-slides/index-auto-0005.html
DEMONSTRATION
THANK YOU!
ANY QUESTIONS?
@noctarius2k
@hazelcast
http://www.sourceprojects.com
http://github.com/noctarius
Images: All images are licensed under Creative Commons