Distributed Computing in Hazelcast - Geekout 2014 Edition


Upload: christoph-engelbert

Post on 10-May-2015

Category: Engineering


DESCRIPTION

The amount of collected data is growing almost exponentially. More than 75% of all data has been collected in the past 5 years. To store this data and process it in a reasonable time, you need to partition the data and parallelize the processing of reports and analytics. This talk demonstrates how to parallelize data processing using Hazelcast and its underlying distributed data structures. After a quick introduction to the different terms and some short live coding examples, we will start the journey into distributed computing. Source code for the demonstrations is available here: 1. https://github.com/noctarius/hazelcast-mapreduce-presentation 2. https://github.com/noctarius/hazelcast-distributed-computing

TRANSCRIPT

Page 1: Distributed Computing in Hazelcast - Geekout 2014 Edition

DISTRIBUTED COMPUTING IN HAZELCAST

Source: http://www.newscientist.com/gallery/dn17805-computer-museums-of-the-world/11

www.hazelcast.com

Page 2: Distributed Computing in Hazelcast - Geekout 2014 Edition

THAT'S ME

Christoph Engelbert
Twitter: @noctarius2k
8+ years of Java Weirdoness
Performance, GC, traffic topics
Apache Committer
Gaming, Travel Management, ...

Page 3: Distributed Computing in Hazelcast - Geekout 2014 Edition

OUR SPACE TRIP ROADMAP

Hazelcast
Distributed Computing
Distributed ExecutorService
EntryProcessor
Map & Reduce
Questions

Page 4: Distributed Computing in Hazelcast - Geekout 2014 Edition

HAZELCAST

PICKIN' DIAMONDS

Page 5: Distributed Computing in Hazelcast - Geekout 2014 Edition

WHAT IS HAZELCAST?

In-Memory Data-Grid
Data Partitioning (Sharding)
Java Collections Implementation
Distributed Computing Platform
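
The "Data Partitioning (Sharding)" bullet can be pictured with a small JDK-only sketch. This is not Hazelcast's implementation: the class name, the use of hashCode() and the round-robin assignment of partitions to members are assumptions made for illustration. Only the partition count of 271 mirrors Hazelcast's default.

```java
import java.util.List;

// Hypothetical sketch of key -> partition -> member routing (not Hazelcast code).
final class PartitionSketch {
    static final int PARTITION_COUNT = 271; // Hazelcast's default partition count

    // Maps a key to a partition id; floorMod keeps the result non-negative.
    static int partitionId(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Assumed round-robin ownership: partition i belongs to member i % n.
    static String owner(Object key, List<String> members) {
        return members.get(partitionId(key) % members.size());
    }

    public static void main(String[] args) {
        List<String> members = List.of("node-a", "node-b", "node-c");
        for (String key : List.of("idgen", "user:42", "order:7")) {
            System.out.println(key + " -> partition " + partitionId(key)
                    + " on " + owner(key, members));
        }
    }
}
```

Because the partition id depends only on the key, every member can work out where an entry lives without asking a central coordinator.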

Page 6: Distributed Computing in Hazelcast - Geekout 2014 Edition

WHY HAZELCAST?

Automatic Partitioning
Fault Tolerance
Sync / Async Backups
Fully Distributed
In-Memory for Highest Speed

Page 7: Distributed Computing in Hazelcast - Geekout 2014 Edition

WHY HAZELCAST?

Page 8: Distributed Computing in Hazelcast - Geekout 2014 Edition

Page 9: Distributed Computing in Hazelcast - Geekout 2014 Edition

WHY IN-MEMORY COMPUTING?

Page 10: Distributed Computing in Hazelcast - Geekout 2014 Edition

TREND OF PRICES

Data Source: http://www.jcmit.com/memoryprice.htm

Page 11: Distributed Computing in Hazelcast - Geekout 2014 Edition

SPEED DIFFERENCE

Data Source: http://i.imgur.com/ykOjTVw.png

Page 12: Distributed Computing in Hazelcast - Geekout 2014 Edition

DISTRIBUTED COMPUTING

OR

MULTICORE CPU ON STEROIDS

Page 13: Distributed Computing in Hazelcast - Geekout 2014 Edition

THE IDEA OF DISTRIBUTED COMPUTING

Source: https://www.flickr.com/photos/stefan_ledwina/1853508040

Page 14: Distributed Computing in Hazelcast - Geekout 2014 Edition

THE BEGINNING

Source: http://en.wikipedia.org/wiki/File:KL_Advanced_Micro_Devices_AM9080.jpg

Page 15: Distributed Computing in Hazelcast - Geekout 2014 Edition

MULTICORE IS NOT NEW

Source: http://en.wikipedia.org/wiki/File:80386with387.JPG

Page 16: Distributed Computing in Hazelcast - Geekout 2014 Edition

CLUSTER IT

Source: http://rarecpus.com/images2/cpu_cluster.jpg

Page 17: Distributed Computing in Hazelcast - Geekout 2014 Edition

SUPER COMPUTER

Source: http://www.dkrz.de/about/aufgaben/dkrz-geschichte/rechnerhistorie-1

Page 18: Distributed Computing in Hazelcast - Geekout 2014 Edition

CLOUD COMPUTING

Source: https://farm6.staticflickr.com/5523/11407118963_e0e0870846_b_d.jpg

Page 19: Distributed Computing in Hazelcast - Geekout 2014 Edition

DISTRIBUTED EXECUTORSERVICE

THE WHOLE CLUSTER IN YOUR HANDS

Page 20: Distributed Computing in Hazelcast - Geekout 2014 Edition

WHY A DISTRIBUTED EXECUTORSERVICE?

j.l.Runnable / j.u.c.Callable
Only needs to be serializable
Same Task on all / multiple Nodes
Should not work on Data

Page 21: Distributed Computing in Hazelcast - Geekout 2014 Edition

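QUICK EXAMPLE

Print node name on all nodes

// Slide shorthand: println and member.node stand in for real output and member lookup.
// The task has to be serializable so that it can be sent to the other members.
Runnable runnable = () -> println("Running on Node: " + member.node);
IExecutorService executorService = hazelcastInstance.getExecutorService("default");
executorService.executeOnAllMembers(runnable);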
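
The idea behind the example above can be tried without a cluster. The following is a minimal JDK-only simulation, not Hazelcast code: each "member" is assumed to be a single worker thread named after the node, and the class and method names (ExecuteOnAllMembersSketch, WhereAmI, ship) are hypothetical. The task is pushed through Java serialization before it runs, which is why it only needs to be serializable and carries logic rather than data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Local simulation of "run the same task on every member" (JDK only, no Hazelcast).
final class ExecuteOnAllMembersSketch {

    // The task carries only logic and is serializable, so it could be shipped to a node.
    static final class WhereAmI implements Callable<String>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public String call() {
            return "Running on node: " + Thread.currentThread().getName();
        }
    }

    // Round-trips the task through Java serialization as a stand-in for the network.
    @SuppressWarnings("unchecked")
    static <T> T ship(T task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("task could not be shipped", e);
        }
    }

    // Runs a shipped copy of the task once per simulated member, one member after another.
    static List<String> executeOnAllMembers(List<String> nodes, Callable<String> task) {
        List<String> results = new ArrayList<>();
        for (String node : nodes) {
            ExecutorService member = Executors.newSingleThreadExecutor(r -> new Thread(r, node));
            try {
                Callable<String> received = ship(task);
                results.add(member.submit(received).get());
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            } finally {
                member.shutdown();
            }
        }
        return results;
    }

    public static void main(String[] args) {
        executeOnAllMembers(List.of("node-a", "node-b"), new WhereAmI())
                .forEach(System.out::println);
    }
}
```

In a real cluster the members would run the task in parallel on separate JVMs; the sequential loop here only keeps the simulation short.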

Page 22: Distributed Computing in Hazelcast - Geekout 2014 Edition

DEMONSTRATION

Page 23: Distributed Computing in Hazelcast - Geekout 2014 Edition

ENTRYPROCESSOR

LOCK-FREE DATA OPERATIONS

Page 24: Distributed Computing in Hazelcast - Geekout 2014 Edition

WHY ENTRYPROCESSOR?

Prevents external Locking
Guarantees Atomicity
Kinda "Cluster-wide Thread-Safe"

Page 25: Distributed Computing in Hazelcast - Geekout 2014 Edition

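QUICK EXAMPLE

Incrementing a counter atomically

// Runs against the entry itself, so the caller does not need an external lock.
private int increment(Map.Entry<String, Integer> entry) {
    int newValue = entry.getValue() + 1;
    entry.setValue(newValue);
    return newValue;
}

// Slide shorthand: depending on the Hazelcast version, the method reference may
// have to be replaced by a class implementing EntryProcessor.
IMap<String, Integer> map = hazelcastInstance.getMap("default");
int newId = (Integer) map.executeOnKey("idgen", this::increment);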
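
A single-JVM analogy may help to see why no external lock is needed. The sketch below swaps in java.util.concurrent.ConcurrentHashMap.compute for the EntryProcessor: the function passed to compute is applied atomically for its key, much like the increment method above. It is an analogy only, and the names AtomicIncrementSketch, nextId, current and demo are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Single-JVM analogy to an EntryProcessor: the update logic is sent to the entry.
final class AtomicIncrementSketch {
    private final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

    // compute() applies the function atomically for this key; no explicit lock is taken here.
    int nextId(String key) {
        return map.compute(key, (k, old) -> old == null ? 1 : old + 1);
    }

    int current(String key) {
        return map.getOrDefault(key, 0);
    }

    // Increments one key from several threads and returns the final counter value.
    static int demo(int threads, int incrementsPerThread) {
        AtomicIncrementSketch ids = new AtomicIncrementSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    ids.nextId("idgen");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ids.current("idgen");
    }

    public static void main(String[] args) {
        System.out.println(demo(8, 1000)); // prints 8000: no increment is lost
    }
}
```

A plain get-then-put sequence in place of compute could lose updates under the same load, which is the problem the slide's "Guarantees Atomicity" bullet addresses.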

Page 26: Distributed Computing in Hazelcast - Geekout 2014 Edition

DEMONSTRATION

Page 27: Distributed Computing in Hazelcast - Geekout 2014 Edition

MAP & REDUCE

THE BLACK MAGIC FROM PLANET GOOGLE

Page 28: Distributed Computing in Hazelcast - Geekout 2014 Edition

USE CASES

Log Analysis
Data Querying
Aggregation and summing
Distributed Sort
ETL (Extract Transform Load)
and more...

Page 29: Distributed Computing in Hazelcast - Geekout 2014 Edition

SIMPLE STEPS

Read
Map / Transform
Reduce

Page 30: Distributed Computing in Hazelcast - Geekout 2014 Edition

FULL STEPS

Read
Map / Transform
Combining
Grouping / Shuffling
Reduce
Collating

Page 31: Distributed Computing in Hazelcast - Geekout 2014 Edition

MAPREDUCE WORKFLOW

Page 32: Distributed Computing in Hazelcast - Geekout 2014 Edition

SOME PSEUDO CODE (1/3)

MAPPING

Data is mapped / transformed into a set of key-value pairs

map( key:String, document:String ):Void ->
    for each w:Word in document:
        emit( w, 1 )

Page 33: Distributed Computing in Hazelcast - Geekout 2014 Edition

SOME PSEUDO CODE (2/3)

COMBINING

Multiple values are combined into an intermediate result to save network traffic

combine( word:Word, counts:List[Int] ):Void ->
    emit( word, sum( counts ) )

Page 34: Distributed Computing in Hazelcast - Geekout 2014 Edition

SOME PSEUDO CODE (3/3)

REDUCING

Values are reduced / aggregated to the requested result

reduce( word:Word, counts:List[Int] ):Int ->
    return sum( counts )
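
The three pseudo code steps can be put together in plain Java. This is a single-JVM sketch, not the Hazelcast MapReduce API: every inner list is assumed to stand for the documents held by one node, and the class and method names are hypothetical. The combine step runs once per simulated node, so each node hands over one partial count per word instead of one pair per occurrence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Word count in the map / combine / reduce style of the pseudo code (JDK only).
final class WordCountSketch {

    // MAPPING: emit (word, 1) for every word in the document.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : document.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                pairs.add(Map.entry(w, 1));
            }
        }
        return pairs;
    }

    // COMBINING: pre-sum the pairs emitted on one node before they leave it.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            partial.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return partial;
    }

    // REDUCING: sum all partial counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Drives the phases; each inner list holds the documents of one simulated node.
    static Map<String, Integer> wordCount(List<List<String>> documentsPerNode) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // grouping / shuffling
        for (List<String> nodeDocuments : documentsPerNode) {
            List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
            for (String document : nodeDocuments) {
                emitted.addAll(map(document));
            }
            combine(emitted).forEach((word, count) ->
                    grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(count));
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> cluster = List.of(
                List.of("to be or not to be"),
                List.of("to do is to be"));
        System.out.println(wordCount(cluster));
        // prints {be=3, do=1, is=1, not=1, or=1, to=4}
    }
}
```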

Page 35: Distributed Computing in Hazelcast - Geekout 2014 Edition

FOR MATHEMATICIANS

Process: (K × V)* → (L × W)* ⇒ [(l1, w1), …, (lm, wm)]

Mapping: (K × V) → (L × W)* ⇒ (k, v) → [(l1, w1), …, (ln, wn)]

Reducing: L × W* → X* ⇒ (l, [w1, …, wn]) → [x1, …, xn]
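
For readers who prefer types to formulas, the signatures above translate fairly directly into Java generics. The sketch below is one possible reading, not the Hazelcast MapReduce API: the Mapper and Reducer interfaces, the process method and the maxPerCity example are all hypothetical, and process is assumed to return each key l together with its reduced list of x values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Generic types mirroring the Mapping / Reducing signatures (hypothetical names).
final class MapReduceTypesSketch {

    // Mapping: (K x V) -> (L x W)*
    interface Mapper<K, V, L, W> {
        List<Map.Entry<L, W>> map(K key, V value);
    }

    // Reducing: L x W* -> X*
    interface Reducer<L, W, X> {
        List<X> reduce(L key, List<W> values);
    }

    // Process: map every input pair, group the emitted values by key, then reduce each group.
    static <K, V, L, W, X> Map<L, List<X>> process(
            Map<K, V> input, Mapper<K, V, L, W> mapper, Reducer<L, W, X> reducer) {
        Map<L, List<W>> grouped = new LinkedHashMap<>();
        input.forEach((k, v) -> {
            for (Map.Entry<L, W> e : mapper.map(k, v)) {
                grouped.computeIfAbsent(e.getKey(), x -> new ArrayList<>()).add(e.getValue());
            }
        });
        Map<L, List<X>> result = new LinkedHashMap<>();
        grouped.forEach((l, ws) -> result.put(l, reducer.reduce(l, ws)));
        return result;
    }

    // Example instance: keys look like "city:day", values are temperatures.
    static Map<String, List<Integer>> maxPerCity(Map<String, Integer> readings) {
        Mapper<String, Integer, String, Integer> byCity =
                (k, v) -> List.of(Map.entry(k.substring(0, k.indexOf(':')), v));
        Reducer<String, Integer, Integer> max =
                (city, temps) -> List.of(Collections.max(temps));
        return process(readings, byCity, max);
    }

    public static void main(String[] args) {
        Map<String, Integer> readings = new LinkedHashMap<>();
        readings.put("berlin:mon", 12);
        readings.put("berlin:tue", 15);
        readings.put("tallinn:mon", 9);
        System.out.println(maxPerCity(readings)); // prints {berlin=[15], tallinn=[9]}
    }
}
```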

Page 36: Distributed Computing in Hazelcast - Geekout 2014 Edition

MAPREDUCE PROGRAMS IN GOOGLE SOURCE TREE

Source: http://research.google.com/archive/mapreduce-osdi04-slides/index-auto-0005.html

Page 37: Distributed Computing in Hazelcast - Geekout 2014 Edition

DEMONSTRATION

Page 38: Distributed Computing in Hazelcast - Geekout 2014 Edition

@noctarius2k
@hazelcast

http://www.sourceprojects.com
http://github.com/noctarius

THANK YOU!
ANY QUESTIONS?

Images: All images are licensed under Creative Commons
