an architect's guide to real time big data systems
DESCRIPTION
Introduction to real-time big data and stream computing using IBM InfoSphere Streams and Apache Storm. Presented at a Big Data conference in Singapore, July 2014.
TRANSCRIPT
An Architect's Guide to Building Real Time Big Data Systems
Raja SP
Lead Architect & Head of Products
10 July 2014, Singapore
< Real Time > Big Data
WHY WHAT HOW
What is the right time to shoot me?
There is a rhythm in the universe
Telecom Marketing Scenario
Cell Utilisation is Low
In a Geo-Fence
High Balance
Frequent Visitor
High Data User in the Past
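One way to read the telecom scenario is as a rule that fires, per subscriber event, only when all of the listed conditions hold at the same moment. The sketch below is a minimal illustration under that assumption. The class names, field names and thresholds are hypothetical and do not come from the talk.

```java
// Hypothetical snapshot of one subscriber event; all field names are assumptions.
final class SubscriberEvent {
    final double cellUtilisation;   // load of the serving cell, 0.0 to 1.0
    final boolean insideGeoFence;   // subscriber is currently inside the target area
    final double balance;           // current prepaid balance
    final int recentVisits;         // visits to this geo-fence in a recent period
    final double pastDataMb;        // data used in a past period, in MB

    SubscriberEvent(double cellUtilisation, boolean insideGeoFence,
                    double balance, int recentVisits, double pastDataMb) {
        this.cellUtilisation = cellUtilisation;
        this.insideGeoFence = insideGeoFence;
        this.balance = balance;
        this.recentVisits = recentVisits;
        this.pastDataMb = pastDataMb;
    }
}

final class OfferRule {
    // Illustrative thresholds only; real values would come from the business.
    static final double LOW_UTILISATION = 0.30;
    static final double HIGH_BALANCE = 20.0;
    static final int FREQUENT_VISITS = 4;
    static final double HIGH_DATA_MB = 1024.0;

    // True only when every condition from the slide holds for this event.
    static boolean shouldSendOffer(SubscriberEvent e) {
        return e.cellUtilisation < LOW_UTILISATION
                && e.insideGeoFence
                && e.balance >= HIGH_BALANCE
                && e.recentVisits >= FREQUENT_VISITS
                && e.pastDataMb >= HIGH_DATA_MB;
    }
}
```

Because cell utilisation and location change from minute to minute, a rule like this is only useful if it is evaluated while the event is still fresh.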
What is out there?
Square Kilometers of Arrays
Tens of Thousands of Antennae
Terabits of Data
Security / Intelligence
< Real Time > Big Data
WHY WHAT HOW
Partitioned Parallel Processing
[Diagram: DATA i, DATA j and DATA k are each handled by their own copy of TASK]
Pipelined Parallel Processing
[Diagram: DATA passes through TASK i, TASK j and TASK k in sequence]
Hybrid Parallel Processing
[Diagram: DATA passes through a graph of tasks: TASK i, TASK j, TASK k, TASK l, TASK m]
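The difference between the partitioned and pipelined layouts can be sketched in a few lines of Java. This is only an illustration of the data flow, with hypothetical class and method names. In particular, the pipelined method runs its stages one after another on a single item, whereas a real pipeline would run each stage concurrently on different items.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ParallelismSketch {
    // Partitioned: the same task is applied to every data partition,
    // and the partitions may be processed at the same time.
    static <T, R> List<R> partitioned(List<T> partitions, Function<T, R> task) {
        return partitions.parallelStream().map(task).collect(Collectors.toList());
    }

    // Pipelined: one item passes through task i, then task j, then task k.
    // Only the ordering of stages is shown here, not their concurrency.
    static <T> T pipelined(T item, List<Function<T, T>> stages) {
        T current = item;
        for (Function<T, T> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }
}
```

A hybrid layout combines the two: a pipeline in which some stages are themselves partitioned across several parallel copies.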
Should Data go to Tasks?
Or
Tasks go to Data?
Static Data / Data at Rest
[Diagram: one DATA store with several TASKs brought to it]
Streaming Data / Data in Motion
[Diagram: a sequence of DATA items flowing past a single TASK]
Streaming Data / Data in Motion Analytics
The classic “Word Count” (Stream Computing Version)
[Diagram: the input words Java, Python, Lisp, Python, Java, C++ pass through a Token Splitter to three Counters, which emit Java 2, Python 2, Lisp 1 and C++ 1 to a Sink]
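The word-count flow on this slide (token splitter, counters, sink) can be sketched in plain Java without any streaming framework. This is a single-threaded illustration with hypothetical class and method names. Grouping the slide's words into the two input lines used in `main` is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StreamingWordCount {
    // Token splitter: turns one incoming line into individual word tuples.
    static List<String> split(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Counter: updates a running count as each word arrives,
    // instead of waiting for the whole input to be stored first.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : split(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sink: print the counts once the sample input is exhausted.
        System.out.println(count(Arrays.asList("Java Python Lisp", "Python Java C++")));
        // prints {Java=2, Python=2, Lisp=1, C++=1}
    }
}
```

In a real stream the input never ends, so the counter would emit updated counts continuously or per window rather than once at the end.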
Stream Computing Programming Constructs
Stream
Tuple
Operator / Bolt
IBM InfoSphere Streams    Apache Storm
Operator                  Bolt
Source Operator           Spout
Sink Operator             -------
Composite / Topology
IBM InfoSphere Streams
// Slide-level pseudocode: Split and Count stand for the word-splitting and counting steps
composite WordCountApp {
    graph
        stream<rstring sentence> Sentence = FileSource() {}
        stream<rstring word> Word = Split(Sentence) {}
        stream<rstring word, int32 count> Counts = Count(Word) {}
}
[Diagram: Source -> Split -> Count, connected by the streams Sentence, Word and Counts]
Apache Storm
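TopologyBuilder builder = new TopologyBuilder();
// Spout "Source" emits sentences; the last argument is the parallelism hint
builder.setSpout("Source", new RandomSentenceSpout(), 5);
// Shuffle grouping: sentences are distributed randomly across the Split tasks
builder.setBolt("Split", new SplitSentence(), 8).shuffleGrouping("Source");
// Fields grouping: tuples with the same "word" value always reach the same Count task
builder.setBolt("Count", new WordCount(), 12).fieldsGrouping("Split", new Fields("word"));
[Diagram: Source -> Split -> Count]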
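The two groupings in the Storm topology behave differently, and the word count is only correct because the Count bolt uses a fields grouping on "word". The sketch below illustrates the idea with hypothetical helper methods. It is not Storm's implementation, and the hash-modulo routing is an assumption chosen for clarity.

```java
import java.util.Random;

final class GroupingSketch {
    // Fields grouping idea: equal field values always map to the same task,
    // so one counter task sees every occurrence of a given word.
    static int fieldsGroupingTask(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Shuffle grouping idea: any task may receive the tuple, which spreads
    // load evenly but gives no guarantee about which task sees which value.
    static int shuffleGroupingTask(Random rng, int numTasks) {
        return rng.nextInt(numTasks);
    }
}
```

With a shuffle grouping in front of the counters, occurrences of the same word could land on different tasks, and each task would hold only a partial count.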
IBM InfoSphere Streams – Some Operators
Functor: Perform tuple-level manipulations (~250 functions)
Filter: Remove some tuples from a stream
Aggregate: Group and summarize incoming tuples
Sort: Impose an order on incoming tuples in a stream
Join: Correlate two streams
Punctor: Insert window punctuation markers into a stream
Barrier: Synchronize tuples from sequence-correlated streams
Pair: Group tuples from multiple streams of same type
Split: Forward tuples to output streams based on a predicate
ThreadedSplit: Distribute tuples over output streams by availability
Union: Construct an output tuple from each input tuple
DeDuplicate: Suppress duplicate tuples seen within a given time period
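To make one of these operators concrete, here is a minimal Java sketch of the behaviour described for DeDuplicate: a tuple is dropped if an equal tuple was already forwarded within a given time period. The class and method names are hypothetical, and this is not the InfoSphere Streams implementation. The sketch also assumes that a suppressed duplicate does not restart the time period.

```java
import java.util.HashMap;
import java.util.Map;

final class DeDuplicateSketch {
    private final long periodMillis;
    // Timestamp at which each key was last forwarded.
    private final Map<String, Long> lastForwarded = new HashMap<>();

    DeDuplicateSketch(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    // Returns true if the tuple should be forwarded, false if it duplicates
    // a tuple with the same key forwarded less than periodMillis ago.
    boolean accept(String key, long timestampMillis) {
        Long previous = lastForwarded.get(key);
        if (previous != null && timestampMillis - previous < periodMillis) {
            return false;
        }
        lastForwarded.put(key, timestampMillis);
        return true;
    }
}
```

The map in this sketch grows without bound. A production operator would also evict keys whose time period has expired.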
Stream Window
[Diagram: a window covering a run of consecutive DATA tuples in a stream]
Aggregate
Sort
Join
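Aggregate, Sort and Join all need a bounded set of tuples to work on, and a window provides that bound on an otherwise endless stream. The sketch below shows one simple kind of window, a count-based tumbling window, with an aggregate (a sum) computed per window. The class and method names are hypothetical. Sliding and time-based windows are not shown.

```java
import java.util.ArrayList;
import java.util.List;

final class TumblingWindowSketch {
    // Cuts the stream into consecutive, non-overlapping windows of `size`
    // tuples and emits one sum for each full window.
    static List<Integer> sumPerWindow(List<Integer> stream, int size) {
        List<Integer> sums = new ArrayList<>();
        int sum = 0;
        int inWindow = 0;
        for (int value : stream) {
            sum += value;
            inWindow++;
            if (inWindow == size) {
                sums.add(sum);   // window is full: emit and start a new one
                sum = 0;
                inWindow = 0;
            }
        }
        return sums;             // a trailing partial window is not emitted
    }
}
```

For example, the stream 1, 2, 3, 4, 5, 6, 7 with a window size of 3 yields the sums 6 and 15. The final 7 stays in an incomplete window.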
< Real Time > Big Data
WHY WHAT HOW
Streams Application Development Method
Runtime Components
IBM InfoSphere Streams: Instance
  Management Host
  Application Host
Apache Storm: Cluster
  Nimbus
  ZooKeeper
  Node 1
  Node 2
  Node 3
Application Deployment Units
IBM InfoSphere Streams: Instance
  Management Host
  Application Host 1
    Processing Element 1
    Processing Element 2
Apache Storm: Cluster
  Management Node (Nimbus)
  ZooKeeper Node
  Node 1
    Worker 1, Worker 2
      Executor, Executor, Executor
High Availability & Adaptability
IBM InfoSphere Streams / Apache Storm
Optimizing scheduler assigns jobs to nodes and continually manages resource allocation
Dynamically add nodes and jobs
Execution units on failed nodes can be moved automatically, with communications re-routed
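The failover behaviour can be pictured as a scheduler that moves every execution unit from a failed node to a surviving one. The sketch below is a toy illustration with hypothetical names and a simple "fewest units first" policy. It is not how either InfoSphere Streams or Storm schedules work internally, and it leaves out re-routing of communications.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FailoverSketch {
    // Returns a new placement in which every unit from failedNode has been
    // moved to the surviving node that currently hosts the fewest units.
    static Map<String, List<String>> reassign(Map<String, List<String>> placement,
                                              String failedNode) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : placement.entrySet()) {
            if (!e.getKey().equals(failedNode)) {
                result.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        List<String> orphans = placement.getOrDefault(failedNode, new ArrayList<>());
        if (result.isEmpty() && !orphans.isEmpty()) {
            throw new IllegalStateException("no surviving node to host the units");
        }
        for (String unit : orphans) {
            String target = null;
            for (String node : result.keySet()) {
                // Ties go to the node listed first.
                if (target == null || result.get(node).size() < result.get(target).size()) {
                    target = node;
                }
            }
            result.get(target).add(unit);
        }
        return result;
    }
}
```

The same routine also hints at how added nodes could be used: a new, empty node would be the least loaded and would receive work first.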
Topic:
Organized by UNICOM Trainings & Seminars Pvt. Ltd.
Speaker name: Raja SP
Email ID: [email protected]
Thank You