Apache Storm Internals

Post on 07-Aug-2015

STORM ANATOMY

Cloud Computing Course, Prof. Hanku Lee

Social Media Cloud Computing Lab, MS Akhmedov Khumoyun

What is stream processing?

Stream processing is a paradigm for processing a large volume of unbounded sequences of tuples (streams) in real time

Diagram: Source, Stream Processor

• Continuous analytics
• Online machine learning
• Sensor data monitoring
• Financial trading …
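To make the definition concrete, here is a minimal sketch in plain Python (it is not Storm code). The names `sensor_source` and `running_average` are hypothetical, and the stream is cut off with `islice` only so that the example terminates.

```python
import itertools
import random


def sensor_source(seed=0):
    """Hypothetical unbounded source: yields (sensor_id, reading) tuples forever."""
    rng = random.Random(seed)
    for i in itertools.count():
        yield (f"sensor-{i % 3}", rng.uniform(15.0, 30.0))


def running_average(stream):
    """Stream processor: consumes tuples one at a time and emits an updated
    per-sensor average after each tuple, without waiting for the stream to end."""
    totals, counts = {}, {}
    for sensor_id, reading in stream:
        totals[sensor_id] = totals.get(sensor_id, 0.0) + reading
        counts[sensor_id] = counts.get(sensor_id, 0) + 1
        yield (sensor_id, totals[sensor_id] / counts[sensor_id])


# The stream never ends, so take only the first few results for the demo.
first_results = list(itertools.islice(running_average(sensor_source()), 6))
for sensor_id, avg in first_results:
    print(sensor_id, round(avg, 2))
```

The point of the sketch is that results are produced continuously while data keeps arriving, which is what distinguishes stream processing from batch processing.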

Storm at Twitter

Twitter Web Analytics

What is Storm?

Storm is a distributed realtime computation system

• Fast & scalable
• Fault-tolerant
• Guarantees messages will be processed
• Easy to set up & operate
• Free & open source

- Originally developed by Nathan Marz at BackType (acquired by Twitter)
- Written in Java and Clojure

Conceptual View

Physical View

Concepts

• Streams
• Spouts
• Bolts
• Topologies

Streams

Unbounded sequence of tuples

Spouts

Source of streams

• Read from Kafka queue
• Read from Twitter Streaming API

Bolts

Process input streams and produce new streams

• Functions
• Filters
• Aggregation
• Joins
• Talk to databases
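The first three bolt roles can be sketched as plain Python generators. This is an illustration only and does not use Storm's bolt API; the names `split_sentence`, `drop_short_words` and `count_words` are hypothetical.

```python
from collections import Counter


def split_sentence(stream):
    """Function bolt: turns each sentence tuple into one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)


def drop_short_words(stream, min_len=3):
    """Filter bolt: passes through only words of at least min_len characters."""
    for (word,) in stream:
        if len(word) >= min_len:
            yield (word,)


def count_words(stream):
    """Aggregation bolt: emits (word, running_count) after every input tuple."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


sentences = [("the cow jumped over the moon",), ("the moon is up",)]
results = list(count_words(drop_short_words(split_sentence(sentences))))
print(results[-1])  # ('moon', 2)
```

Each bolt consumes one stream and produces a new one, so bolts can be chained in any order that makes sense for the computation.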

Topology

Network of spouts and bolts
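A toy model may help picture this network. The sketch below is plain Python, not Storm's topology builder; the `Topology` class and its `set_spout`, `set_bolt` and `run` methods are hypothetical names chosen for illustration, and everything runs in one process.

```python
from collections import defaultdict, deque


class Topology:
    """Toy model of a topology: a directed graph whose nodes are spouts
    (sources) and bolts (processors) and whose edges are streams."""

    def __init__(self):
        self.spouts = {}                      # name -> iterable of tuples
        self.bolts = {}                       # name -> function(tuple) -> list of tuples
        self.subscribers = defaultdict(list)  # component name -> downstream bolt names

    def set_spout(self, name, tuples):
        self.spouts[name] = tuples

    def set_bolt(self, name, func, sources):
        self.bolts[name] = func
        for source in sources:
            self.subscribers[source].append(name)

    def run(self):
        """Push every spout tuple through the graph; return what each bolt emitted."""
        emitted = defaultdict(list)
        queue = deque()
        for name, tuples in self.spouts.items():
            for tup in tuples:
                queue.append((name, tup))
        while queue:
            source, tup = queue.popleft()
            for bolt_name in self.subscribers[source]:
                for out in self.bolts[bolt_name](tup):
                    emitted[bolt_name].append(out)
                    queue.append((bolt_name, out))
        return emitted


topology = Topology()
topology.set_spout("sentences", [("storm is fast",), ("storm scales",)])
topology.set_bolt("split", lambda t: [(w,) for w in t[0].split()], sources=["sentences"])
topology.set_bolt("length", lambda t: [(len(t[0]),)], sources=["sentences"])
topology.set_bolt("upper", lambda t: [(t[0].upper(),)], sources=["split"])
output = topology.run()
print(output["upper"])
```

Here one spout feeds two bolts, and one of those bolts feeds a third, which is the kind of branching graph the slide describes.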

Tasks

Spouts and bolts execute as many tasks across the cluster

Stream grouping

When a tuple is emitted, which task does it go to?

• Shuffle grouping: pick a random task

• Fields grouping: hash on a subset of tuple fields, so tuples with equal values in those fields always go to the same task

• All grouping: send to all tasks

• Global grouping: pick task with lowest id
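The four groupings can be sketched as small routing functions that take a tuple and the list of consumer task ids and return the ids that should receive it. This is an illustrative sketch, not Storm's implementation: the function names are hypothetical, and `zlib.crc32` stands in for whatever hash Storm actually applies.

```python
import random
import zlib


def shuffle_grouping(tup, tasks, rng=random):
    """Pick one target task at random."""
    return [rng.choice(tasks)]


def fields_grouping(tup, tasks, field_indexes):
    """Hash the selected fields so equal values always reach the same task."""
    key = repr(tuple(tup[i] for i in field_indexes)).encode("utf-8")
    return [tasks[zlib.crc32(key) % len(tasks)]]


def all_grouping(tup, tasks):
    """Replicate the tuple to every task."""
    return list(tasks)


def global_grouping(tup, tasks):
    """Send everything to the single task with the lowest id."""
    return [min(tasks)]


tasks = [4, 5, 6, 7]  # hypothetical task ids of the consuming bolt
print(shuffle_grouping(("storm", 1), tasks))
print(fields_grouping(("storm", 1), tasks, [0]))
print(all_grouping(("storm", 1), tasks))
print(global_grouping(("storm", 1), tasks))
```

Fields grouping is the one to reach for when a bolt keeps per-key state, such as a word count: every tuple with the same word lands on the same task, so each task's counts stay complete.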

Starting topology

Storm : Fault-tolerance

Guarantees messages will be processed
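The slide does not say how the guarantee is achieved. Storm's documentation describes acker tasks that keep a single 64-bit value per spout tuple and XOR into it the id of every tuple that is emitted or acked; when the value returns to zero, the whole tuple tree has been processed. The sketch below is a simplified, single-process illustration of that idea, with hypothetical names (`Acker`, `update`, `new_id`), and it leaves out timeouts and replays.

```python
import random


class Acker:
    """Toy acker: keeps one XOR value per spout (root) tuple."""

    def __init__(self):
        self.pending = {}  # root id -> XOR of all ids reported so far

    def update(self, root_id, value):
        """XOR value into the entry for root_id; return True once it reaches zero."""
        current = self.pending.get(root_id, 0) ^ value
        if current == 0:
            # Every emitted id has also been acked: the tree is complete.
            self.pending.pop(root_id, None)
            return True
        self.pending[root_id] = current
        return False


def new_id(rng):
    """Random non-zero 64-bit tuple id."""
    return rng.getrandbits(64) or 1


rng = random.Random(42)
acker = Acker()
root = new_id(rng)

sentence = new_id(rng)
steps = [acker.update(root, sentence)]             # spout emits the sentence tuple

w1, w2 = new_id(rng), new_id(rng)
steps.append(acker.update(root, sentence ^ w1 ^ w2))  # bolt acks sentence, emits two words
steps.append(acker.update(root, w1))               # first word acked
steps.append(acker.update(root, w2))               # second word acked, tree complete
print(steps)
```

Because each id is XORed in exactly twice (once when emitted, once when acked), the tracking state stays at a fixed size per spout tuple no matter how large the tuple tree grows.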

Message Passing (ZeroMQ)

Easy to set up & operate

• Set up a ZooKeeper cluster
• Install dependencies on Nimbus and worker machines
- ZeroMQ 2.1.7 and JZMQ
- Java 6 and Python 2.6.6
- unzip
• Download and extract a Storm release to Nimbus and worker machines
• Fill in mandatory configuration in storm.yaml
• Launch daemons under supervision using the "storm" script
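As a rough illustration of the storm.yaml step, a minimal configuration for a release of this era might look like the fragment below. The host names, directory and ports are hypothetical placeholders, and the key names should be checked against the documentation of the Storm version actually installed, since later releases changed some of them.

```yaml
# Hypothetical hosts and paths; adjust to your cluster.
storm.zookeeper.servers:
  - "zk1.example.com"
  - "zk2.example.com"

# Machine running the Nimbus daemon.
nimbus.host: "nimbus.example.com"

# Local directory where the daemons keep a small amount of state.
storm.local.dir: "/var/storm"

# One worker process per listed port on each worker machine.
supervisor.slots.ports:
  - 6700
  - 6701
```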

Cluster Summary

Topology Summary

Component Summary

Advanced Topics

• Distributed RPC

• Transactional topologies

• Trident

• Using non-JVM languages with Storm

• Unit testing

• Patterns

Real-time Twitter Analytics: Trending Topics and Sentiment Analysis

Diagram components: Twitter, MySQL, Kafka, Storm Cluster, Hadoop (HDFS and HBase), Twitter Crawler

THANK YOU FOR ATTENTION

Any Questions Are Welcome…
