storm - nodebb · distributed realtime computation system originated at backtype/twitter, open...
Storm
Hui Li
08/15/2016
► Storm introduction and features
► Storm core concepts
► Storm system architecture
► Using Storm
► Storm application development
Storm Outline
► Distributed realtime computation system
► Originated at BackType/Twitter, open sourced in late 2011
► Implemented in Clojure, with some Java
► Apache top-level project, ~141 contributors
Storm Introduction
► Reliable, guaranteed data processing
  ► At-most-once, at-least-once, and exactly-once (Trident)
► Scalable
  ► Thousands of workers per cluster
► Fault-tolerant
  ► Failure is expected, and embraced
► Fast
  ► Clocked at 1M+ messages per second per node
Storm Features
► Realtime analytics
► Online machine learning
► Continuous computation
► Distributed RPC
► ETL
Storm Use Cases
► Twitter: personalization, search, revenue optimization, …
  ► 200 nodes, 30 topologies, 50B messages/day, average latency <50 ms (June 2013)
► Yahoo: user events, content feeds, and application logs
  ► 320 nodes (YARN), 130k messages/s (June 2013)
► Spotify: recommendation, ads, monitoring, …
  ► v0.8.0, 22 nodes, 15+ topologies, 200k messages/s (March 2014)
► Alibaba, Cisco, Flickr, PARC, WeatherChannel, …
  ► Netflix is looking at Storm and Samza, too.
Storm Adoption
► Core Unit of Data
► Immutable Set of Key/Value Pairs
Tuple
► Unbounded Sequence of Tuples
Streams
► Source of Streams
► Wraps a streaming data source and emits Tuples
► E.g., read from Kafka or from Redis
Spouts
► Processes input streams and produces new streams
► Core functions of a streaming computation
Bolts
► Functions
► Filters
► Aggregation
► Joins
► Talk to databases
Bolts
► Network (DAG) of spouts and bolts
► Data Flow Representation
Topology
► Spouts and bolts execute as many tasks across the cluster
Tasks
► Determines how Storm routes tuples between tasks in a topology
Stream Grouping
► Shuffle grouping
  ► Randomized round-robin
► Local or shuffle grouping
  ► Randomized round-robin
  ► With a preference for intra-worker tasks
Stream Grouping
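As a rough illustration of "randomized round-robin", the sketch below shuffles the target task ids once and then cycles through them, so every task receives an equal share of tuples. It is plain Java with no Storm dependency; the class name `ShuffleGroupingSketch`, the seed parameter, and the task ids are assumptions made for this example, and Storm's own implementation may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: shuffle the target tasks once, then cycle through them. */
class ShuffleGroupingSketch {
    private final List<Integer> targetTasks;
    private int next = 0;

    ShuffleGroupingSketch(List<Integer> taskIds, long seed) {
        this.targetTasks = new ArrayList<>(taskIds);
        // Randomize the visiting order so different producers do not all start at the same task.
        Collections.shuffle(this.targetTasks, new Random(seed));
    }

    /** Returns the id of the task that receives the next tuple. */
    int chooseTask() {
        int task = targetTasks.get(next);
        next = (next + 1) % targetTasks.size();
        return task;
    }

    public static void main(String[] args) {
        ShuffleGroupingSketch grouping =
                new ShuffleGroupingSketch(Arrays.asList(0, 1, 2, 3), 42L);
        int[] received = new int[4];
        for (int i = 0; i < 12; i++) {
            received[grouping.chooseTask()]++;
        }
        // Every task ends up with the same number of tuples.
        System.out.println(Arrays.toString(received));
    }
}
```

The "local or shuffle" variant would apply the same idea but restrict the candidate list to tasks in the same worker process whenever such tasks exist.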
► Fields grouping
  ► Mod hashing on a subset of tuple fields
  ► Ensures all tuples with the same field values are always routed to the same task
► Partial key grouping
  ► Like fields grouping, but tuples are load-balanced between two downstream bolts
  ► Provides better utilization of resources for skewed incoming data
Stream Grouping
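The "mod hashing" idea behind fields grouping can be sketched in a few lines: hash the values of the grouping fields and take the result modulo the number of consumer tasks. The code below is a Storm-free illustration; `FieldsGroupingSketch` and `chooseTask` are names invented for this example, and the exact hash function Storm uses is not claimed here.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of routing by hash(grouping field values) mod task count. */
class FieldsGroupingSketch {

    /** Picks a task index in [0, numTasks) from the values of the grouping fields. */
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 6; // e.g. six tasks of a counting bolt
        for (String word : new String[] {"snow", "white", "snow", "dwarfs", "snow"}) {
            int task = chooseTask(Arrays.<Object>asList(word), numTasks);
            // Equal words always print the same task index.
            System.out.println(word + " -> task " + task);
        }
    }
}
```

Because the task index depends only on the field values, a stateful bolt such as a word counter sees every occurrence of a given word in the same task. The downside is that a very frequent key overloads one task, which is the skew that partial key grouping is meant to relieve.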
► All grouping
  ► Send to all tasks
► Global grouping
  ► Pick the task with the lowest id
► None grouping
  ► Currently equivalent to shuffle grouping
Stream Grouping
► Direct grouping
  ► The producer of the tuple decides which task of the consumer will receive it.
  ► Direct groupings can only be declared on streams that have been declared as direct streams.
Stream Grouping
Storm Architecture
[Diagram: a Topology is submitted to Nimbus; Nimbus and the Supervisors coordinate through a ZooKeeper cluster; each Supervisor runs Workers]
► Storm Deployment
► Command Line Client
► REST API
► Storm UI
Using Storm
► 1. Set up a ZooKeeper cluster
  ► For demo: storm dev-zookeeper
► 2. Install dependencies on Nimbus and worker machines
  ► Java 7
  ► Python 2.6.6
  ► Optional
    ► Configure the PATH & JAVA_HOME environment variables
Storm Deployment
► 3. Download and extract a Storm release to Nimbus and worker machines
  ► sudo tar -zxvf apache-storm-1.0.2.tar.gz -C /opt/
► 4. Fill in mandatory configurations in storm.yaml
  ► storm.zookeeper.servers
  ► nimbus.seeds
  ► supervisor.slots.ports
  ► storm.local.dir
Storm Deployment
► 5. Launch daemons under supervision using the "storm" script and a supervisor of your choice
  ► storm nimbus
  ► storm supervisor
  ► Optional
    ► storm ui
    ► storm drpc
    ► storm logviewer
Storm Deployment
► storm jar topology-jar-path class ...
► storm list
► storm deactivate topology-name
► storm activate topology-name
► storm rebalance topology-name [-w wait-time-secs] [-n new-num-workers] [-e component=parallelism]*
► storm get-errors topology-name
► storm kill topology-name [-w wait-time-secs]
Command Line Client -- Topology Related
► storm nimbus
► storm supervisor
► storm ui
► storm drpc
► storm logviewer
► storm pacemaker
Command Line Client -- Daemon Related
► storm classpath
► storm localconfvalue conf-name
  ► ~/.storm/storm.yaml + defaults.yaml
► storm remoteconfvalue conf-name
  ► $STORM-PATH/conf/storm.yaml + defaults.yaml
Command Line Client -- Config Related
► storm monitor topology-name [-i interval-secs] [-m component-id] [-s stream-id] [-w [emitted | transferred]]
► storm set_log_level -l [logger name]=[log level][:optional timeout] -r [logger name] topology-name
► storm shell resourcesdir command args
► storm blobstore cmd
  ► storm blobstore create mytopo:data.tgz -f data.tgz -a u:alice:rwa,u:bob:rw,o::r
► storm sql sql-file topology-name
Command Line Client -- Advanced
► storm help
► storm version
► storm dev-zookeeper
► storm kill_workers
  ► Run on a supervisor node
Command Line Client -- Misc
► Functions
  ► Retrieving metrics data
  ► Retrieving configuration information
  ► Management operations
► Supports JSONP
REST API
► Request URL format
  ► http://<ui-host>:<ui-port>/api/v1/...
  ► Default port: 8080
► Response format: JSON
REST API
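To make the URL format concrete, here is a small sketch that assembles a request URL from a UI host, a port, and an API path. It only builds the URI and does not send a request. The class name `StormRestUrlSketch`, the helper `endpoint`, and the host `storm-ui.example.com` are placeholders chosen for this example.

```java
import java.net.URI;

/** Hypothetical helper that builds Storm UI REST URLs of the form http://host:port/api/v1/... */
class StormRestUrlSketch {
    // Default UI port as stated on the slide.
    static final int DEFAULT_UI_PORT = 8080;

    /** Joins host, port, and a path relative to /api/v1/ into a request URI. */
    static URI endpoint(String uiHost, int uiPort, String path) {
        return URI.create("http://" + uiHost + ":" + uiPort + "/api/v1/" + path);
    }

    public static void main(String[] args) {
        // "storm-ui.example.com" is a placeholder; substitute the host running "storm ui".
        URI summary = endpoint("storm-ui.example.com", DEFAULT_UI_PORT, "cluster/summary");
        System.out.println(summary);
    }
}
```

A GET on the resulting URL with any HTTP client should return a JSON document, per the response format noted above.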
► /api/v1/cluster/configuration (GET)
► /api/v1/cluster/summary (GET)
► /api/v1/nimbus/summary (GET)
► /api/v1/supervisor/summary (GET)
► /api/v1/topology/summary (GET)
► /api/v1/topology/:id (GET)
REST API - GET
► /api/v1/topology/:id/activate (POST)
► /api/v1/topology/:id/deactivate (POST)
► /api/v1/topology/:id/rebalance/:wait-time (POST)
► /api/v1/topology/:id/kill/:wait-time (POST)
REST API - POST
Storm UI
► API
► WordCount Example
► Parallelism
► Reliability API
► DRPC
► Trident
► WordCount (Trident version) Example
Storm Application Development
public interface ISpout extends Serializable {
    // Lifecycle API
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
    void close();
    void activate();
    void deactivate();
    // Core API
    void nextTuple();
    // Reliability API
    void ack(Object msgId);
    void fail(Object msgId);
}
API -- Spout
• Common sub-interface: IRichSpout
• Common implementations: BaseRichSpout, DRPCSpout, RandomSentenceSpout, KafkaSpout
public interface IBolt extends Serializable {
    void prepare(Map stormConf, TopologyContext context, OutputCollector collector); // Lifecycle API
    void execute(Tuple input);                                                       // Core API
    void cleanup();                                                                  // Lifecycle API
}
API -- Bolt
• Common sub-interface: IRichBolt
• Common implementations: BaseRichBolt, ShellBolt, RedisStoreBolt, KafkaBolt, HdfsBolt
public interface IOutputCollector extends IErrorReporter {
    // Core API
    List<Integer> emit(String streamId, Collection<Tuple> anchors, List<Object> tuple);
    void emitDirect(int taskId, String streamId, Collection<Tuple> anchors, List<Object> tuple);
    // Reliability API
    void ack(Tuple input);
    void fail(Tuple input);
    void resetTimeout(Tuple input);
}
API -- Bolt Output
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 2);
builder.setBolt("split", new SplitSentence(), 2).shuffleGrouping("spout").setNumTasks(4);
builder.setBolt("count", new WordCount(), 6).fieldsGrouping("split", new Fields("word"));
API -- Topology
Data flow: spout → split → count
Common configuration options
• Config.TOPOLOGY_WORKERS
• Config.TOPOLOGY_ACKER_EXECUTORS
• Config.TOPOLOGY_MAX_SPOUT_PENDING
• Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS
• Config.TOPOLOGY_SERIALIZATIONS
API -- Topology Configuration
Config conf = new Config();
conf.setNumWorkers(20);
conf.setMaxSpoutPending(5000);
► Local Mode
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("word-count", conf, builder.createTopology());
...
cluster.shutdown();
► Remote Mode
StormSubmitter.submitTopologyWithProgressBar("word-count", conf, builder.createTopology());
API -- Topology Submission
WordCount Example
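[Diagram: data flow of the WordCount topology]
Spout emits a sentence: snow white and the seven dwarfs
  → Shuffle Grouping →
Split bolt emits words: snow, white, and, the, seven, dwarfs
  → Fields Grouping →
Count bolt keeps running counts: seven: 11, snow: 11, and: 23, dwarfs: 11, the: 19, white: 11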
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 2);
builder.setBolt("split", new SplitSentence(), 2).shuffleGrouping("spout").setNumTasks(4);
builder.setBolt("count", new WordCount(), 6).fieldsGrouping("split", new Fields("word"));
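// Imports assume the Storm 1.x package layout (org.apache.storm.*).
import java.util.Map;
import java.util.Random;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class RandomSentenceSpout extends BaseRichSpout {
    SpoutOutputCollector _collector;
    Random _rand;

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
        _rand = new Random();
    }

    public void nextTuple() {
        Utils.sleep(100); // pause between emits so the spout does not spin
        String[] sentences = new String[]{
            "the cow jumped over the moon",
            "an apple a day keeps the doctor away",
            "four score and seven years ago",
            "snow white and the seven dwarfs",
            "i am at two with nature"
        };
        String sentence = sentences[_rand.nextInt(sentences.length)];
        _collector.emit(new Values(sentence));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}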
WordCount Example -- Spout
public static class SplitSentence extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String sentence = tuple.getString(0);
        // Emit one tuple per space-separated word.
        for (String word : sentence.split(" ")) {
            collector.emit(new Values(word));
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
WordCount Example -- Split Bolt
public static class WordCount extends BaseBasicBolt {
    // Running count per word, local to this task.
    Map<String, Integer> counts = new HashMap<String, Integer>();

    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null) count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
        System.out.println(String.format("== %s, %d ==", word, count));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
WordCount Example -- Count Bolt
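To follow the spout → split → count data flow without a cluster, the three stages can be mimicked in plain Java. The sketch below has no Storm dependency and runs in a single thread, so it shows only the per-tuple logic, not distribution or grouping. `WordCountFlowSketch` and `countWords` are names made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical single-threaded imitation of the spout -> split -> count pipeline. */
class WordCountFlowSketch {

    /** Splits each sentence on spaces and keeps a running count per word. */
    static Map<String, Integer> countWords(String... sentences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sentence : sentences) {            // stands in for tuples emitted by the spout
            for (String word : sentence.split(" ")) {  // stands in for the split bolt
                counts.merge(word, 1, Integer::sum);   // stands in for the count bolt's state
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(
                "snow white and the seven dwarfs",
                "the cow jumped over the moon");
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

In the real topology the map lives inside each count task, which is why the fields grouping on "word" matters: it guarantees that a given word is always counted by the same task.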
builder.setSpout("spout", new RandomSentenceSpout(), 2);
builder.setBolt("split", new SplitSentence(), 2).shuffleGrouping("spout").setNumTasks(4);
builder.setBolt("count", new WordCount(), 6).fieldsGrouping("split", new Fields("word"));
conf.setNumWorkers(2);
Parallelism
How do the parallelism hint, the number of tasks, and the number of workers relate?
► Worker processes
► Executors (threads)
► Tasks
Parallelism
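The relationship between workers, executors, and tasks can be worked through as simple arithmetic for the word-count topology: the parallelism hint sets the number of executors (threads), `setNumTasks` sets the number of tasks, and tasks default to one per executor when not set. The sketch below is plain Java, not Storm code; `ParallelismSketch` and `Component` are illustrative names, and acker executors are deliberately left out of the totals.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical arithmetic sketch of executors and tasks in the word-count topology. */
class ParallelismSketch {

    static final class Component {
        final String id;
        final int executors; // threads, taken from the parallelism hint
        final int tasks;     // component instances, taken from setNumTasks

        Component(String id, int parallelismHint, int numTasks) {
            this.id = id;
            this.executors = parallelismHint;
            this.tasks = numTasks;
        }

        /** Without setNumTasks, each executor runs exactly one task. */
        Component(String id, int parallelismHint) {
            this(id, parallelismHint, parallelismHint);
        }
    }

    static int totalExecutors(List<Component> components) {
        int sum = 0;
        for (Component c : components) sum += c.executors;
        return sum;
    }

    static int totalTasks(List<Component> components) {
        int sum = 0;
        for (Component c : components) sum += c.tasks;
        return sum;
    }

    public static void main(String[] args) {
        List<Component> topology = Arrays.asList(
                new Component("spout", 2),     // setSpout("spout", ..., 2)
                new Component("split", 2, 4),  // setBolt("split", ..., 2).setNumTasks(4)
                new Component("count", 6));    // setBolt("count", ..., 6)
        int workers = 2;                       // conf.setNumWorkers(2)

        for (Component c : topology) {
            System.out.println(c.id + ": " + c.executors + " executors, " + c.tasks + " tasks");
        }
        System.out.println("total executors: " + totalExecutors(topology));
        System.out.println("total tasks: " + totalTasks(topology));
        System.out.println("executors per worker: " + totalExecutors(topology) / workers);
    }
}
```

Under these assumptions the "split" bolt runs two tasks in each of its two executors, while the other components run one task per executor.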
Parallelism
► Rebalance
  ► storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
Reliability API -- "fully processed"
► To create a new link in the tree of tuples
Reliability API -- "anchoring"
List<Tuple> anchors = new ArrayList<Tuple>();
anchors.add(A);
_collector.emit(anchors, new Values(B));
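Reliability API -- Acknowledgment
[Diagram, ACK case: each bolt acks its input tuple to the acker bolt, which then calls ack on the spout]
[Diagram, Fail case: a bolt fails its input tuple to the acker bolt, which then calls fail on the spout]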
public interface ISpout extends Serializable{
void ack(Object var1);
void fail(Object var1);
}
public interface IOutputCollector extends IErrorReporter {
void ack(Tuple input);
void fail(Tuple input);
}
BaseBasicBolt: anchors emitted tuples to the input tuple and acks the input automatically after execute returns
► Uses a single 64-bit integer per tuple tree
► XOR magic
  ► Long a, b = Random.nextLong();
  ► a != 0
  ► a ^ a ^ b != 0
  ► a ^ a ^ b ^ b == 0
► Question
  ► What will happen if a tuple isn't acked because the task died?
Reliability API -- Track Tuple Tree
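The XOR trick can be tried out in isolation: keep one 64-bit value per tuple tree, XOR each tuple id into it once when the tuple is created and once when it is acked, and treat zero as "fully processed". The sketch below is a simplified, Storm-free model; `AckValSketch`, `update`, and `fullyProcessed` are names invented for this example, and the real acker protocol batches these updates differently.

```java
import java.util.Random;

/** Hypothetical model of tracking a tuple tree with a single XOR-ed 64-bit value. */
class AckValSketch {
    private long ackVal = 0L;

    /** XORs a tuple id in; call once when the tuple is created and once when it is acked. */
    void update(long tupleId) {
        ackVal ^= tupleId;
    }

    /** The tree is complete when every id has been XOR-ed in exactly twice. */
    boolean fullyProcessed() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        Random random = new Random();
        long a = random.nextLong(); // id of the tuple emitted by the spout
        long b = random.nextLong(); // id of a tuple a bolt emits, anchored to a

        AckValSketch tree = new AckValSketch();
        tree.update(a); // spout emits a
        tree.update(b); // bolt emits b
        tree.update(a); // bolt acks a; the value now equals b
        System.out.println("after acking a: " + tree.fullyProcessed());
        tree.update(b); // downstream bolt acks b; every id has cancelled out
        System.out.println("after acking b: " + tree.fullyProcessed());
    }
}
```

Memory use stays constant no matter how large the tree grows, because only the running XOR is stored. If an id is never XOR-ed in a second time, the value never returns to zero on its own, which is the situation the question on the slide asks about.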
DRPC
► DRPC Server
storm drpc
► DRPC Client
DRPCClient client = new DRPCClient(conf, host, 3772);
String result = client.execute("wc", word);
► DRPC Topology
LinearDRPCTopologyBuilder
DRPC
► Provides consistent, exactly-once semantics
► Micro-Batch Oriented
► Fluent, stream-oriented API
  ► Functions
  ► Filters
  ► Groupings
  ► Aggregations
  ► Merges and joins
► Stateful, incremental processing on top of any persistence store
Trident
Trident
[Diagram: the stream is divided into micro-batches: Batch #1, Batch #2]
TridentTopology topology = new TridentTopology();
TridentState wordCounts = topology.newStream("spout1", spout)
    .parallelismHint(16)
    .each(new Fields("sentence"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
    .parallelismHint(16);
topology.newDRPCStream("words", drpc)
    .each(new Fields("args"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .stateQuery(wordCounts, new Fields("word"), new MapGet(), new Fields("count"))
    .each(new Fields("count"), new FilterNull())
    .aggregate(new Fields("count"), new Sum(), new Fields("sum"));
WordCount(Trident version) Example
► Covered: Storm basic concepts, system architecture, basic usage, and an introduction to application development
► Advanced
  ► State Management & Stateful Bolts
  ► Native Streaming Window API
  ► Distributed Cache API
  ► Scheduler & Resource Aware Scheduler
  ► Worker Execution Model
  ► ...
Summary
Follow us
QingCloud-IaaS
青云QingCloud
www.qingcloud.com
Thank you