![Page 1: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/1.jpg)
Scalable Real-Time Complex Event Processing @Uber
Shuyi Chen Uber Technology Inc.
![Page 2: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/2.jpg)
Uber
● 6 continents, 70 countries and 400+ cities
● Transportation as reliable as running water, everywhere, for everyone
![Page 3: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/3.jpg)
Outline
• Motivation
• Architecture
• Limitations
• Challenges
![Page 4: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/4.jpg)
Outline
• Motivation
• Architecture
• Limitations
• Challenges
![Page 5: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/5.jpg)
Uber is a data-driven company
![Page 6: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/6.jpg)
Thousands of Kafka topics from different services
![Page 7: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/7.jpg)
We can extract a lot of useful information from this rich set of logs in real-time!
![Page 8: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/8.jpg)
Multiple logins from the same IP in the last 10 minutes
![Page 9: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/9.jpg)
Partner accepts a trip → partner calls rider through the Uber app → rider cancels the trip
![Page 10: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/10.jpg)
Partners reject the second pickup of an UberPOOL trip
![Page 11: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/11.jpg)
Multiple logins from the same IP in the last 10 minutes
Window Aggregation
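A minimal sketch of the window aggregation above, written in plain Python rather than SiddhiQL. The 10-minute window comes from the slide; the threshold of 3 logins and the names `LoginWindow` and `on_login` are assumptions made for illustration only.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10 * 60  # "last 10 minutes" from the slide
THRESHOLD = 3             # hypothetical: how many logins count as "multiple"


class LoginWindow:
    """Counts logins per IP inside a sliding time window."""

    def __init__(self, window_seconds=WINDOW_SECONDS, threshold=THRESHOLD):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.logins = defaultdict(deque)  # ip -> timestamps still in the window

    def on_login(self, ip, timestamp):
        """Record one login; return True if this IP reached the threshold."""
        times = self.logins[ip]
        times.append(timestamp)
        # Expire timestamps that have fallen out of the window.
        while times and times[0] <= timestamp - self.window_seconds:
            times.popleft()
        return len(times) >= self.threshold
```

Keeping one deque per IP means each event only touches the state of its own key, which is what makes this kind of query easy to partition by key.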
![Page 12: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/12.jpg)
Partner accepts a trip → partner calls rider through the Uber app → rider cancels the trip
Pattern detection
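The pattern above can be sketched as a small per-trip state machine. This is an illustrative Python sketch, not the engine's implementation: the event names in `PATTERN` and the class `SequenceDetector` are hypothetical, and unrelated events arriving between steps are simply ignored.

```python
# Hypothetical event type names for the three steps on the slide.
PATTERN = ("trip_accepted", "partner_called_rider", "rider_cancelled")


class SequenceDetector:
    """Tracks, per trip, how far through the pattern the events have got."""

    def __init__(self, pattern=PATTERN):
        self.pattern = pattern
        self.progress = {}  # trip_id -> index of the next expected step

    def on_event(self, trip_id, event_type):
        """Feed one event; return True when the full pattern has matched."""
        step = self.progress.get(trip_id, 0)
        if event_type == self.pattern[step]:
            step += 1  # advance only on the expected next event
        if step == len(self.pattern):
            self.progress.pop(trip_id, None)  # reset after a match
            return True
        self.progress[trip_id] = step
        return False
```

State is keyed by trip, so all events of one trip need to reach the same detector instance.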
![Page 13: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/13.jpg)
Partners reject the second pickup of an UberPOOL trip
Filter
![Page 14: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/14.jpg)
Can we use declarative semantics to specify this stream processing logic?
![Page 15: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/15.jpg)
Complex event processing
• Combines data from multiple sources to infer events or patterns that suggest more complicated circumstances
• CEP is used across many industries for various use cases, including:
– Finance: Trade analysis, fraud detection
– Airlines: Operations monitoring
– Healthcare: Claims processing, patient monitoring
– Energy and Telecommunications: Outage detection
• CEP uses a declarative rule/query language to specify event processing logic
![Page 16: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/16.jpg)
WSO2/Siddhi: Complex event processing engine
• Lightweight, extensible, open source, released as a Java library
• Features supported
– Filter
– Join
– Aggregation
– Group by
– Window
– Pattern processing
– Sequence processing
– Event tables
– Event-time processing
– UDF
– Extensions
– Declarative query language: SiddhiQL
![Page 17: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/17.jpg)
How Siddhi works
• Specify processing logic declaratively with SiddhiQL
![Page 18: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/18.jpg)
How Siddhi works
• Query is parsed at runtime into an execution plan runtime
• As events flow in, the execution plan runtime processes events inside the CEP engine according to the query logic
![Page 19: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/19.jpg)
How can we make it scalable at Uber scale?
![Page 20: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/20.jpg)
Apache Samza
• A distributed stream processing framework
– Distributed and scalable
– Built-in state management
– Built-in fault tolerance
– At-least-once message processing
![Page 21: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/21.jpg)
How can we make the stream processing output useful?
![Page 22: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/22.jpg)
Actions
• Generalize a set of common action templates to make it easy for services and humans to harness the power of real-time stream processing
• Currently we support
– Make an RPC call
– Invoke a Webhook endpoint
– Index to ElasticSearch
– Index to Cassandra
– Kafka
– Statsd
– Chat service
– Email
– Push notification
![Page 23: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/23.jpg)
Actions
Real-time Scalable Complex Event Processing
![Page 24: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/24.jpg)
Outline
• Motivation
• Architecture
• Limitations
• Challenges
![Page 25: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/25.jpg)
![Page 26: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/26.jpg)
![Page 27: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/27.jpg)
Partitioner
• Re-partition events based on key
• Support predicate pushdown through query analysis
• Support column pruning through query analysis (WIP)
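The three partitioner responsibilities can be illustrated roughly as follows. This is a sketch in plain Python under stated assumptions: the function `repartition`, its parameters, and the use of CRC32 as the hash are all hypothetical, and in the real system the predicate and column list would be derived from query analysis rather than passed in by hand.

```python
import zlib


def repartition(events, key, num_partitions, predicate=None, columns=None):
    """Route events to partitions by key, filtering and pruning before the shuffle."""
    partitions = [[] for _ in range(num_partitions)]
    for event in events:
        # Predicate pushdown: drop events the query would filter out anyway,
        # so they never reach the downstream topic.
        if predicate is not None and not predicate(event):
            continue
        # A stable hash keeps every event of one key on the same partition.
        index = zlib.crc32(str(event[key]).encode("utf-8")) % num_partitions
        # Column pruning: forward only the fields the query actually reads.
        if columns is not None:
            event = {name: event[name] for name in columns if name in event}
        partitions[index].append(event)
    return partitions
```

Filtering and pruning before the shuffle is what reduces the volume of intermediate messages written between processing stages.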
![Page 28: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/28.jpg)
Query processor
• Parse Siddhi queries into execution plan runtime
• Process events in Siddhi execution plan runtime
• Checkpoint state regularly to ensure recovery upon crash/restart using RocksDB
![Page 29: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/29.jpg)
Action processor
• Execute actions upon the complex event processing output
• Support various kinds of actions for easy integration
• Implement action retry mechanism using RocksDB to provide at-least-once delivery
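One way to picture the retry mechanism: persist each action before attempting it, and delete it only after it succeeds. The sketch below uses a plain dict as a stand-in for RocksDB; `ActionProcessor`, `submit`, and `retry_pending` are hypothetical names, and the slides do not describe the actual retry policy.

```python
class ActionProcessor:
    """At-least-once action delivery backed by a persistent pending store."""

    def __init__(self, send, store=None):
        self.send = send  # callable that performs the action; raises on failure
        # Stand-in for RocksDB: any dict-like store that survives restarts.
        self.pending = store if store is not None else {}

    def submit(self, action_id, payload):
        # Persist first, so a crash before or during send leads to a retry
        # instead of a lost action.
        self.pending[action_id] = payload
        return self._try_send(action_id)

    def _try_send(self, action_id):
        try:
            self.send(self.pending[action_id])
        except Exception:
            return False  # stays in the store for a later retry
        del self.pending[action_id]  # remove only after success
        return True

    def retry_pending(self):
        """Re-attempt every action that has not yet succeeded."""
        for action_id in list(self.pending):
            self._try_send(action_id)
```

Because an action can succeed just before a crash removes the chance to delete it, receivers may see duplicates; that is the trade-off of at-least-once delivery.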
![Page 30: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/30.jpg)
How do we translate a query into a physical plan that runs?
![Page 31: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/31.jpg)
DAG (Directed Acyclic Graph) generation
• Analyze Siddhi query to automatically generate the stream processing DAG in Samza using the processors
Filter, transformation
![Page 32: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/32.jpg)
Join, window, pattern
![Page 33: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/33.jpg)
More complicated
![Page 34: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/34.jpg)
No stream processing logic is hard-coded in any of the processors
![Page 35: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/35.jpg)
REST API backend
• All queries and actions are stored externally in a database
• RESTful API for CRUD operations
• If query/action logic changes
– Redeploy the Samza DAG if needed
– Otherwise, the updated queries/actions will be loaded at runtime without interruption
![Page 36: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/36.jpg)
Unified management and monitoring
• Every use case
– Shares the same set of processors
– Uses queries and actions to describe its processing logic
• A single monitoring template can be reused across different use cases
![Page 37: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/37.jpg)
Production status
• 100+ production use cases
• 30+ billion messages processed per day
![Page 38: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/38.jpg)
Applications
• Real-time fraud detection
• Real-time anomaly detection
• Real-time marketing campaign
• Real-time promotion
• Real-time monitoring
• Real-time feedback system
• Real-time analytics
• Real-time visualizations
• And more
![Page 39: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/39.jpg)
Outline
• Motivation
• Architecture
• Limitations
• Challenges
![Page 40: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/40.jpg)
Out-of-order event handling
• Not a big concern
– Events of the same rider/partner are usually seconds apart
• K-slack extension in Siddhi for out-of-order event processing
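The idea behind K-slack is to hold events in a buffer and release them in timestamp order once they are at least K time units older than the newest timestamp seen. The following is a simplified Python sketch with a fixed K, not the Siddhi extension itself; `KSlackBuffer`, `push`, and `flush` are hypothetical names.

```python
import heapq


class KSlackBuffer:
    """Reorders events by timestamp, tolerating up to k units of lateness."""

    def __init__(self, k):
        self.k = k
        self.max_seen = None  # largest timestamp observed so far
        self.heap = []        # min-heap of (timestamp, arrival order, event)
        self.counter = 0      # tie-breaker so events are never compared

    def push(self, timestamp, event):
        """Buffer one event; return the events that are now safe to emit."""
        heapq.heappush(self.heap, (timestamp, self.counter, event))
        self.counter += 1
        if self.max_seen is None or timestamp > self.max_seen:
            self.max_seen = timestamp
        released = []
        # Anything at least k older than the newest timestamp is released.
        while self.heap and self.heap[0][0] <= self.max_seen - self.k:
            ts, _, ev = heapq.heappop(self.heap)
            released.append((ts, ev))
        return released

    def flush(self):
        """Emit everything still buffered, in timestamp order."""
        released = [(ts, ev) for ts, _, ev in sorted(self.heap)]
        self.heap = []
        return released
```

A larger K tolerates more disorder at the cost of added latency, since every event waits in the buffer until the stream has advanced by K.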
![Page 41: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/41.jpg)
Auto-scaling
• Manually re-partition Kafka topics to increase parallelism
• Manually tune container memory if needed
• Future
– Use CPU/memory/IO stats to auto-scale the data pipelines
![Page 42: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/42.jpg)
Outline
• Motivation
• Architecture
• Limitations
• Challenges
![Page 43: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/43.jpg)
Large checkpointing state
• Samza uses Kafka to log state changes
• Siddhi engine snapshot can be large
• Kafka message size is limited to 1MB by default
• Solution: we built logic to slice state into smaller pieces and checkpoint them
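Slicing a large snapshot could look roughly like this. The sketch is in Python and makes several assumptions: the 900,000-byte budget, the message fields (`snapshot_id`, `index`, `total`, `data`), and the function names are all hypothetical, since the slides do not describe the actual chunk format.

```python
MAX_CHUNK_BYTES = 900_000  # hypothetical budget, kept below the 1MB default limit


def slice_snapshot(snapshot, snapshot_id, max_chunk_bytes=MAX_CHUNK_BYTES):
    """Split a serialized snapshot (bytes) into ordered, size-bounded messages."""
    chunks = [
        snapshot[start:start + max_chunk_bytes]
        for start in range(0, len(snapshot), max_chunk_bytes)
    ] or [b""]  # an empty snapshot still produces one message
    total = len(chunks)
    return [
        {"snapshot_id": snapshot_id, "index": i, "total": total, "data": chunk}
        for i, chunk in enumerate(chunks)
    ]


def assemble_snapshot(messages):
    """Rebuild the snapshot on restore; refuse to use an incomplete set."""
    if not messages:
        raise ValueError("no snapshot messages")
    total = messages[0]["total"]
    by_index = {m["index"]: m["data"] for m in messages}
    if set(by_index) != set(range(total)):
        raise ValueError("incomplete snapshot")
    return b"".join(by_index[i] for i in range(total))
```

Carrying `index` and `total` in every message lets the restore path detect a partially written checkpoint and fall back to an earlier complete one.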
![Page 44: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/44.jpg)
Synchronous checkpointing
• If state is large, time to checkpoint can be long
• Samza uses a single-threaded model, so it is unsafe to checkpoint asynchronously (SAMZA-863)
![Page 45: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/45.jpg)
Exactly once state processing?
• Cannot commit state and offset atomically
• No exactly-once state processing
![Page 46: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/46.jpg)
Custom business logic
• Common logic implemented as Siddhi extensions
• Ad-hoc logic implemented as UDFs in JavaScript or Scala
![Page 47: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/47.jpg)
Intermediate Kafka messages
• Samza uses Kafka as message queue for intermediate processing output
– This can create large load on Kafka if a heavy topic is partitioned multiple times
– Encode the intermediate messages to reduce footprint
![Page 48: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/48.jpg)
Multi-tenancy
• Older Siddhi versions process events using a thread pool
– Bad for multi-tenancy in YARN
– Consume more CPU resources than claimed
• Newer versions still use a thread pool for scheduled tasks, but do the main processing in a single thread
– Good: CPU consumption per YARN container is bounded
![Page 49: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/49.jpg)
Upgrading Samza jobs
• Upgrading Samza jobs requires a full restart, and can take minutes due to
– Offset checkpointing topic too large → set retention to hours
– Changelog topic too large → set retention or enable compaction in Kafka or host affinity (SAMZA-617)
• To minimize the interruption during upgrade, it would be nice to have
– Rolling restart
– Per container restart
![Page 50: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/50.jpg)
Our solution: non-interrupted handoff
• For critical jobs, we use replication during upgrade
– Start a shadow job
– Upgrade shadow
– Switch primary and shadow
– Upgrade primary
– Switch back
• Downside: requires 2x capacity during upgrade
![Page 51: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber](https://reader034.vdocuments.mx/reader034/viewer/2022050613/58ad8ac61a28ab662a8b5757/html5/thumbnails/51.jpg)
Thank You!