how to build fast data applications: evaluating the top contenders
HOW TO BUILD FAST DATA APPLICATIONS: EVALUATING THE TOP CONTENDERS
Dheeraj Remella, Director of Solutions Architecture
VoltDB
© 2016 VoltDB
OUR SPEAKER
Dheeraj Remella, Dir. of Solutions Architecture, VoltDB
VOLTDB – PURPOSE-BUILT FOR FAST DATA
• What?
  • Operational database with integrated processing and data pipeline in a single system
• Why?
• “Streaming apps are really database apps when your database is fast enough”
Collect, Explore, Analyze, Act
Big Data analytic results:
1. Discoveries: seasonal predictions, scientific results, long-term capacity planning
2. Optimizations: market segmentation, fraud heuristics, optimal customer journey
DATA ARCHITECTURE FOR FAST + BIG DATA
Diagram labels:
• Enterprise Apps: CRM, ERP, etc.
• ETL
• BIG DATA: Data Lake (HDFS, etc.), SQL on Hadoop, MapReduce, Exploratory Analytics, BI Reporting
• FAST DATA: Fast Operational Database, Ingest / Interactive, Real-time Analytics, Fast Serve Analytics, Decisioning, Export
IN THE BIG CORNER
Systems facilitating exploration and analytics of large collections.
Example Technologies
• Columnar OLAP warehouses
• Hadoop Ecosystem
  • MapReduce
  • Hive, Pig
  • SQL.next: Impala, Drill, Shark
Example Applications
• User segmentation & pre-scoring
• Seasonal trending
• Recommendation matrices
• Building search indexes
• Data Science: statistical clustering, machine learning
IN THE FAST CORNER
Systems facilitating real-time ingest, analytics and decisions against incoming streams of events.
Example Technologies
• Streaming frameworks
• Fast OLAP
• VoltDB (fast OLTP)
Example Applications
• Micro-personalization
• Recommendation serving
• Alerting/alarming
• Operational monitoring
• Data enrichment (ETL elimination)
• High throughput authorization
  • Ex: API quota enforcement
TYPICAL FAST DATA QUESTIONS
Diagram labels: Hadoop (Volume); SQL / OLAP; Data Science; Fast (Velocity)
• Is the fast layer streaming?
  • It is often more like fast OLTP
• How do the pieces communicate?
  • OLAP analytics from Big -> Fast
  • New events from Fast -> Big
• Where do “analytics” belong?
  • Analytics per-event: with Fast
  • Analytics across history: with Big
• Are streaming frameworks equivalent?
  • Traditional SQL CEP (Esper, Streambase)
  • Tuple DAGs (Storm)
  • Window processors on Hadoop (Spark)
HOW TO SOLVE IT*
* With credit to G. Polya
Considering Data Considering Processing
What are the types of data to be managed in fast data applications?
How does data flow through fast data applications?
What are the calculations & analytics that are necessary?
Data Temporality
Data | Temporality | Examples
Incoming events | Event Stream | Click stream, tick stream, sensors, metrics
Real-Time Analytic Results | Persistent (Queryable) | Counters, streaming aggregates, time-series rollups
Event metadata | Persistent (Look-Ups) | Device version, location, user profiles, point-of-interest data
OLAP Analytics Used in Real-Time Decisions | Persistent (Look-Ups) | Scoring models, seasonal usage, demographic trends
Responses/side effects | Event Stream | Policy enforcement decisions, personalization recommendations
Outgoing events | Event Stream | Enriched, filtered, correlated transform of input feed
SOURCES OF STATE
1. Analytics outputs must be query-able.
2. “Lookup tables” to create groupings for analytics and to supply enrichment data.
3. Session management: grouping, filtering and aggregating create intermediate state.
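The first two sources of state can be illustrated with a minimal Python sketch: a lookup table supplies the grouping attribute, and the resulting counts are state that stays available for queries. The names `DEVICE_REGION` and `count_events_by_region`, and the event shape, are hypothetical and chosen only for illustration; plain in-memory dictionaries stand in for whatever store a real system would use.

```python
from collections import Counter

# Hypothetical lookup table: device id -> region (enrichment data).
DEVICE_REGION = {"dev-1": "emea", "dev-2": "apac", "dev-3": "emea"}


def count_events_by_region(events, lookup):
    """Group incoming events by a looked-up attribute and count them."""
    counts = Counter()
    for event in events:
        # Unknown devices go to an explicit bucket instead of being dropped.
        region = lookup.get(event["device_id"], "unknown")
        counts[region] += 1
    return counts


events = [
    {"device_id": "dev-1"},
    {"device_id": "dev-2"},
    {"device_id": "dev-3"},
    {"device_id": "dev-9"},
]

# The counter is the queryable analytics output.
region_counts = count_events_by_region(events, DEVICE_REGION)
print(region_counts["emea"])  # 2
```

The counter only exists because the lookup table is kept next to the event handling, which is the point of the list above: both the grouping data and the result are state.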
Considering Data Considering Processing
What are the types of data to be managed in fast data applications?
How does data flow through fast data applications?
What are the calculations & analytics that are necessary?
DATA FLOWS
Real-time Analytics
• Streaming summaries for operations
• KPI measurement
• Analytics for apps
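A streaming summary of this kind can be sketched as a per-minute rollup that is updated on every event and can be queried at any time. This is an illustrative in-memory sketch, not VoltDB code; the class name `MinuteRollup`, the metric name, and the one-minute bucket size are assumptions.

```python
from collections import defaultdict


class MinuteRollup:
    """Keeps a count and a running sum per (metric, minute) bucket."""

    def __init__(self):
        # (metric, minute_start) -> [count, total]
        self._buckets = defaultdict(lambda: [0, 0.0])

    def record(self, metric, timestamp_s, value):
        minute_start = int(timestamp_s // 60) * 60  # start of the minute bucket
        bucket = self._buckets[(metric, minute_start)]
        bucket[0] += 1
        bucket[1] += value

    def query(self, metric, minute_start):
        # .get() avoids creating empty buckets for minutes with no events.
        count, total = self._buckets.get((metric, minute_start), (0, 0.0))
        mean = total / count if count else None
        return {"count": count, "sum": total, "mean": mean}


rollup = MinuteRollup()
rollup.record("latency_ms", 0, 10.0)
rollup.record("latency_ms", 30, 20.0)
rollup.record("latency_ms", 61, 40.0)

first_minute = rollup.query("latency_ms", 0)
print(first_minute["mean"])  # 15.0
```

Each event touches exactly one bucket, so the cost per event is constant, and a KPI read is a single lookup rather than a scan over raw events.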
DATA FLOWS
Fast Request/Response (and side effects)
• Mobile Authorization
• Campaign Evaluation
• Quota Enforcement
• Micro-Personalization
• Recommendation Serving
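Quota enforcement is a compact example of the request/response flow: each request reads the caller's current usage, makes a decision, and updates the usage before responding. The sketch below assumes a fixed-window policy and a single process; `QuotaEnforcer`, its parameters, and the policy itself are hypothetical and not taken from the talk.

```python
class QuotaEnforcer:
    """Fixed-window request quota per API key (single-process sketch)."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self._usage = {}  # api_key -> (window_start, count)

    def authorize(self, api_key, now_s):
        window_start = int(now_s // self.window_s) * self.window_s
        start, count = self._usage.get(api_key, (window_start, 0))
        if start != window_start:
            count = 0  # a new window has begun, so the counter resets
        if count >= self.limit:
            self._usage[api_key] = (window_start, count)
            return False  # response: request denied
        self._usage[api_key] = (window_start, count + 1)
        return True  # response: request allowed


enforcer = QuotaEnforcer(limit=2, window_s=60)
decisions = [enforcer.authorize("key-a", t) for t in (0, 10, 20, 61)]
print(decisions)  # [True, True, False, True]
```

The read, the decision, and the write have to happen as one unit per request. In a concurrent system that is what a transaction provides; here the single-threaded loop hides the problem.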
DATA FLOWS
Data Pipelines
• Data enrichment
• Sessionization and re-assembly of incoming events.
• Correlation (by time, location, identity)
• Filtering
Diagram labels: Pipeline, Data Lake
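Sessionization, one of the pipeline steps listed above, can be sketched as grouping each user's events into sessions that end after a period of inactivity. The function name `sessionize`, the `(user, timestamp)` event shape, and the inactivity-gap rule are assumptions made for this example.

```python
def sessionize(events, gap_s):
    """Group (user, timestamp) events into sessions split on inactivity gaps."""
    sessions = []      # all sessions, in order of creation
    open_session = {}  # user -> index of that user's most recent session
    # Sorting by timestamp re-assembles events that arrived out of order.
    for user, ts in sorted(events, key=lambda e: e[1]):
        idx = open_session.get(user)
        if idx is not None and ts - sessions[idx]["end"] <= gap_s:
            # Still within the gap: extend the open session.
            sessions[idx]["end"] = ts
            sessions[idx]["events"] += 1
        else:
            # First event for this user, or the gap was exceeded.
            open_session[user] = len(sessions)
            sessions.append({"user": user, "start": ts, "end": ts, "events": 1})
    return sessions


events = [("a", 0), ("b", 5), ("a", 20), ("a", 100), ("b", 30)]
sessions = sessionize(events, gap_s=30)
print(len(sessions))  # 3
```

The `open_session` map is the intermediate state that sessionization creates. A batch sort is used here for brevity; a streaming version would keep the same map and update it as each event arrives.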
1ST GENERATION FAST DATA: STREAMING ANALYTICS
• Examples: Spark Streaming, Storm, Kinesis, Tibco Streambase, et al
• Technical:
  • Lack “state” for transaction processing (operational)
  • Complex programming model
  • No ability to do ad hoc queries
• Functional:
  • 1st Gen only offers streaming analytics
  • Separate database required for any meaningful work
  • Proprietary interface is inconsistent with the rest of the data pipeline
  • Does not support the application's requirement for interaction
Diagram labels: 1st Gen Streaming; Stream Analytics; Query: Predefined
2ND GENERATION FAST DATA: STREAMING ANALYTICS & OPERATIONAL WORK
• Streaming Analytics converges with the operational applications
• Convergence is necessary to use data in real-time
• Automated application interactions are informed by data
• Brings the application into the “data analytics” world
• Streaming Analytics alone is passive, Fast Data is interactive
Diagram labels: 1st Gen, 2nd Gen; Streaming, VoltDB; Stream Analytics; Query: Predefined, Ad hoc; Support Operational Work
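The convergence described on this slide can be illustrated with any transactional SQL store. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in (it is not VoltDB and says nothing about VoltDB's API): each event is evaluated and recorded in one transaction, and the same state can then be queried ad hoc. The table names, the `handle_purchase` function, and the approval rule are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("CREATE TABLE purchase (account_id TEXT, amount INTEGER, approved INTEGER)")
conn.execute("INSERT INTO account VALUES ('acct-1', 100)")
conn.commit()


def handle_purchase(conn, account_id, amount):
    """Decide on one event and update state in a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        approved = row is not None and row[0] >= amount
        if approved:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
        # Every decision is recorded, approved or not.
        conn.execute(
            "INSERT INTO purchase VALUES (?, ?, ?)",
            (account_id, amount, int(approved)),
        )
    return approved


results = [handle_purchase(conn, "acct-1", amount) for amount in (60, 60, 40)]

# Ad hoc query over the same state the decisions were made against.
approved_total = conn.execute(
    "SELECT COALESCE(SUM(amount), 0) FROM purchase WHERE approved = 1"
).fetchone()[0]
print(results, approved_total)  # [True, False, True] 100
```

The operational work (approve or deny) and the analytics (totals over decisions) share one store and one query language, which is the interactive behavior the slide contrasts with passive streaming analytics.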
Considering Data Considering Processing
What are the types of data to be managed in fast data applications?
How does data flow through fast data applications?
What are the calculations & analytics that are necessary?
Continuous Query, Transactional Event Evaluation, Transformation
FAST DATA STACK
Applications, Message Queues, Data Sources
Ingest
Analyze Decide
• Counters
• Aggregations
• Time series
• Statistics

• Store results
• Query and recombine
• Fast serving

• Per-event policy evaluations
• Responses (synchronous): authorization, personalization
• Side-effects (asynchronous): alerts, alarms
Export & Pipeline
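The layers of the stack can be traced through a toy, single-process sketch: events are ingested one at a time, a counter is updated (analyze), a per-event policy is evaluated (decide) with a synchronous response and an asynchronous alert, and an enriched copy is queued for export. The function `run_pipeline`, the threshold policy, and the event shape are assumptions for illustration only.

```python
from collections import Counter, deque


def run_pipeline(events, threshold):
    counts = Counter()  # Analyze: per-user counters
    responses = []      # Decide: synchronous responses
    alerts = deque()    # Decide: asynchronous side effects, drained elsewhere
    export = []         # Export & Pipeline: enriched events for downstream
    for event in events:  # Ingest: one event at a time
        user = event["user"]
        counts[user] += 1
        allowed = counts[user] <= threshold  # per-event policy evaluation
        responses.append(allowed)
        if not allowed:
            alerts.append(f"user {user} exceeded {threshold} events")
        export.append({**event, "seen": counts[user], "allowed": allowed})
    return counts, responses, alerts, export


events = [{"user": "u1"}, {"user": "u1"}, {"user": "u2"}, {"user": "u1"}]
counts, responses, alerts, export = run_pipeline(events, threshold=2)
print(responses)  # [True, True, True, False]
```

In this sketch all four layers share the same in-process state. The technology stacks that follow differ mainly in how many separate systems that state is spread across.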
APACHE-ISH TECHNOLOGY STACK
Applications, Message Queues, Data Sources
Ingest
Analyze Decide
• Counters, aggregations, time series, statistics
• Store results, query and recombine, fast serving
• Per-event policy evaluations, responses (synchronous), side-effects (asynchronous)
Export & Pipeline
Technologies shown alongside the stack:
• Kafka / RabbitMQ
• Storm, Flume, Sqoop
• Storm + Serving Layer
• Spark + Serving Layer
• Cassandra, HBase
• Hadoop, Message queues
VOLTDB TECHNOLOGY STACK
Applications, Message Queues, Data Sources
Ingest
Analyze Decide
• Counters, aggregations, time series, statistics
• Store results, query and recombine, fast serving
• Per-event policy evaluations, responses (synchronous), side-effects (asynchronous)
Export & Pipeline
Technologies shown alongside the stack:
• Kafka / RabbitMQ
• VoltDB
• SQL, Java for Analytics
• Transactions / ACID
• Hadoop, Message queues
OLTP (Transactions First)
Streaming Event Processors
OLAP (Columnar Analytics)
STREAM TECHNOLOGY STACK
Applications, Message Queues, Data Sources
Ingest
Analyze Decide
• Counters, aggregations, time series, statistics
• Store results, query and recombine, fast serving
• Per-event policy evaluations, responses (synchronous), side-effects (asynchronous)
Export & Pipeline
OLAP TECHNOLOGY STACK
Applications, Message Queues, Data Sources
Ingest
Analyze Decide
• Counters, aggregations, time series, statistics
• Store results, query and recombine, fast serving
• Per-event policy evaluations, responses (synchronous), side-effects (asynchronous)
Export & Pipeline
QUESTIONS?
• Use the chat window to type in your questions
• Try VoltDB yourself: www.voltdb.com/download
THANK YOU!