Simplifying Big Data Analytics: Unifying Batch and Stream Processing
John Fanelli, VP Product
In-Memory Compute Summit
June 30, 2015
© 2015 DataTorrent Confidential – Do Not Distribute
Streaming Analytics
General-purpose data processing cluster
Scale-up Database
Data and Compute Grid
Clustered Database
DataTorrent enables enterprises to process data in motion and take action in real time
Trend: Batch and streaming use cases
Faster Time to Insight
Faster Time to Action
Data Processing Categories in Big Data Use Cases

Data velocity | Questions known   | Questions unknown
Static        | Batch Processing  | Ad-hoc Query
Streaming     | Stream Processing | N/A
Data Sources
Transactional Data
Web Click Stream
Mobile Devices
Operational Log Files
Public Data
Sensor Data
Customer Uses
Real-Time Advertising
Customer Service
Operational
Fraud Detection
Predictive Maintenance
Processing Data In Motion
Ingest, Archive
Transform, Normalize
Analyze, Business Logic
Alert, Action
Visualize, Persist
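The stages on this slide can be read as one pass over each event as it arrives. The sketch below is only an illustration of that reading in plain Java, not DataTorrent RTS code: the class `MotionPipelineSketch`, the `RawEvent` and `Reading` types, and the fixed-threshold business rule are all hypothetical names and assumptions introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical five-stage pipeline; none of these names come from DataTorrent RTS. */
class MotionPipelineSketch {

    /** A raw reading as it arrives at the ingest stage. */
    static final class RawEvent {
        final String source;
        final String value;
        RawEvent(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    /** A normalized reading produced by the transform stage. */
    static final class Reading {
        final String source;
        final double value;
        Reading(String source, double value) {
            this.source = source;
            this.value = value;
        }
    }

    /** Transform/normalize: trim whitespace, lower-case the source, parse the number. */
    static Reading normalize(RawEvent e) {
        String source = e.source.trim().toLowerCase(Locale.ROOT);
        return new Reading(source, Double.parseDouble(e.value.trim()));
    }

    /** Analyze: the business logic is assumed to be a simple fixed threshold. */
    static boolean exceeds(Reading r, double threshold) {
        return r.value > threshold;
    }

    /**
     * Pushes each event through every stage in arrival order.
     * Normalized readings are appended to store; the raised alerts are returned.
     */
    static List<String> run(List<RawEvent> incoming, double threshold, List<Reading> store) {
        List<String> alerts = new ArrayList<>();
        for (RawEvent raw : incoming) {                 // ingest
            Reading r = normalize(raw);                 // transform, normalize
            if (exceeds(r, threshold)) {                // analyze, business logic
                alerts.add("ALERT " + r.source + "=" + r.value);  // alert, action
            }
            store.add(r);                               // persist; a dashboard would read from here
        }
        return alerts;
    }
}
```

Because every event passes through all stages before the next one is read, an alert is raised as soon as the offending event arrives rather than at the end of a batch run.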
Online advertising dynamic inventory purchases
High-volume, auto-scaling, fault-tolerant event stream.
Dimensional computing to identify performing ads.
Ad Server 1
Ad Server 800
Real-time Dashboard
Ad Placement Strategy
Oracle DB
Fault-Tolerant Flume
In-memory analytic cube
Campaign Analysis
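One way to picture the "dimensional computing" and "in-memory analytic cube" on this slide is a map of running totals keyed by dimension combination. The sketch below is a minimal illustration under that assumption. The class `AdCubeSketch`, the two dimensions (ad and publisher), and the key format are hypothetical and do not describe the customer's actual system.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory cube of ad totals; names and key format are assumptions. */
class AdCubeSketch {

    /** Running totals for one dimension combination. */
    static final class Totals {
        long impressions;
        long clicks;

        /** Clicks divided by impressions, or 0.0 when nothing has been recorded. */
        double clickThroughRate() {
            return impressions == 0 ? 0.0 : (double) clicks / impressions;
        }
    }

    private final Map<String, Totals> cube = new TreeMap<>();

    /** Folds one ad event into every dimension combination it belongs to. */
    void record(String adId, String publisher, boolean clicked) {
        String[] keys = {
            "ad=" + adId,
            "publisher=" + publisher,
            "ad=" + adId + "|publisher=" + publisher
        };
        for (String key : keys) {
            Totals t = cube.computeIfAbsent(key, k -> new Totals());
            t.impressions++;
            if (clicked) {
                t.clicks++;
            }
        }
    }

    /** Returns the totals for a key, or empty totals if the key was never seen. */
    Totals totals(String key) {
        return cube.getOrDefault(key, new Totals());
    }
}
```

Updating every combination at ingest time means a dashboard query such as `totals("ad=a1")` is a single lookup, at the cost of more memory per extra dimension.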
SmartGrid Connected Home
A smart grid provider with many partners has heterogeneous network sources, provides analytics to utilities and customers, and provides an ISV platform.
SmartGrid Sensors
Home Sensor(s)
Enrichment Data
Consumer Energy Audit
ISV Applications
Operational Safety/Costs
Normalization, Analytics, Alert on error
Tableau Visualizations
Batch Data
Customer Information
Historical Sales
Support Data
Product Configuration
Corporate Info
Batch Processing Data
Ingest, Archive
Transform, Normalize
Visualize, Persist
Analyze, Business Logic
Alert, Action
The Enterprise Data Processing Problem
ETL
Business Analytics
ETL
Complex Event / Event Streaming
BI & Analytics Platform
Ingest, Archive
Transform, Normalize
Analyze, Business Logic
Alert, Action
Visualize, Persist
Transformation team - Parse, Dedup, Transform, Encrypt
Transmission team - Credit, Debit, ACH over Secure FTP
Distribution team - Hadoop, MPP, DB
Reports team - Dashboards & Alerts
● Separate applications for each step in the end-to-end process.
  o 4 to 5 batch jobs to complete the process end to end
  o 1 to 2 runs a day, so typical time to value is around 12 hours
Financial services big data fabric
Secure, fault-tolerant data ingestion, formatting, and archiving. Data access layer for application processing.
Financial Data
SMTP Logs
Historical
Application n
Application 1
Persistent
Encrypt, Compliance, Alert on error
Archive
Satellite Television Provider
Automated, faster time to insight, driving accurate payment, auditing, and business planning
Audit Reporting
Rate Rules
Package Data
Subscription
Payment System
Join, Data Prep, Data Compliance
Dated Archive
Rate Rules
Package Data
Subscription
Business Projection
Dated Archive
DataTorrent RTS Architectural Overview
Ingestion & Distribution for Hadoop | Graphical Application Assembly | Real-Time Data Visualization
Re-Usable Java Operator Library
Scalable, High Performance, Fault Tolerant In-Memory Data Processing Platform
Hadoop 2.0 (YARN + HDFS)
Physical, Virtual, Cloud
Ingest, Archive
Transform, Normalize
Analyze, Business Logic
Alert, Action
Visualize, Persist
Management & Monitoring
DataTorrent - Project Apex
• Industry’s first open source enterprise-class unified stream and batch processing platform
• DataTorrent RTS 3 Core engine
• Key features requested by open source developer community
  ᵒ Event processing guarantees
  ᵒ In-memory performance & scalability
  ᵒ Fault tolerance and state management
  ᵒ Native rolling and static window support
  ᵒ Hadoop-native YARN & HDFS implementation
• Apache 2.0 License
• Complemented by open source Malhar operator library
https://www.datatorrent.com/product/project-apex/
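The "rolling and static window support" feature can be illustrated with a small counting example. The sketch below is plain Java and does not use the Apex or Malhar API. It assumes that "static" means fixed, non-overlapping windows and "rolling" means overlapping windows that advance by a smaller slide interval. The class `WindowSketch` and its method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical window counters; an illustration only, not the Apex windowing API. */
class WindowSketch {

    /** Counts events per fixed, non-overlapping window. Key = window start time. */
    static Map<Long, Integer> staticWindows(long[] timestamps, long size) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long t : timestamps) {
            long start = Math.floorDiv(t, size) * size;
            counts.merge(start, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Counts events per overlapping window of length size that starts every slide units.
     * A window [start, start + size) contains t when start <= t < start + size,
     * so early events also fall into windows that start before time zero.
     */
    static Map<Long, Integer> rollingWindows(long[] timestamps, long size, long slide) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long t : timestamps) {
            long latestStart = Math.floorDiv(t, slide) * slide;  // latest window start at or before t
            for (long start = latestStart; start > t - size; start -= slide) {
                counts.merge(start, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

With timestamps 1, 4, 6, and 11 and a window size of 10, the static version assigns each event to exactly one window, while the rolling version with a slide of 5 assigns each event to two overlapping windows.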
DataTorrent enables enterprises to process data in motion and take action in real time
Trend: Batch and streaming use cases
Faster Time to Insight
Faster Time to Action