Apache Apex Meetup at Cask
Post on 12-Jan-2017
Thomas Weise <thomas@datatorrent.com>
Dec 2nd, 2015
Introduction to Open Source Unified Streaming and Fast Batching Platform
Apache Apex
© 2015 DataTorrent
Apex Platform Overview
Apache Malhar Library
Native Hadoop Integration
• YARN is the resource manager
• HDFS is used for storing any persistent state
Application Programming Model
• A Stream is a sequence of data tuples
• An Operator takes one or more input streams, performs computations & emits one or more output streams
  – Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
  – An Operator has many instances that run in parallel, and each instance is single-threaded
• A Directed Acyclic Graph (DAG) is made up of operators and streams
[Figure: Directed Acyclic Graph (DAG). Operators are connected by streams of tuples: Filtered Stream, Enriched Stream, Output Stream.]
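The programming model above can be illustrated with a minimal sketch in plain Java. The `Operator` interface and the `filter`, `enrich` and `run` helpers are hypothetical names invented for this illustration; they are not the Apex API. The sketch only shows the idea of tuples flowing through a chain of single-threaded operators (filter, then enrich, then output).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch only: these types are hypothetical and are NOT the Apex API.
class DagSketch {

    /** An operator consumes tuples from an input stream and emits tuples to an output stream. */
    interface Operator<I, O> {
        void process(I tuple, Consumer<O> output);
    }

    /** Keeps only tuples that contain the keyword (produces the "filtered stream"). */
    static Operator<String, String> filter(String keyword) {
        return (tuple, out) -> {
            if (tuple.contains(keyword)) {
                out.accept(tuple);
            }
        };
    }

    /** Appends the tuple length to each tuple (produces the "enriched stream"). */
    static Operator<String, String> enrich() {
        return (tuple, out) -> out.accept(tuple + "|len=" + tuple.length());
    }

    /** Wires input -> filter -> enrich -> output and runs the chain in a single thread. */
    static List<String> run(List<String> input, String keyword) {
        List<String> sink = new ArrayList<>();
        Operator<String, String> f = filter(keyword);
        Operator<String, String> e = enrich();
        for (String tuple : input) {
            f.process(tuple, filtered -> e.process(filtered, sink::add));
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("error: disk", "ok", "error: net"), "error"));
    }
}
```

In a real platform each operator would run in its own container or thread with streams carrying tuples between them; here the streams are reduced to direct callbacks to keep the example short.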
Application Specification
Partitioning and Scaling Out
• Operators can be dynamically scaled
• Flexible Streams split
• Parallel partitioning
• MxN partitioning
• Unifiers
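As a rough illustration of partitioning and unifiers, the sketch below splits a stream of words across N partitions by key hash, lets each partition count independently, and then merges the partial counts in a unifier step. It is plain Java with hypothetical names (`partitionFor`, `countInPartitions`, `unify`) and makes no claim about how Apex implements partitioning internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical names, not the Apex partitioning API.
class PartitionSketch {

    /** Routes a key to one of n partitions by hash, so equal keys land in the same partition. */
    static int partitionFor(String key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    /** Each partition counts the words it receives, independently of the other partitions. */
    static List<Map<String, Integer>> countInPartitions(List<String> words, int n) {
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            partitions.add(new HashMap<>());
        }
        for (String w : words) {
            partitions.get(partitionFor(w, n)).merge(w, 1, Integer::sum);
        }
        return partitions;
    }

    /** The unifier merges the partial results of all partitions into a single result. */
    static Map<String, Integer> unify(List<Map<String, Integer>> partials) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((k, v) -> merged.merge(k, v, Integer::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(unify(countInPartitions(words, 3)));
    }
}
```

The unified result does not depend on the number of partitions, which is the property that makes it safe to scale the partition count up or down.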
Advanced Windowing Support
• Application window
• Sliding window and tumbling window
• Checkpoint window
• No artificial latency
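The difference between tumbling and sliding windows can be sketched as follows. The input is assumed to be one aggregate value per small fixed-size window of the stream; that framing, and the names `tumblingSums` and `slidingSums`, are assumptions made for this illustration and are not taken from the Apex windowing API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: plain Java, not the Apex windowing API.
class WindowSketch {

    /** Tumbling: consecutive, non-overlapping groups of `size` input windows. */
    static List<Integer> tumblingSums(List<Integer> perWindow, int size) {
        return slidingSums(perWindow, size, size);
    }

    /** Sliding: a group of `size` input windows that advances by `slide` windows each step. */
    static List<Integer> slidingSums(List<Integer> perWindow, int size, int slide) {
        if (size <= 0 || slide <= 0) {
            throw new IllegalArgumentException("size and slide must be positive");
        }
        List<Integer> sums = new ArrayList<>();
        for (int start = 0; start + size <= perWindow.size(); start += slide) {
            int total = 0;
            for (int i = start; i < start + size; i++) {
                total += perWindow.get(i);
            }
            sums.add(total);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Integer> perWindow = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println("tumbling(2):  " + tumblingSums(perWindow, 2));
        System.out.println("sliding(3,1): " + slidingSums(perWindow, 3, 1));
    }
}
```

A tumbling window is simply the special case of a sliding window whose slide equals its size, which is why the first method delegates to the second.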
Platform Features
Stateful Fault Tolerance
• Supported out of the box
  – Application state
  – Application master state
  – No data loss
• Automatic recovery
• Lunch test
• Buffer server
Processing Semantics
• At least once
• At most once
• Exactly once
Data Locality
• Stream locality for placement of operators
  – Rack local – Distributed deployment
  – Node local – Data does not traverse NIC
  – Container local – Data doesn’t need to be serialized
  – Thread local – Operators run in same thread
• Data locality
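The three processing semantics can be illustrated with a toy model of what reaches an output sink after a failure. The model below is an assumption made for illustration, not a description of how Apex implements recovery: windows are numbered from 0, state was last checkpointed after window `checkpoint`, and the operator crashes while window `crashAt` is in flight. All names (`SemanticsSketch`, `Mode`, `run`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not how Apex implements recovery.
class SemanticsSketch {

    enum Mode { AT_LEAST_ONCE, AT_MOST_ONCE, EXACTLY_ONCE }

    /**
     * Returns the window ids written to the sink when the operator crashes once
     * while window `crashAt` is in flight and then recovers.
     */
    static List<Integer> run(int total, int checkpoint, int crashAt, Mode mode) {
        List<Integer> sink = new ArrayList<>();
        // Before the crash: every window before `crashAt` was written to the sink.
        for (int w = 0; w < crashAt; w++) {
            sink.add(w);
        }
        int lastWritten = crashAt - 1;
        // At most once skips the in-flight window; the other modes replay from the checkpoint.
        int resumeFrom = (mode == Mode.AT_MOST_ONCE) ? crashAt + 1 : checkpoint + 1;
        for (int w = resumeFrom; w < total; w++) {
            // Exactly once: the sink remembers what it already wrote and drops replayed windows.
            if (mode == Mode.EXACTLY_ONCE && w <= lastWritten) {
                continue;
            }
            sink.add(w);
        }
        return sink;
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + run(6, 1, 4, mode));
        }
    }
}
```

In this model, at least once shows duplicates for the replayed windows, at most once shows a gap at the in-flight window, and exactly once shows each window once.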
Dynamic Updates
• Dynamic topology updates
  – Properties of operators can be changed
  – New operators can be added
Data Processing Pipeline Example: App Builder
Data Processing Pipeline Example: Logical Plan
Data Processing Pipeline Example: Physical Plan
Data Processing Pipeline Example: Real Time Visualization
Resources
• Apache Apex Community Page
• Apache Apex LinkedIn Group
• Apache Apex - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - https://www.datatorrent.com/download/
• Twitter
  ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
  ᵒ @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://www.meetup.com/topics/apache-apex
• Webinars - https://www.datatorrent.com/webinars/
• Videos - https://www.youtube.com/user/DataTorrent
• Slides - http://www.slideshare.net/DataTorrent/presentations
• Startup Accelerator Program - Full featured enterprise product
  ᵒ https://www.datatorrent.com/product/startup-accelerator/
We Are Hiring
• jobs@datatorrent.com
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders
End