Apache Phoenix with Actor Model (Akka.io) for Real-time Big Data Programming Stack
Why do we still need SQL for Big Data?
How can we make Big Data more responsive and faster?
By http://nguyentantrieu.info
Tech Lead at eClick team - FPT Online
Contents
1. What is Big Data and why?
2. When standard relational database (Oracle, MySQL, ...) is not good enough
3. Common problems in Big Data system
4. Introducing open-source tools in Big Data System
a. Apache Phoenix for ad-hoc query
b. Actor Model and Akka.io for reactive data processing
What Does Big Data Actually Mean?
“Big data means data that cannot fit easily into a standard relational database.”
Hal Varian - Chief Economist, Google
http://www.brookings.edu/blogs/techtank/posts/2014/09/11-big-data-definition
When standard relational database (Oracle,MySQL, ...) is not good enough
the “analytic system” MySQL database from a startup, tracking all actions in mobile games: iOS, Android, ...
Complex analytic system and the “scale” pain
Definition from the crowd
“Big data is a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.”
Jonathan Stuart Ward and Adam Barker
Source:
http://arxiv.org/abs/1309.5821
http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/
“Chaotic” fact and the demand
80% of that data is unstructured or “chaotic”
Photos, videos and social media posts - data that says so much about us - but cannot be analyzed via traditional methods
Demand:
“Finding order among chaos”
3 common problems in Big Data System
1. Size: the volume of the datasets is a critical factor.
2. Complexity: the structure, behaviour and permutations of the datasets are a critical factor.
3. Technologies: the tools and techniques used to process a sizable or complex dataset are a critical factor.
Introducing open-source tools in Big Data System
Apache Phoenix as SQL ad-hoc query engine
Actor Model as nano-service for reactive data computation in the dawn of “Fast data”
Some innovative tools were born in the dawn of Big Data Age
But could an elephant fly without wings?
But a phoenix can fly!
What is Apache Phoenix?
Apache Phoenix is a SQL skin over HBase. This means Phoenix scales the same way HBase does: by scaling up or scaling out the HBase cluster.
Phoenix SQL Engine
Interesting features of Apache Phoenix
● Embedded JDBC driver implements the majority of java.sql interfaces, including the metadata APIs.
● Allows columns to be modeled as a multi-part row key or key/value cells.
● Full query support with predicate push down and optimal scan key formation.
● DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns.
● Versioned schema repository. Snapshot queries use the schema that was in place when data was written.
● DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows.
● Limited transaction support through client-side batching.
● Single table only - no joins yet and secondary indexes are a work in progress.
● Follows ANSI SQL standards whenever possible
● Requires HBase v 0.94.2 or above
● 100% Java
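The DDL, UPSERT and query features listed above could be exercised through plain java.sql calls roughly as follows. This is a minimal sketch, not the code from the talk: the us_population table and the sample row are borrowed from the "Phoenix in 15 minutes or less" page linked at the end, and the class name PhoenixFeatureSketch and the helper upsertSql are hypothetical names made up for illustration. The run method assumes it is handed an already opened Phoenix connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class PhoenixFeatureSketch {

    // DDL: the PRIMARY KEY constraint over (state, city) is modeled as a multi-part row key.
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS us_population ("
        + "state CHAR(2) NOT NULL, city VARCHAR NOT NULL, population BIGINT "
        + "CONSTRAINT my_pk PRIMARY KEY (state, city))";

    // Ad-hoc aggregate query over a single table.
    static final String TOP_STATES =
        "SELECT state, SUM(population) FROM us_population "
        + "GROUP BY state ORDER BY SUM(population) DESC";

    // Builds "UPSERT INTO <table> VALUES (?, ?, ...)" with one placeholder per column.
    static String upsertSql(String table, int columnCount) {
        StringBuilder sql = new StringBuilder("UPSERT INTO ").append(table).append(" VALUES (");
        for (int i = 0; i < columnCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    // Runs the DDL, one row-by-row UPSERT, and the query on an open Phoenix connection.
    static List<String> run(Connection conn) throws SQLException {
        try (Statement ddl = conn.createStatement()) {
            ddl.executeUpdate(CREATE_TABLE);
        }
        try (PreparedStatement upsert = conn.prepareStatement(upsertSql("us_population", 3))) {
            upsert.setString(1, "NY");
            upsert.setString(2, "New York");
            upsert.setLong(3, 8143197L); // sample row
            upsert.executeUpdate();
        }
        conn.commit(); // Phoenix connections do not auto-commit by default
        List<String> rows = new ArrayList<>();
        try (Statement query = conn.createStatement();
             ResultSet rs = query.executeQuery(TOP_STATES)) {
            while (rs.next()) {
                rows.add(rs.getString(1) + "=" + rs.getLong(2));
            }
        }
        return rows;
    }
}
```

Note that Phoenix uses UPSERT rather than separate INSERT and UPDATE statements, so running the same UPSERT twice overwrites the row instead of failing on a duplicate key.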
the Phoenix table schema
Setting JDBC Phoenix Driver
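As a rough illustration of the driver setup, the sketch below builds a Phoenix JDBC URL and opens a connection. It assumes the Phoenix client jar is on the classpath and that the ZooKeeper quorum of the HBase cluster is reachable; the class name PhoenixConnectionSketch and the host and port values in the usage note are placeholders, not settings from the talk.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper class, for illustration only.
class PhoenixConnectionSketch {

    // Phoenix JDBC URLs have the form jdbc:phoenix:<ZooKeeper quorum>[:<port>].
    static String jdbcUrl(String zkQuorum, int zkPort) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort;
    }

    // Loads the Phoenix driver class and opens a connection through DriverManager.
    static Connection open(String zkQuorum, int zkPort)
            throws SQLException, ClassNotFoundException {
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
        return DriverManager.getConnection(jdbcUrl(zkQuorum, zkPort));
    }
}
```

For a local test cluster, PhoenixConnectionSketch.open("localhost", 2181) would be a typical call. The same driver class and URL are what a generic SQL tool needs in its JDBC connection settings.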
Phoenix and SQL tool in Eclipse 4
Phoenix vs Hive (running over HDFS and HBase)
http://phoenix.apache.org/performance.html
Actor Model in the dawn of “Fast data”
http://youtu.be/TnLiEWglqHk - Google I/O 2014 - The dawn of "Fast Data"
The paper: MillWheel: Fault-Tolerant Stream Processing at Internet Scale
What is the Actor Model?
● Carl Hewitt defined the Actor Model in 1973 as a mathematical theory that treats “Actors” as the universal primitives of concurrent digital computation.
● A fitting model for heavily-parallel processing in a cloud environment
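To make the idea concrete, here is a minimal sketch of a single actor written with plain JDK classes. It is not Akka's API: a single-threaded executor stands in for the actor's mailbox, and the class CounterActor with its methods tell, ask and stop are hypothetical names chosen for illustration. The point it shows is that the actor's state is touched by one thread only, so senders never need locks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical actor that counts hashtags; a stand-in sketch, not Akka code.
class CounterActor {
    // The single-threaded executor acts as the mailbox: messages are processed one at a time, in order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    // Private state, read and written only by the mailbox thread.
    private final Map<String, Long> counts = new HashMap<>();

    // Fire-and-forget send: the caller returns immediately.
    void tell(String hashtag) {
        mailbox.execute(() -> counts.merge(hashtag, 1L, Long::sum));
    }

    // Request-reply: queues a message behind earlier ones and waits for a copy of the state.
    Map<String, Long> ask() {
        Callable<Map<String, Long>> snapshot = () -> new HashMap<>(counts);
        try {
            return mailbox.submit(snapshot).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stops the actor after the messages already queued have been processed.
    void stop() {
        mailbox.shutdown();
    }

    public static void main(String[] args) {
        CounterActor actor = new CounterActor();
        actor.tell("#bigdata");
        actor.tell("#akka");
        actor.tell("#bigdata");
        System.out.println(actor.ask());
        actor.stop();
    }
}
```

A real actor framework adds what this sketch leaves out, such as supervision, remote messaging and scheduling many actors over a shared thread pool.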
Which Actor Model framework?
Akka is the framework for implementing Actor computation
Inspired by Google's MillWheel and Twitter's Storm, I have developed my own framework, “Rfx” (Reactive Functor Extension), with Akka at its core
The pipeline of finding social trends in real-time analytics
Facebook Social Trending from a website
Quick demo
Using Akka (Rfx) and Apache Phoenix for Social Media Real-time Analytics
Links for self-study and research
Actor Model and Programming:
● http://nguyentantrieu.info/blog/the-architecture-for-real-time-event-processing-with-reactive-actor-model
● http://www.slideshare.net/drorbr/the-actor-model-towards-better-concurrency
● http://www.infoq.com/articles/reactive-cloud-actors
● http://www.mc2ads.com/p/rfx-for-big-data-developer.html
Apache Phoenix
● http://java.dzone.com/articles/apache-phoenix-sql-driver
● http://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html
Big Data and Data Science
● http://www.mc2ads.com and http://www.mc2ads.org
● http://datascience101.wordpress.com
● http://lambda-architecture.net
● http://www.bigdata-startups.com
● https://www.coursera.org/course/datasci