Fast Data Processing with RFX

Simplify Fast Data Processing
http://www.rfxlab.com

Upload: trieu-nguyen

Post on 09-Jan-2017


The Big Picture

Demo first

Contents at a glance

1. BEAM✲ methodology for agile data warehouse
2. Introduction to Fast Data
3. Problem “Fast Data in web analytics”
4. Examples for fast data design pattern (RFX or Reactive Function X)
4.1. Event data actor
4.2. Event data agent
4.3. Event data collector
4.4. Event data router
4.5. Event data processor
4.6. Event data storage
4.7. Event data query
4.8. Event data reactor
5. Demo “Fast Data in web analytics” with source code explanation

1 - BEAM✲ methodology

1 - BEAM✲ methodology for Agile Data Warehouse

BEAM✲ stands for Business Event Analysis & Modelling, and it’s a methodology for gathering business requirements for Agile Data Warehouses and building those warehouses.

It was developed by Lawrence Corr (@LawrenceCorr) and Jim Stagnitto (@JimStag), and published in their book Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema.

Example with BEAM✲

Goal: model all business events and load them into a database in an agile way
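
As a minimal sketch of what modeling one business event could look like, the Python snippet below describes a pageview with the 7Ws that BEAM✲ uses to tell an event's story (who, what, when, where, how many, why, how). The class name, field names and sample values are hypothetical and chosen only for illustration; they do not come from the slides or from RFX.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class PageviewEvent:
    """One business event described by the 7Ws (hypothetical field names)."""
    who: str          # visitor identifier
    what: str         # page that was viewed
    when: datetime    # time of the event
    where: str        # site or host where it happened
    how_many: int     # measure: number of views carried by this event
    why: str          # cause, e.g. the referrer
    how: str          # means, e.g. the device or channel


def to_row(event: PageviewEvent) -> dict:
    """Flatten the event into a dict that could be loaded as one table row."""
    row = asdict(event)
    row["when"] = event.when.isoformat()  # store the timestamp as ISO 8601 text
    return row


example = PageviewEvent(
    who="user-123",
    what="/home",
    when=datetime(2017, 1, 9, 12, 0, tzinfo=timezone.utc),
    where="www.example.com",
    how_many=1,
    why="search-engine referral",
    how="mobile browser",
)
row = to_row(example)
```

Each field would typically become a column or a dimension reference once the event is loaded into the warehouse.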

2 - Fast Data

Introduction to Fast Data

3 - Problems in Practice

Problems

“Fast Data in web analytics”

1. Counting the pageviews of a website
2. Counting the unique users of a website
3. Sending an email when the pageview count is abnormal (simple DDoS attack detection)
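
The three problems above can be sketched with plain in-memory counters. This is only an illustration under stated assumptions: the class name, the per-minute bucketing and the fixed threshold are hypothetical choices, not part of RFX, and the email step is reduced to a returned flag.

```python
from collections import defaultdict


class PageviewStats:
    """In-memory counters for the three problems (illustrative only)."""

    def __init__(self, alert_threshold: int):
        self.alert_threshold = alert_threshold  # max pageviews per minute before alerting
        self.pageviews = 0                      # problem 1: total pageviews
        self.users = set()                      # problem 2: unique users
        self.per_minute = defaultdict(int)      # pageviews bucketed by minute
        self.alerted_minutes = set()            # minutes that already raised an alert

    def record(self, user_id: str, epoch_seconds: int) -> bool:
        """Record one pageview; return True when an alert should be sent."""
        self.pageviews += 1
        self.users.add(user_id)
        minute = epoch_seconds // 60
        self.per_minute[minute] += 1
        # Problem 3: alert at most once per minute when traffic exceeds the threshold.
        over_limit = self.per_minute[minute] > self.alert_threshold
        if over_limit and minute not in self.alerted_minutes:
            self.alerted_minutes.add(minute)
            return True
        return False


stats = PageviewStats(alert_threshold=3)
events = [("u1", 0), ("u2", 10), ("u1", 20), ("u3", 30), ("u3", 40), ("u1", 70)]
alerts = [stats.record(user, ts) for user, ts in events]
```

In this run the fourth event pushes the first minute over the threshold, so only that call returns True. A production system would likely keep these counters in a shared store rather than in process memory.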

4 - Thinking with RFX

● A design pattern to solve big fast data problems
● A collection of Open Source Tools
● The mission of RFX
1. Build data products quickly with design patterns
2. Apply BEAM✲ for an agile data pipeline
3. React to critical events in near-real-time

What is RFX (Reactive Function X)?

Philosophy of RFX

How to solve problems with RFX?

“Fast Data in web analytics”

1. Counting the pageviews of a website
2. Counting the unique users of a website
3. Sending an email when the pageview count is abnormal (simple DDoS attack detection)

Applying RFX to Pageview Analytics

1.1. Event data actor: a web user
1.2. Event data agent: RFX-track-js
1.3. Event data collector: RFX-track-server
1.4. Event data router: Apache Kafka
1.5. Event data processor: RFX-stream
1.6. Event data storage: Redis, MySQL
1.7. Event data query: RFX-data-api
1.8. Event data reactor: RFX-reactor
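
To make the eight roles concrete, here is a toy in-process sketch of how an event could flow through them. It does not use any RFX component, Kafka, Redis or MySQL: a standard-library queue stands in for the router, a dict stands in for storage, and every function name and event field is a hypothetical placeholder.

```python
import queue
from collections import Counter

# Router: stands in for Apache Kafka (one in-process queue acting as a topic).
topic = queue.Queue()
# Storage: stands in for Redis/MySQL.
storage = {"pageviews": Counter(), "users": set()}
# Reactor output: messages that would be sent by email.
notifications = []


def agent(user_id, url):
    """Agent (role of RFX-track-js): builds a raw event for an actor's action."""
    return {"user": user_id, "url": url}


def collector(raw_event):
    """Collector (role of RFX-track-server): validates, then forwards to the router."""
    if raw_event.get("user") and raw_event.get("url"):
        topic.put(raw_event)


def processor():
    """Processor (role of RFX-stream): drains the topic and updates storage."""
    while not topic.empty():
        event = topic.get()
        storage["pageviews"][event["url"]] += 1
        storage["users"].add(event["user"])


def query(url):
    """Query (role of RFX-data-api): reads aggregates (unique users are site-wide)."""
    return {"pageviews": storage["pageviews"][url], "unique_users": len(storage["users"])}


def reactor(url, limit):
    """Reactor (role of RFX-reactor): reacts when pageviews exceed a limit."""
    if query(url)["pageviews"] > limit:
        notifications.append(f"pageviews for {url} exceeded {limit}")


# Actors: web users generating events.
for user in ["u1", "u2", "u1", "u3"]:
    collector(agent(user, "/home"))
collector({"user": "", "url": "/home"})  # invalid event, dropped by the collector
processor()
reactor("/home", limit=3)
result = query("/home")
```

Keeping each role behind its own function mirrors the idea that each stage can be swapped for a real component without changing the others.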

Demo and explanation of code and concepts

Readings

● http://www.decisionone.co.uk/press/agile-data-warehouse-design-sampler.pdf
● http://www.slideshare.net/votrongdao/agile-data-warehouse-34427798
● Apache Kafka Installation Video | How To Setup Apache Kafka: https://youtu.be/Fg8cTsEk7Gc
● https://www.tutorialspoint.com/apache_kafka/
● https://kafka.apache.org/quickstart
● http://xyu.io/2015/07/13/building-a-faster-etl-pipeline-with-flume-kafka-and-hive/
● http://blog.cloudera.com/blog/2015/06/architectural-patterns-for-near-real-time-data-processing-with-apache-hadoop/
● https://www.oreilly.com/ideas/drivetrain-approach-data-products