HW09: Real-Time Business Intelligence

Real-Time BI in Hadoop Bradford Stephens Lead Engineer, Visible Technologies Principal Consultant, Drawn to Scale Consulting

Upload: cloudera-inc

Posted on 20-Aug-2015


TRANSCRIPT


Topics

•Scalability and BI

•Costs and Abilities

•Search as BI

What Is BI?

What Is “Real-Time”?

•Understanding Latency

•We aim for query responses in under 5 seconds

Scalability in BI

•Scalability matters now

•Social media is the catalyst

•All data is important

•Data volume no longer scales with business size

Search as BI

•Katta = Distributed Search on Hadoop

•Bobo = Faceted Search on Lucene
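Faceted search returns the matching documents together with a count of hits for each value of each facet field. Because Katta spreads an index across shards, those counts have to be gathered from every shard and merged. The Python sketch below illustrates that scatter-gather pattern under stated assumptions: documents are plain dicts, each shard is a list, and matching is a simple whitespace-token test. The functions `search_shard` and `faceted_search` and the sample documents are hypothetical, and this is not the Katta or Bobo API.

```python
from collections import Counter

def search_shard(shard, term, facet_fields):
    """Match documents containing `term` on one shard and count facet values."""
    hits = [doc for doc in shard if term in doc["text"].lower().split()]
    counts = {field: Counter(doc[field] for doc in hits) for field in facet_fields}
    return len(hits), counts

def faceted_search(shards, term, facet_fields):
    """Scatter the query to every shard, then gather and merge partial counts."""
    total = 0
    merged = {field: Counter() for field in facet_fields}
    for shard in shards:  # a real cluster would query shards in parallel
        n, counts = search_shard(shard, term.lower(), facet_fields)
        total += n
        for field in facet_fields:
            merged[field].update(counts[field])  # facet counts merge by summing
    return total, merged

# Hypothetical toy index split across two shards.
shards = [
    [{"text": "Love the new phone", "source": "twitter", "sentiment": "pos"},
     {"text": "phone battery died", "source": "blog", "sentiment": "neg"}],
    [{"text": "phone works fine", "source": "twitter", "sentiment": "pos"},
     {"text": "great laptop", "source": "forum", "sentiment": "pos"}],
]

total, facets = faceted_search(shards, "phone", ["source", "sentiment"])
print(total)                   # 3
print(dict(facets["source"]))  # {'twitter': 2, 'blog': 1}
```

Counts are a convenient facet statistic in a distributed setting because per-shard partial results combine with simple addition.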

Doing it Cheap

•100 TB, Structured and Unstructured

•Oracle - $100,000,000

•“NewSQL” - $4,000,000

•Hadoop + Katta - $250,000
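Normalizing the slide's figures to cost per terabyte makes the gap concrete. The short Python sketch below does arithmetic only on the three numbers above and does not model hardware, licensing, or staffing separately. The figures work out to $1,000,000/TB for Oracle, $40,000/TB for NewSQL, and $2,500/TB for Hadoop + Katta, so Oracle costs 400x and NewSQL 16x the Hadoop + Katta figure.

```python
# Cost figures from the slide for 100 TB of structured and unstructured data.
DATA_TB = 100
costs_usd = {
    "Oracle": 100_000_000,
    "NewSQL": 4_000_000,
    "Hadoop + Katta": 250_000,
}

baseline = costs_usd["Hadoop + Katta"]
per_tb = {name: cost / DATA_TB for name, cost in costs_usd.items()}
ratio = {name: cost / baseline for name, cost in costs_usd.items()}

for name in costs_usd:
    print(f"{name:15s} ${per_tb[name]:>12,.0f} per TB  {ratio[name]:>4.0f}x Hadoop + Katta")
```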

Why We Need Hadoop

•Need to batch-process high-latency data so the “small stuff” can be served fast

•Robust Ecosystem

•Need more than SQL: an RDBMS is not a Swiss Army knife

Aggregation is Real-Time

•Distributed Search w/ Katta + Facets = Aggregation-Based BI

•Sum, Count, Filter, Avg, Group
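The five operations above can be sketched in a few lines. The Python example below runs in a single process and does not show how Katta or Bobo compute aggregates internally. It filters hypothetical mention records, groups them by brand, and computes sum, count, and average per group. The record fields (`brand`, `source`, `mentions`) and the `aggregate` helper are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy records; field names are illustrative assumptions.
records = [
    {"brand": "acme", "source": "twitter", "mentions": 120},
    {"brand": "acme", "source": "blog", "mentions": 30},
    {"brand": "globex", "source": "twitter", "mentions": 80},
    {"brand": "globex", "source": "forum", "mentions": 10},
    {"brand": "acme", "source": "forum", "mentions": 50},
]

def aggregate(rows, predicate, group_key, value_key):
    """Filter rows, group them, and compute sum, count, and average per group."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        if predicate(row):               # Filter
            key = row[group_key]         # Group
            sums[key] += row[value_key]  # Sum
            counts[key] += 1             # Count
    # Avg is derived from sum and count, so partial (sum, count) pairs from
    # several shards could be merged first and divided once at the end.
    return {k: {"sum": sums[k], "count": counts[k], "avg": sums[k] / counts[k]}
            for k in sums}

result = aggregate(records, lambda r: r["source"] != "forum", "brand", "mentions")
print(result)
# {'acme': {'sum': 150, 'count': 2, 'avg': 75.0}, 'globex': {'sum': 80, 'count': 1, 'avg': 80.0}}
```

Keeping sum and count separate, rather than averaging averages, is what makes the average safe to compute across distributed partial results.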

Protips: Review

•Understand High- vs. Low-Latency Data

•Hadoop makes it cheap

•Pre-aggregate w/ Hadoop, Explore w/ Katta + Faceted Search
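The split between pre-aggregating with Hadoop and exploring with search can be sketched in plain Python. The example below imitates the MapReduce pattern in a single process. The map step emits key/value pairs, the shuffle step sorts by key, and the reduce step sums the values. Together they turn hypothetical raw events into a small per-day, per-brand rollup. In a real deployment the batch job would run on Hadoop and the rollup would be indexed for search, which this sketch does not attempt. The event layout and the `map_phase`/`reduce_phase` names are assumptions.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical raw events (the bulk, high-latency data): (day, brand, source).
raw_events = [
    ("2009-10-02", "acme", "twitter"),
    ("2009-10-02", "acme", "blog"),
    ("2009-10-02", "globex", "twitter"),
    ("2009-10-03", "acme", "twitter"),
    ("2009-10-03", "acme", "twitter"),
]

def map_phase(event):
    """Emit one ((day, brand), 1) pair per raw event."""
    day, brand, _source = event
    yield (day, brand), 1

def reduce_phase(key, values):
    """Sum the counts for one key."""
    return key, sum(values)

# Map, then shuffle/sort by key so equal keys are adjacent, then reduce.
pairs = sorted((kv for e in raw_events for kv in map_phase(e)), key=itemgetter(0))
rollup = dict(reduce_phase(k, (v for _, v in grp))
              for k, grp in groupby(pairs, key=itemgetter(0)))

# Exploration reads the small rollup instead of rescanning raw events.
print(rollup[("2009-10-03", "acme")])  # 2
```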

The Future

•Search/BI as a Platform: “Google my Data Warehouse”

•Real-Time MapReduce on HBase