one billion rows per second: analytics for the digital media markets

One Billion Rows Per Second: Analytics for the Digital Media Markets STRATA SUMMIT NYC September 21, 2011 MICHAEL DRISCOLL CO-FOUNDER & CTO @medriscoll



TRANSCRIPT

Page 1: One Billion Rows Per Second: Analytics for the Digital Media Markets

One Billion Rows Per Second: Analytics for the Digital Media Markets

STRATA SUMMIT NYC
September 21, 2011

MICHAEL DRISCOLL
CO-FOUNDER & CTO

@medriscoll

Page 2: One Billion Rows Per Second: Analytics for the Digital Media Markets

Taming the Inferno of the Online Ad Markets

• billions of microtransactions per day
• dozens of publisher, advertiser, & audience attributes

Page 3: One Billion Rows Per Second: Analytics for the Digital Media Markets

Goal: Fast Dashboards Over Big Data

Page 4: One Billion Rows Per Second: Analytics for the Digital Media Markets

Goal: Fast Dashboards Over Big Data

ingestion → database → dashboard

• data crunched in minutes
• queries in seconds

Page 5: One Billion Rows Per Second: Analytics for the Digital Media Markets

Solution 1: Relational Database

ingestion (Hadoop) → database (MPP relational DB) → dashboard

• data crunched in minutes
• queries in minutes

Page 6: One Billion Rows Per Second: Analytics for the Digital Media Markets

Solution 2: HBase

ingestion (Hadoop) → database (HBase) → dashboard

• data crunched in hours
• queries in seconds

Page 7: One Billion Rows Per Second: Analytics for the Digital Media Markets

Solution 3: Do It Ourselves: Druid

ingestion (Hadoop) → database (Druid) → dashboard

• data crunched in minutes
• queries in seconds

Page 8: One Billion Rows Per Second: Analytics for the Digital Media Markets

Four Principles of Performance at Scale

• SUMMARIZE
• DISTRIBUTE
• PARALLELIZE
• STORE IN-MEMORY

• 100x smaller vs raw data
• 100x throughput vs a single node
• 100x faster vs reading disk

10^6 factor increase

Druid can filter and aggregate over 1 billion rows per second on a 50-core cluster, or 20 million rows per core per second.
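The four principles and the arithmetic on this slide can be illustrated with a toy sketch. This is not Druid's implementation: the event layout, the function names (`summarize`, `distribute`, `scan_partition`, `query_revenue`, `rows_per_core_per_second`) and the sample data are all hypothetical, and Python threads are used only to show the scatter-and-merge shape of a parallel query, not to deliver real multi-core throughput.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw ad events: (hour, publisher, advertiser, revenue_in_cents).
RAW_EVENTS = [
    ("2011-09-21T10", "pub_a", "adv_x", 5),
    ("2011-09-21T10", "pub_a", "adv_x", 7),
    ("2011-09-21T10", "pub_b", "adv_x", 3),
    ("2011-09-21T11", "pub_a", "adv_y", 4),
    ("2011-09-21T11", "pub_a", "adv_y", 6),
    ("2011-09-21T11", "pub_b", "adv_y", 2),
]


def summarize(events):
    """SUMMARIZE: roll raw events up to one row per (hour, publisher, advertiser)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [event_count, revenue]
    for hour, publisher, advertiser, revenue in events:
        entry = totals[(hour, publisher, advertiser)]
        entry[0] += 1
        entry[1] += revenue
    # Each output row: (hour, publisher, advertiser, event_count, revenue).
    return [key + tuple(value) for key, value in sorted(totals.items())]


def distribute(rows, n_partitions):
    """DISTRIBUTE: split summarized rows round-robin into in-memory partitions."""
    return [rows[i::n_partitions] for i in range(n_partitions)]


def scan_partition(partition, publisher):
    """Filter one in-memory partition by publisher and sum its revenue."""
    return sum(row[4] for row in partition if row[1] == publisher)


def query_revenue(partitions, publisher):
    """PARALLELIZE: scan every partition concurrently, then merge partial sums."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(lambda p: scan_partition(p, publisher), partitions)
        return sum(partials)


def rows_per_core_per_second(rows_per_second, cores):
    """Per-core scan rate implied by a cluster-wide rate."""
    return rows_per_second // cores


rollup = summarize(RAW_EVENTS)
partitions = distribute(rollup, 2)
print(len(RAW_EVENTS), "raw rows ->", len(rollup), "summarized rows")
print("pub_a revenue:", query_revenue(partitions, "pub_a"))
print("combined factor:", 100 * 100 * 100)
print("rows per core per second:", rows_per_core_per_second(1_000_000_000, 50))
```

The last two lines reproduce the slide's numbers: three independent 100x gains multiply to 10^6, and 1 billion rows per second spread over 50 cores is 20 million rows per core per second.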

Page 9: One Billion Rows Per Second: Analytics for the Digital Media Markets

Consequences of Speed: Data Freshness

photo credit: Lars P. http://www.flickr.com/photos/lars_p/4911238308/sizes/o/in/photostream/

Page 10: One Billion Rows Per Second: Analytics for the Digital Media Markets

Consequences of Speed: Blue Sky Exploration

photo credit: MonkeyAt Large http://www.flickr.com/photos/monkeyatlarge/16645379/sizes/l/in/photostream/

Page 11: One Billion Rows Per Second: Analytics for the Digital Media Markets

Consequences of Speed: Interactivity

photo credit: tonylanciabeta http://www.flickr.com/photos/tonysphotos/3305157904/sizes/o/in/photostream/

Page 12: One Billion Rows Per Second: Analytics for the Digital Media Markets

One Billion Rows Per Second: Analytics for the Digital Media Markets

QUESTIONS? CONTACT ME AT [email protected]

MICHAEL DRISCOLL
CO-FOUNDER & CTO

@medriscoll