Transcript
Page 1: Apache Cassandra at Videoplaza — Stockholm Cassandra Users — September 2013

Karbon Insight: Realtime Reporting

Introduction to ad serving

Video player

Ad player

Distributor Tracker

Event tracking

• View (event ID 127)
• Click (event ID 128)
• and many more

What do our customers want?

• Any report they can dream up
• Right away!

Simple report: hour by ad and event

Realtime reporting

Multidimensional OLAP cube

Ad

Event

Time
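
As a rough illustration of the cube above, each cell can be thought of as a counter addressed by one value per dimension (ad, event, time). The sketch below keeps such counters in memory. The class name `CounterCube` and the key layout are assumptions made for illustration, not the system described in the talk.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory cube: one counter per (ad, event, hour) cell. */
class CounterCube {
    private final Map<String, Long> cells = new HashMap<>();

    /** Builds a cell address; this key layout is an assumption for the sketch. */
    private static String cell(String ad, int eventId, String hour) {
        return ad + "/" + eventId + "/" + hour;
    }

    /** Adds one to the counter for the given cell, creating it if needed. */
    void increment(String ad, int eventId, String hour) {
        cells.merge(cell(ad, eventId, hour), 1L, Long::sum);
    }

    /** Returns the current count for the cell, or 0 if it was never incremented. */
    long get(String ad, int eventId, String hour) {
        return cells.getOrDefault(cell(ad, eventId, hour), 0L);
    }
}
```

Incrementing the cell for ad1, event 127 (a view) and hour 2013-09-10.18 twice and then reading it back would return 2.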

ROLAP with star schema

Disadvantages of ROLAP

• Slow queries
• Lots of joins
• Expensive to scale
• SQL limitations

MOLAP to the rescue!

What is a counter?

You can’t always get what you want...

Possible report dimensions

• Time
• Event
• Ad
• Device
• Category
• Location
• Tag
• Demography

Many counters

8 dimensions, average size of 50

50^8 counters! (39 trillion)
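
The 39 trillion figure follows from multiplying the dimension sizes together: 8 dimensions with about 50 values each give 50^8 cells. A minimal sketch of that arithmetic, with the class and method names chosen here for illustration:

```java
/** Hypothetical helper that sizes a fully materialised cube. */
class CounterEstimate {
    /** Number of cells in a cube with `dimensions` axes of `size` values each. */
    static long cells(int dimensions, long size) {
        long total = 1;
        for (int i = 0; i < dimensions; i++) {
            // multiplyExact throws ArithmeticException instead of silently overflowing
            total = Math.multiplyExact(total, size);
        }
        return total;
    }
}
```

`CounterEstimate.cells(8, 50)` evaluates to 39,062,500,000,000, which the slide rounds to 39 trillion.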

Average campaign length: 21 days (504 hours)

Time flies like a banana

21 days = 39 trillion counters

42 days -> 78 trillion
84 days -> 156 trillion
365 days -> 677 trillion
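
These figures appear to scale the rounded 21-day count linearly with the number of days. The sketch below reproduces them under that assumption. `CounterGrowth` and `forDays` are names invented for this example.

```java
/** Hypothetical linear projection of counter growth over time. */
class CounterGrowth {
    // Rounded figure from the slide: 39 trillion counters per 21-day campaign.
    static final long COUNTERS_PER_21_DAYS = 39_000_000_000_000L;

    /** Projects the counter total for a period, assuming linear growth per day. */
    static long forDays(long days) {
        // Multiply first to keep precision; multiplyExact guards against overflow.
        return Math.multiplyExact(COUNTERS_PER_21_DAYS, days) / 21;
    }
}
```

Under this assumption 42 days gives exactly 78 trillion and 365 days gives roughly 677.9 trillion, which the slide truncates to 677 trillion. Five years (1,825 days) comes to about 3.39 quadrillion.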

5 years down the road

3.39 quadrillion

3.39 quadrillion is a rather large number indeed

Number of stars in 7500 galaxies like the Milky Way.

15% of the surveyed universe!

But you might just get what you need!

Fake it till you can make it

Don’t aggregate anything until they ask for it!

• Time period
• By hour
• And ad
• Views
• Clicks

Counter Storage

Why Cassandra?

• Fast writes
• Linear scaling
• Battle-hardened
• (Relatively) simple operations
• Great community!

Architecture diagram: Tracker, Aggregator, Merger and Flusher processes; RabbitMQ with live00 ... live31, counter00 ... counter31 and flush00 ... flush31; Cassandra

Our setup

• DataStax CE 1.1.9
• 18-node cluster
• 1 datacentre

Data model

• 1 keyspace (RF: 3)
• 1 column family
• Leveled compaction

Row keys

aggregate definition ID
dimension values
time granularity

Example row keys

adef1|(ad1:127)|hour
adef1|(ad1:128)|hour
adef1|(ad2:127)|hour
...
adef1|(ad5:128)|day
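
The example keys suggest the three parts are joined with "|", with the dimension values wrapped in parentheses and separated by ":". A minimal sketch of building such a key under that reading follows. The `RowKeys` class is hypothetical, and the exact escaping and ordering rules of the real system are not given in the slides.

```java
import java.util.List;

/** Hypothetical builder for row keys shaped like adef1|(ad1:127)|hour. */
class RowKeys {
    /**
     * Joins the aggregate definition ID, the dimension values and the time
     * granularity. The separators are inferred from the example keys.
     */
    static String build(String aggregateDefinitionId,
                        List<String> dimensionValues,
                        String granularity) {
        return aggregateDefinitionId
                + "|(" + String.join(":", dimensionValues) + ")|"
                + granularity;
    }
}
```

Called with "adef1", the values "ad1" and "127", and "hour", this yields the first example key.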

Columns

time value -> counter

transaction ID -> id

Example columns

2013-09-10.18 -> 6348

2013-09-10.19 -> 9784

txID -> 876219102

Columns for rows with no time aggregation

total -> 6348

txID -> 876219102

Reading counters

Build row key

adef1|(ad1:127)|hour

Prepare query

keyspace
    .prepareQuery(columnFamily)
    .getKey(rowKey)

Column ranges

2013-09-10.17 ... 2013-09-10.23
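
Because the hour columns are named with a zero-padded year-month-day.hour pattern, they sort chronologically, so a range is fully described by its first and last column name. The sketch below generates the names in such a range using `java.time`. `HourColumns` is a hypothetical helper, and the pattern is inferred from the example columns.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that names hourly columns like 2013-09-10.17. */
class HourColumns {
    // Pattern inferred from the slide's example column names.
    static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyy-MM-dd.HH");

    /** Lists every hourly column name from `first` to `last`, inclusive. */
    static List<String> between(LocalDateTime first, LocalDateTime last) {
        List<String> names = new ArrayList<>();
        for (LocalDateTime t = first; !t.isAfter(last); t = t.plusHours(1)) {
            names.add(HOUR.format(t));
        }
        return names;
    }
}
```

For 17:00 to 23:00 on 10 September 2013 this produces seven names, starting at 2013-09-10.17 and ending at 2013-09-10.23. A column slice query would only need the first and last of them.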

Execute query asynchronously

Get column value

First byte is counter type (long, double, HyperLogLog)
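
A type-tagged value like this can be decoded by reading the first byte and then interpreting the rest accordingly. The sketch below handles the long and double cases only. The type codes 0 and 1 and the `CounterValues` class are assumptions for illustration, since the slides do not give the actual encoding, and HyperLogLog decoding is left out.

```java
import java.nio.ByteBuffer;

/** Hypothetical decoder for column values whose first byte is a type tag. */
class CounterValues {
    // Type codes are invented for this sketch; the real values are not in the slides.
    static final byte TYPE_LONG = 0;
    static final byte TYPE_DOUBLE = 1;

    /** Reads the type byte, then the payload that follows it. */
    static Number decode(ByteBuffer value) {
        ByteBuffer buf = value.duplicate(); // leave the caller's position untouched
        byte type = buf.get();
        switch (type) {
            case TYPE_LONG:
                return buf.getLong();
            case TYPE_DOUBLE:
                return buf.getDouble();
            default:
                throw new IllegalArgumentException("Unsupported counter type: " + type);
        }
    }
}
```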

Writing counters

Flush shards

Flusher 1: shards 00-08
...
Flusher 4: shards 24-32

Cassandra

Merge increment rows with read cache

Skip rows with the same transaction ID
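
One way to read this step is that the txID column records the last transaction applied to a row, so a replayed increment can be detected and skipped rather than counted twice. The sketch below shows that idea in memory. `CounterRow`, its fields and the -1 sentinel are assumptions for illustration, not the talk's implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cached row: counter columns plus the last applied transaction ID. */
class CounterRow {
    final Map<String, Long> counters = new HashMap<>();
    long txId = -1; // invented sentinel meaning "no transaction applied yet"

    /**
     * Adds the increments to the cached counters unless this transaction
     * was already merged. Returns true if the increments were applied.
     */
    boolean merge(long incomingTxId, Map<String, Long> increments) {
        if (incomingTxId == txId) {
            return false; // same transaction ID: already written, skip
        }
        increments.forEach((column, delta) -> counters.merge(column, delta, Long::sum));
        txId = incomingTxId;
        return true;
    }
}
```

Merging the same transaction twice leaves the counters unchanged the second time, which makes a retried flush safe in this sketch.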

Write rows in mutation batches (of 400)
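
Splitting the rows into fixed-size groups before writing can be sketched as follows. `Batches.partition` is a hypothetical helper, and how each batch is then sent to Cassandra is outside this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits rows into fixed-size write batches. */
class Batches {
    /** Splits `rows` into consecutive batches of at most `batchSize` elements. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }
}
```

With a batch size of 400, a list of 1,000 rows is split into batches of 400, 400 and 200.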

Things we got wrong

FAIL #1: Too many column families

Each CF has 1M heap overhead

Multi-tenancy FTW!

FAIL #2: Manual operations

CLI defaults to replication factor of 1!

Tools and automation FTW!

FAIL #3: No snapshots

No way to undo data loading

Automated snapshots FTW!

FAIL #4: Timezones

Post-processing of queried data

Store data in customer timezone
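
Storing data in the customer's timezone means choosing the hour bucket at write time rather than shifting results after the query. A minimal sketch of that choice with `java.time` follows. `CustomerHours` is a hypothetical helper, and the column name pattern is inferred from the earlier example columns.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that buckets an event into the customer's local hour. */
class CustomerHours {
    // Same pattern as the example column names, e.g. 2013-09-10.18.
    static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyy-MM-dd.HH");

    /** Returns the hourly column name for an event, in the customer's timezone. */
    static String column(Instant eventTime, ZoneId customerZone) {
        return HOUR.format(eventTime.atZone(customerZone));
    }
}
```

An event at 22:30 UTC on 10 September 2013 lands in column 2013-09-11.00 for a customer in Europe/Stockholm (UTC+2 at that date), but in 2013-09-10.22 for a customer on UTC.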

10 TB of data
1500 wps
40,000 rps

Q&A
