Acunu Analytics: Simpler Real-Time Cassandra Apps
DESCRIPTION
Talk for the Cassandra Seattle Meetup April 2013: http://www.meetup.com/cassandra-seattle/events/114988872/ Cassandra's got some properties which make it an ideal fit for building real-time analytics applications, but getting from atomic increments to live dashboards and streaming queries is quite a stretch. In this talk, Tim Moreton, CTO at Acunu, talks about how and why they built Acunu Analytics, which adds rich SQL-like queries and a RESTful API on top of Cassandra, and looks at how it keeps Cassandra's spirit of denormalization under the hood.
TRANSCRIPT
Acunu Analytics: Simpler Real-Time Cassandra Apps
Tim Moreton, CTO, @timmoreton
Monday, 29 April 13
WE C*
• Scalable. No single point of {failure, bottleneck}
• Fast. Especially for writes
• Available. Effortless multi-DC support
• Maturing fast. Lots of production deployments
WE C*
• Virtual nodes
• CQL support
WE C*
Challenges remain, of course.
• Spartan queries
• Thrift (and CQL, a bit)
• Denormalization hurts agility
• Weak update semantics
C*: Two uses
Session storage
[Illustration: a stream of repeated web log lines, e.g. GET /index.html requests]
• Many more reads than writes
• Updates to existing records (ideally, transactionally)
• Probably fits in RAM: distribute for availability
Real-time analytics
[Illustration: the same stream of web log lines]
• Many more writes than reads
• Almost all reads are to results
• Almost no writes are ‘updates’
• Distribute for availability, performance, capacity
C* on
Supplement Cassandra with:
• Rich, SQL-like queries
• RESTful HTTP APIs, JSON-based
• Automated denormalization
• Update semantics (less critical for analytics)
Analytics: Two patterns
Exploratory Analytics
• Unstructured
• Warehouses
• Data mining
• Machine learning
Operational Intelligence
• Dashboards
• Real-time decisions
• Alerting
Exploratory analytics: query richness (complex analysis, data variety)
Operational intelligence: query speed (data freshness, response time)
[Diagram: an event stream passes through Ingest and Processing into an event store and roll-up cubes; an API serves dashboards, queries and a programmatic interface.]
Who uses Acunu?
• Location data
• Web and visitor
• Market/tick data
• Infrastructure
• Sensor data
• Social media
• Social gaming
• Smart grid
• Production line
Cassandra stores raw events and intermediate aggregates
Acunu Analytics is a Cassandra client mapping new events, queries and schema changes to aggregate reads and writes
Acunu Dashboards provides embeddable, custom data visualization using the HTTP API
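A toy model can make that division of labour concrete. The sketch below makes several simplifying assumptions. In-memory dicts stand in for Cassandra column families, and COUNT is the only aggregate. A newly created cube is assumed to be backfilled by replaying the stored raw events. The class and method names (`MiniAnalytics`, `create_cube`, `ingest`, `query`) are hypothetical and are not Acunu's API.

```python
from collections import defaultdict


class MiniAnalytics:
    """Toy stand-in for: event stream -> ingest -> event store + roll-up cubes -> API.

    Plain dicts replace Cassandra column families; all names are hypothetical.
    """

    def __init__(self):
        self.events = []  # raw event store
        self.cubes = {}   # cube name -> (WHERE dims, GROUP BY dim or None, counters)

    @staticmethod
    def _apply(cube, event):
        """Aggregate writes for one event: COUNT per row, overall and per group."""
        where, group, counters = cube
        row = tuple(event[d] for d in where)  # WHERE values -> row key
        counters[row]["all"] += 1
        if group is not None:
            counters[row][event[group]] += 1  # GROUP BY value -> column key

    def create_cube(self, name, where, group=None):
        """Schema change: register a cube, then backfill it from the raw events."""
        cube = (tuple(where), group, defaultdict(lambda: defaultdict(int)))
        self.cubes[name] = cube
        for event in self.events:
            self._apply(cube, event)

    def ingest(self, event):
        """New event: keep it raw, then write to every cube."""
        self.events.append(event)
        for cube in self.cubes.values():
            self._apply(cube, event)

    def query(self, name, **where_values):
        """Query: a single-row read from one cube."""
        where, _, counters = self.cubes[name]
        row = tuple(where_values[d] for d in where)
        return dict(counters.get(row, {}))


analytics = MiniAnalytics()
analytics.create_cube("by_geo", where=["geography"], group="browser")
for e in ({"geography": "UK", "browser": "IE"},
          {"geography": "UK", "browser": "Firefox"},
          {"geography": "US", "browser": "IE"}):
    analytics.ingest(e)
analytics.create_cube("by_browser", where=[], group="browser")  # backfilled
print(analytics.query("by_geo", geography="UK"))  # {'all': 2, 'IE': 1, 'Firefox': 1}
print(analytics.query("by_browser"))              # {'all': 3, 'IE': 2, 'Firefox': 1}
```

In this sketch, keeping raw events alongside the aggregates is what makes the backfill in `create_cube` possible. Without them, a new cube could only count events that arrive after it is defined.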
(Loosely) Define a schema
CREATE TABLE APICalls (
  time TIME('PST', HOUR, MIN, SEC),
  path PATH(/),
  useragent STRING,
  latitude DOUBLE(0.1, 0.01),
  longitude DOUBLE(0.1, 0.01)
);
CREATE CUBE SELECT COUNT, AVG(respTime) FROM APICalls WHERE time, path GROUP BY time, path;
CREATE CUBE SELECT COUNT FROM APICalls WHERE latitude, longitude GROUP BY latitude, longitude;
• Each table has an HTTP endpoint and maps to a set of ColumnFamilies
• Dimensions map to keys in events and allow hierarchical aggregation
• Cubes define which dimensions and aggregates to maintain
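The hierarchical dimension types in this schema can be sketched as functions that expand one event value into one bucket per level. The bucketing rules here are our assumptions for illustration, not documented Acunu semantics:

- PATH yields every prefix of the path.
- DOUBLE floors the value to each bucket width.
- 'PST' is treated as a fixed UTC-8 offset.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal, ROUND_FLOOR

# Assumption: 'PST' is treated as a fixed UTC-8 offset (daylight saving ignored).
PST = timezone(timedelta(hours=-8), "PST")


def time_levels(ts):
    """TIME('PST', HOUR, MIN, SEC): one bucket per level, coarsest first."""
    local = ts.astimezone(PST)
    return [local.strftime("%H:00"), local.strftime("%H:%M"), local.strftime("%H:%M:%S")]


def path_levels(path):
    """PATH(/): the root plus every ancestor prefix of the path."""
    parts = [p for p in path.split("/") if p]
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def double_levels(x, widths=("0.1", "0.01")):
    """DOUBLE(0.1, 0.01): x rounded down to each width (Decimal avoids float drift)."""
    d = Decimal(str(x))
    return [d.quantize(Decimal(w), rounding=ROUND_FLOOR) for w in widths]


print(time_levels(datetime(2013, 4, 29, 5, 2, 7, tzinfo=timezone.utc)))
# ['21:00', '21:02', '21:02:07']
print(path_levels("/api/v1/users"))  # ['/', '/api', '/api/v1', '/api/v1/users']
print(double_levels(47.6062))        # [Decimal('47.6'), Decimal('47.60')]
```

Each returned level would become a separate aggregate cell. That is why a single event can fan out into several writes.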
Aggregation
CREATE CUBE SELECT SUM(a) FROM t WHERE x, y GROUP BY g, h, i;
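New event: apply SUM(v, v') on this cell.
[Figure: an event {A: v', X: x, Y: y, Z: z} selects one cell of a cube whose axes are x, y and (g, h, i); v is that cell's current value.]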
• Hierarchical dimensions cause multiple writes per event (that's OK: Cassandra is good at writes)
• Most aggregates result in atomic counter increments
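A minimal sketch of those two bullets, assuming a hypothetical cube `SELECT SUM(bytes) FROM t WHERE hour GROUP BY path` with a hierarchical path dimension. One event increments one cell per path level. Nested `defaultdict(int)` maps stand in for Cassandra counter columns, so SUM(v, v') becomes a plain increment. The cube, field names and helper names are illustrative assumptions.

```python
from collections import defaultdict


def path_levels(path):
    """Hierarchical PATH(/) buckets: '/', then every ancestor prefix of the path."""
    parts = [p for p in path.split("/") if p]
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


# Hypothetical cube: SELECT SUM(bytes) FROM t WHERE hour GROUP BY path.
# store[row_key][column_key] stands in for a Cassandra counter column.
store = defaultdict(lambda: defaultdict(int))


def apply_event(event):
    """Apply SUM(v, v') = v + v' to one cell per path level; return the write count."""
    row_key = event["hour"]                       # WHERE dimension -> row
    writes = 0
    for prefix in path_levels(event["path"]):     # GROUP BY dimension -> column
        store[row_key][prefix] += event["bytes"]  # a counter increment in Cassandra
        writes += 1
    return writes


print(apply_event({"hour": "22:00", "path": "/api/v1/users", "bytes": 512}))  # 4
print(apply_event({"hour": "22:00", "path": "/api/v2", "bytes": 100}))        # 3
print(store["22:00"]["/api"])                                                # 612
```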
Queries
SELECT SUM(a) FROM t WHERE x = .. AND y = .. GROUP BY g, h, i;
• WHEREs map to a Cassandra row and GROUP BY to a compound column key in that row (very roughly)
New query:
• Locate slice that matches WHERE
• Return all mappings from GROUP BY tuples to cell values
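The two query steps can be sketched with a dict of rows keyed by the WHERE values. Each row holds compound (g, h, i) column keys. The cube contents and function names are hypothetical. `query_prefix` shows, under the same assumptions, why fixing a leading part of the GROUP BY tuple can be served as one contiguous slice when column keys are kept sorted.

```python
from bisect import bisect_left

# Hypothetical contents of: CREATE CUBE SELECT SUM(a) FROM t WHERE x, y GROUP BY g, h, i
# row key = (x, y); compound column key = (g, h, i); value = SUM(a)
cube = {
    ("x1", "y1"): {("g1", "h1", "i1"): 10, ("g1", "h2", "i1"): 4, ("g2", "h1", "i1"): 3},
    ("x1", "y2"): {("g2", "h1", "i3"): 7},
}


def query(cube, x, y):
    """SELECT SUM(a) FROM t WHERE x = ? AND y = ? GROUP BY g, h, i."""
    row = cube.get((x, y), {})        # 1. locate the slice (row) matching WHERE
    return dict(sorted(row.items()))  # 2. every (g, h, i) tuple -> cell value


def query_prefix(cube, x, y, prefix):
    """Columns whose key starts with `prefix`, read as one contiguous sorted slice."""
    items = sorted(cube.get((x, y), {}).items())
    keys = [k for k, _ in items]
    out = {}
    for k, v in items[bisect_left(keys, prefix):]:
        if k[:len(prefix)] != prefix:
            break                     # sorted order: no later key can match
        out[k] = v
    return out


print(query_prefix(cube, "x1", "y1", ("g1",)))
# {('g1', 'h1', 'i1'): 10, ('g1', 'h2', 'i1'): 4}
```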
A concrete example
21:00 all→1345 :00→45 :01→62 :02→87 ...
22:00 all→3221 :00→22 :01→19 :02→104 ...
... ...
UK all→228 user01→1 user14→12 user99→7 ...
US all→354 user01→4 user04→8 user56→17 ...
...
UK, 22:00 all→1904 ...
∅ all→87314 UK→238 US→354 ...
Each event updates multiple aggregates. After the event
{cust_id: user01, session_id: 102, geography: UK, browser: IE, time: 22:02}
the rows read:
21:00 all→1345 :00→45 :01→62 :02→87 ...
22:00 all→3222 :00→22 :01→19 :02→105 ...
... ...
UK all→229 user01→2 user14→12 user99→7 ...
US all→354 user01→4 user04→8 user56→17 ...
...
UK, 22:00 all→1905 ...
∅ all→87315 UK→239 US→354 ...
Example queries:
WHERE time IN (22:00, 23:00) GROUP BY minute
WHERE geography=US GROUP BY user
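The example can be replayed in a few lines. The starting counters are copied from the before-state rows, with the elided columns left out. The four cube definitions are our reading of how those rows are keyed, not Acunu's actual schema. Names such as `CUBES`, `apply_event` and `group_by_columns` are hypothetical.

```python
# Before-state rows from the example (elided "..." columns omitted).
# Row key = tuple of WHERE values; () is the row with no WHERE clause (shown as ∅).
store = {
    ("21:00",): {"all": 1345, ":00": 45, ":01": 62, ":02": 87},
    ("22:00",): {"all": 3221, ":00": 22, ":01": 19, ":02": 104},
    ("UK",): {"all": 228, "user01": 1, "user14": 12, "user99": 7},
    ("US",): {"all": 354, "user01": 4, "user04": 8, "user56": 17},
    ("UK", "22:00"): {"all": 1904},
    (): {"all": 87314, "UK": 238, "US": 354},
}

# Assumed cubes: (WHERE dimensions, GROUP BY dimension or None).
CUBES = [
    (("hour",), "minute"),
    (("geography",), "user"),
    (("geography", "hour"), None),
    ((), "geography"),
]


def dimensions(event):
    """Derive the dimension values the cubes use from a raw event."""
    hh, mm = event["time"].split(":")
    return {"hour": f"{hh}:00", "minute": f":{mm}",
            "user": event["cust_id"], "geography": event["geography"]}


def apply_event(store, event):
    """Increment 'all' plus the GROUP BY column in every row the event maps to."""
    dims = dimensions(event)
    for where, group in CUBES:
        row = store.setdefault(tuple(dims[d] for d in where), {})
        row["all"] = row.get("all", 0) + 1
        if group is not None:
            row[dims[group]] = row.get(dims[group], 0) + 1


def group_by_columns(store, row_key):
    """e.g. WHERE geography=US GROUP BY user: read one row, drop the 'all' total."""
    return {k: v for k, v in store.get(row_key, {}).items() if k != "all"}


apply_event(store, {"cust_id": "user01", "session_id": 102,
                    "geography": "UK", "browser": "IE", "time": "22:02"})
print(group_by_columns(store, ("US",)))  # {'user01': 4, 'user04': 8, 'user56': 17}
```

Only rows whose WHERE values match the event change. The 21:00 and US rows are untouched.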
Rich queries
• Arithmetic expressions: SELECT SUM(x)/(MAX(y) - MIN(y) + 0.5) AS 'spread' FROM ...
• Fast inner joins: SELECT a - b AS lbound, a + b AS ubound FROM (SELECT AVG(score) AS a FROM scores WHERE year = 2012) JOIN (SELECT STDDEV(score) AS b FROM scores) USING (school)
• Time zone support: SELECT COUNT UNIQUE (visitors) GROUP BY time(DAY('US/Pacific'))
• Hierarchical aggregation: SELECT SUM(size) FROM .. WHERE path MATCHES /usr/*
• Drill down to raw events: SELECT DRILL FROM errors WHERE category IN ("warn", "error")
• Limits: SELECT COUNT (items) FROM .. GROUP BY category LIMIT 3, country
• Query-time filtering: ... HAVING AVG(rating) < 2.0 AND COUNT >= 10
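Query-time filtering becomes cheap if a cube keeps COUNT and SUM counters per group. AVG then falls out as SUM / COUNT when the query runs. Below is a minimal sketch of `HAVING AVG(rating) < 2.0 AND COUNT >= 10` under that assumption. The category names and counter values are made up purely for illustration, and `having` is a hypothetical helper.

```python
# Hypothetical per-category counters a cube might maintain (illustrative numbers).
groups = {
    "books": {"count": 40, "sum_rating": 60},
    "films": {"count": 12, "sum_rating": 30},
    "music": {"count": 5, "sum_rating": 5},
    "games": {"count": 25, "sum_rating": 45},
}


def having(groups, max_avg=2.0, min_count=10):
    """Evaluate HAVING AVG(rating) < max_avg AND COUNT >= min_count at query time."""
    result = {}
    for name, agg in groups.items():
        avg = agg["sum_rating"] / agg["count"]  # AVG derived from two stored counters
        if avg < max_avg and agg["count"] >= min_count:
            result[name] = avg
    return result


print(having(groups))  # {'books': 1.5, 'games': 1.8}
```

Storing SUM and COUNT rather than AVG keeps every write an integer counter increment, which suits Cassandra counters.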
Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the Apache Software Foundation.
Thank You.