Realtime visitor analysis with Couchbase and Elasticsearch
Jeroen Reijn | @jreijn | #nosql13
follow the Hippo trail
NoSQL Matters 2013
About me
Jeroen Reijn
Software engineer
Hippo
@jreijn
http://blog.jeroenreijn.com
About Hippo
OneHippo @ Goto
Visitor Analysis
Journey-based Targeting
How we analyse visitors @ Hippo
Registration
Visitor - the entity making HTTP requests
Collector - records data about a visitor or their behaviour
Example: location collector (GeoIPCollector)
Targeting Data - all data about a specific visitor
Example: IP address is located in Amsterdam
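To make the split between a collector and the targeting data it fills concrete, here is a minimal sketch. Only the name GeoIPCollector comes from the slide; the `TargetingData` class, the `collect` method and the in-memory `GEO_TABLE` (which stands in for a real GeoIP database and uses documentation-only IP ranges) are hypothetical, not Hippo's actual API.

```python
import ipaddress
from dataclasses import dataclass, field

# Hypothetical stand-in for a real GeoIP database (documentation IP ranges only).
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "Amsterdam",
    ipaddress.ip_network("198.51.100.0/24"): "Berlin",
}

@dataclass
class TargetingData:
    """All data collected about one visitor, keyed by collector id."""
    visitor_id: str
    facts: dict = field(default_factory=dict)

class GeoIPCollector:
    """Records the visitor's city, derived from the request's IP address."""
    collector_id = "geo"

    def collect(self, remote_addr: str, data: TargetingData) -> None:
        ip = ipaddress.ip_address(remote_addr)
        for network, city in GEO_TABLE.items():
            if ip in network:
                data.facts[self.collector_id] = {"city": city}
                return
        # Unknown address: record nothing rather than guess.

data = TargetingData(visitor_id="visitor-1")
GeoIPCollector().collect("192.0.2.17", data)
print(data.facts)  # {'geo': {'city': 'Amsterdam'}}
```

Each collector writes only under its own key, so several collectors can enrich the same visitor record independently.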
Matching
Characteristic - a type of fact about visitors
Example: "comes from a city", "experiences a type of weather"
Target Group - the specification of a Characteristic
Example: "comes from a European city", "comes from Amsterdam"
Persona - one or more target groups that describe a certain type of visitor
Example: "Jim, the European urban consumer", "Alice, the Pet owner"
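The relationship between characteristics, target groups and personas can be sketched as follows, assuming a flat dict of collected facts. The scoring rule (the fraction of a persona's target groups that match) and the fact names `city` and `pet_owner` are hypothetical illustrations, not Hippo's actual scoring algorithm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TargetGroup:
    """Specification of a characteristic: which values count as a match."""
    characteristic: str   # e.g. "city"
    accepted: frozenset   # e.g. {"Amsterdam"}

    def matches(self, facts: dict) -> bool:
        return facts.get(self.characteristic) in self.accepted

@dataclass(frozen=True)
class Persona:
    name: str
    groups: tuple  # one or more TargetGroup instances

    def score(self, facts: dict) -> float:
        # Hypothetical rule: fraction of this persona's target groups that match.
        return sum(g.matches(facts) for g in self.groups) / len(self.groups)

EUROPEAN_CITIES = frozenset({"Amsterdam", "Berlin", "Paris"})
jim = Persona("Jim, the European urban consumer",
              (TargetGroup("city", EUROPEAN_CITIES),))
alice = Persona("Alice, the Pet owner",
                (TargetGroup("pet_owner", frozenset({True})),))

def best_persona(facts: dict, personas: list) -> Persona:
    """Persona with the highest score; ties go to the first in the list."""
    return max(personas, key=lambda p: p.score(facts))

visitor = {"city": "Amsterdam", "pet_owner": False}
print(jim.score(visitor), alice.score(visitor))  # 1.0 0.0
print(best_persona(visitor, [jim, alice]).name)  # Jim, the European urban consumer
```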
What do we store?
• Request log
• Targeting data
• Statistics: averages, e.g. how many visitors became which persona
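A statistic such as "how many visitors became which persona" boils down to a share per persona. A minimal sketch, assuming a hypothetical mapping from visitor id to assigned persona (the sample data is invented):

```python
from collections import Counter

def persona_distribution(assignments: dict) -> dict:
    """Share of visitors per persona, given a visitor_id -> persona mapping."""
    total = len(assignments)
    if total == 0:
        return {}
    counts = Counter(assignments.values())
    return {persona: n / total for persona, n in counts.items()}

# Hypothetical sample data, not real measurements.
assignments = {"v1": "Jim", "v2": "Alice", "v3": "Jim", "v4": "Jim"}
print(persona_distribution(assignments))  # {'Jim': 0.75, 'Alice': 0.25}
```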
Real-time analysis
How about YOU?
• Do you analyse your visitors?
• Do you do it ‘real-time’?
Architecture
Diagram: an App server runs the Hippo Delivery Tier and the Hippo Repository, backed by an RDBMS; output as XML, JSON or (X)HTML
Delivery Tier
Request → URL Matching → Fetch content → Compose output → Response
Delivery Tier
Request → URL Matching → Collect data → Scoring → Fetch content → Compose output → Response
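The targeted request flow can be pictured as the plain pipeline with two extra steps. This is a minimal sketch, assuming the step order Collect data → Scoring → Fetch content (reconstructed from the slide layout) and a shared per-request context dict; every function body, the `geo_city` field and the persona name are hypothetical stand-ins for the real Delivery Tier components.

```python
def url_matching(ctx: dict) -> None:
    # Map the request path to a page (stand-in for real URL matching).
    ctx["page"] = ctx["path"].strip("/") or "home"

def collect_data(ctx: dict) -> None:
    # Stand-in for running every collector (GeoIP etc.) on the request.
    ctx["facts"] = {"city": ctx.get("geo_city")}

def scoring(ctx: dict) -> None:
    # Stand-in for persona scoring over the collected facts.
    in_europe = ctx["facts"]["city"] in {"Amsterdam", "Berlin"}
    ctx["persona"] = "european-urban" if in_europe else None

def fetch_content(ctx: dict) -> None:
    # Pick a content variant for the persona, falling back to the default.
    variant = ctx.get("persona") or "default"
    ctx["content"] = f"{ctx['page']}:{variant}"

def compose_output(ctx: dict) -> None:
    ctx["response"] = f"<p>{ctx['content']}</p>"

PLAIN = [url_matching, fetch_content, compose_output]
TARGETED = [url_matching, collect_data, scoring, fetch_content, compose_output]

def handle(request: dict, steps: list) -> str:
    ctx = dict(request)  # per-request context, enriched by each step
    for step in steps:
        step(ctx)
    return ctx["response"]

print(handle({"path": "/news", "geo_city": "Amsterdam"}, TARGETED))  # <p>news:european-urban</p>
print(handle({"path": "/news", "geo_city": "Amsterdam"}, PLAIN))     # <p>news:default</p>
```

Because collection and scoring run on every request, their reads and writes to the targeting store sit directly on the request path, which is why store latency matters in the slides that follow.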
Scaling
Scaling out
Diagram: two App servers, each running a Hippo Delivery Tier and a Hippo Repository, sharing one RDBMS
Scaling out
Diagram: two App servers, each running a Delivery Tier and a Repository, sharing one RDBMS and a Targeting Datastore
What kind of storage?
Typical Data Access Pattern
Writer → single write → Datastore → several reads
Analytics Data Access Pattern
Writers → several writes → Datastore → single read by a CMS user
Targeting Data Access Pattern
Visitors → several writes and several reads → Datastore → single read by a CMS user
Distributed Cache
Requirements change!
NoSQL?
Suitable types
• Key-value store
• Document database
• Column oriented store
Assessment Criteria
• Maturity
• Data model
• Consistency model
• Performance
• Replication
• Caching model
• Query model
• Monitoring
• Scalability
• Reliability
• Support
Selection Criteria
• Performance
• Scalability
• Schema flexibility
• Simplicity
Couchbase
Why Couchbase?
• Drop-in replacement for memcached
• Read/Write-through cache
• High throughput
• Easily scalable
• Schema flexibility
• Low latency
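The "read/write-through cache" point refers to a general caching pattern: misses are loaded from the backing store, and writes go to the store and the cache together. A minimal sketch of that generic pattern, with plain dicts standing in for the cache and the system of record; it does not model Couchbase's internals.

```python
class ReadWriteThroughCache:
    """Cache that loads misses from, and writes updates to, a backing store."""

    def __init__(self, store: dict):
        self.store = store   # stand-in for the slower system of record
        self.cache = {}
        self.misses = 0

    def get(self, key):
        if key not in self.cache:
            self.misses += 1
            if key not in self.store:
                return None
            self.cache[key] = self.store[key]  # read-through: fill on miss
        return self.cache[key]

    def put(self, key, value):
        self.store[key] = value  # write-through: update the store first...
        self.cache[key] = value  # ...then the cache, so both stay in sync

store = {"visitor::1": {"city": "Amsterdam"}}
cache = ReadWriteThroughCache(store)
cache.get("visitor::1")  # miss, loaded from the store
cache.get("visitor::1")  # hit, served from the cache
```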
Couchbase
• Open Source
• Document-oriented
• Easily scalable
• Consistent High Performance
• Apache licensed
Performance
• Managed object cache
• Asynchronous write queue to disk
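Couchbase serves reads and acknowledges writes from memory and persists changes to disk asynchronously through a write queue. The toy model below sketches that idea under simplifying assumptions: dicts stand in for RAM and disk, repeated writes to the same key coalesce into one pending disk write, and `flush` is a hypothetical method that drains the queue in batches. It is not Couchbase's actual persistence engine.

```python
from collections import OrderedDict

class WriteQueueStore:
    """Acknowledge writes from memory; persist them later in batches."""

    def __init__(self):
        self.memory = {}
        self.disk = {}              # stand-in for on-disk storage
        self.dirty = OrderedDict()  # keys awaiting persistence, oldest first

    def set(self, key, value):
        self.memory[key] = value    # the write is "acknowledged" here
        self.dirty[key] = None      # repeated writes to one key coalesce
        self.dirty.move_to_end(key)

    def get(self, key):
        return self.memory.get(key)  # reads are served from memory

    def flush(self, batch_size: int = 100) -> int:
        """Persist up to batch_size dirty keys; return how many were written."""
        written = 0
        while self.dirty and written < batch_size:
            key, _ = self.dirty.popitem(last=False)
            self.disk[key] = self.memory[key]
            written += 1
        return written
```

The trade-off is that a write is visible immediately but can be lost if the node fails before the queue is flushed, which is acceptable for request-log style data.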
Easily scalable
• Auto-sharding
• Cross data center replication (XDCR)
• Master-master replication
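Auto-sharding in Couchbase works by hashing each key to one of a fixed number of vBuckets and mapping vBuckets to nodes. A simplified sketch, assuming 1024 vBuckets (a common default) and a plain `zlib.crc32` modulo; real clients use their own CRC32-based formula, and the round-robin vBucket map is a hypothetical stand-in for the layout the cluster manager actually chooses.

```python
import zlib

NUM_VBUCKETS = 1024  # a common default vBucket count

def vbucket_for(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    # Simplified hash: real Couchbase clients use their own CRC32-based formula.
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets

def build_vbucket_map(nodes: list, num_vbuckets: int = NUM_VBUCKETS) -> list:
    # Hypothetical round-robin layout; the cluster manager decides this in practice.
    return [nodes[vb % len(nodes)] for vb in range(num_vbuckets)]

def node_for(key: str, vb_map: list) -> str:
    return vb_map[vbucket_for(key, len(vb_map))]

two_nodes = build_vbucket_map(["node-a", "node-b"])
three_nodes = build_vbucket_map(["node-a", "node-b", "node-c"])
key = "visitor::42"
# The key always lands in the same vBucket; rebalancing only changes
# which node owns that vBucket.
print(vbucket_for(key), node_for(key, two_nodes), node_for(key, three_nodes))
```

Keeping the key-to-vBucket function fixed means adding a node only moves whole vBuckets, never re-hashes individual keys.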
Flexible data model
• Native JSON support
• Incremental Map Reduce
• Gives power to the developer
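Couchbase views are defined by JavaScript map functions (`function (doc, meta) { emit(key, value); }`) with optional reduces such as the built-in `_count`, and the index is updated incrementally as documents change. The Python sketch below only imitates the incremental idea: it remembers which rows each document emitted so an update re-maps that one document instead of rebuilding the whole view. The `IncrementalView` class and `by_persona` map function are hypothetical.

```python
from collections import defaultdict

class IncrementalView:
    """Keeps emitted rows per document so updates only re-map changed docs."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}           # doc_id -> list of (key, value)
        self.counts = defaultdict(int)  # reduce step: row count per key
        self.map_calls = 0

    def upsert(self, doc_id, doc):
        self.delete(doc_id)             # retract this doc's previous rows
        self.map_calls += 1
        rows = list(self.map_fn(doc))
        self.rows_by_doc[doc_id] = rows
        for key, _ in rows:
            self.counts[key] += 1

    def delete(self, doc_id):
        for key, _ in self.rows_by_doc.pop(doc_id, []):
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]

def by_persona(doc):
    # Map function: emit one row per visitor document, keyed by persona.
    if "persona" in doc:
        yield doc["persona"], 1

view = IncrementalView(by_persona)
view.upsert("v1", {"persona": "Jim"})
view.upsert("v2", {"persona": "Alice"})
view.upsert("v3", {"persona": "Jim"})
view.upsert("v3", {"persona": "Alice"})  # only v3 is re-mapped
```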
How we run Couchbase @ Hippo
Diagram: a Load Balancer in front of the Hippo Delivery Tier, which uses a Database cluster and a Couchbase cluster holding:
• Request log data
• Targeting data
• Statistics data
Analysis capabilities
• Querying via views
• Secondary indexes via views
• Views based on MapReduce
• Limited ad-hoc query capabilities
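A view acts as a secondary index: the map function's emitted keys are kept sorted, and queries select a key range (Couchbase view queries accept `startkey` and `endkey` parameters for this). A minimal sketch, assuming hypothetical request-log documents with a numeric `ts` field; it also shows why ad-hoc querying is limited, since a query can only use keys that some map function already emits.

```python
import bisect

def build_index(docs: dict, map_fn) -> list:
    """Sorted (key, doc_id) rows: a secondary index produced by a map function."""
    rows = [(key, doc_id) for doc_id, doc in docs.items() for key in map_fn(doc)]
    rows.sort()
    return rows

def query(index: list, startkey, endkey) -> list:
    """Doc ids whose key lies in [startkey, endkey], in key order."""
    keys = [key for key, _ in index]
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_right(keys, endkey)
    return [doc_id for _, doc_id in index[lo:hi]]

def by_timestamp(doc):
    # Map function: emit the request's timestamp as the view key.
    yield doc["ts"]

log = {
    "r1": {"ts": 100, "path": "/"},
    "r2": {"ts": 250, "path": "/news"},
    "r3": {"ts": 180, "path": "/about"},
}
index = build_index(log, by_timestamp)
print(query(index, 150, 300))  # ['r3', 'r2']
```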
Elasticsearch
• Apache Lucene
• Designed to be distributed
• Schema free
• Apache license
• RESTful API
Added value
• Unstructured search
• Structured search
• Faceted search
• Geospatial search
• Combine them all
• All in (near) real-time
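Combining full-text, structured, geospatial and faceted search comes down to one request body. This sketch builds such a body in the current Elasticsearch Query DSL (a `bool` query with `match`, `term` and `geo_distance` clauses plus a `terms` aggregation); Elasticsearch releases from 2013 expressed facets with the older `facets` syntax instead. The field names `user_agent`, `persona`, `location` (assumed mapped as `geo_point`) and `city` (assumed a keyword field) are hypothetical.

```python
import json

def visitor_search(text: str, persona: str, lat: float, lon: float,
                   radius_km: int) -> dict:
    """Build a search body combining full-text, structured, geo and faceted search."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"user_agent": text}}],  # unstructured (full-text)
                "filter": [
                    {"term": {"persona": persona}},         # structured
                    {"geo_distance": {                      # geospatial
                        "distance": f"{radius_km}km",
                        "location": {"lat": lat, "lon": lon},
                    }},
                ],
            }
        },
        "aggs": {"by_city": {"terms": {"field": "city"}}},  # faceted
    }

body = visitor_search("firefox", "jim", 52.37, 4.90, 50)
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a `POST /<index>/_search` request.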
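Diagram: the Hippo Delivery Tier writes to and reads from the Couchbase Server Cluster through the Java API; the Couchbase Server Cluster replicates to the Elasticsearch Server Cluster via XDCR, using the Couchbase Transport plugin; the Elasticsearch Server Cluster serves reads and queries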
What’s Next?
Advanced analytics
{ Demo }