Hippo GetTogether: The architecture behind Hippo's relevance platform

Building a relevance platform with Couchbase and Elasticsearch. Hippo GetTogether, 21 June 2013. Jeroen Reijn | @jreijn | #hgt2013

Upload: jeroen-reijn

Post on 26-Jan-2015

DESCRIPTION

These slides are from my Hippo GetTogether 2013 presentation. During the presentation I went into detail about the architecture behind our high-performance relevance platform. The talk also covered why we chose Couchbase for storage and how Elasticsearch can be used for search and analytics. I shared how we integrated and leveraged both products full-circle from within our Hippo CMS product.

TRANSCRIPT

Page 1: Hippo GetTogether: The architecture behind Hippos relevance platform

Building a relevance platform with Couchbase and Elasticsearch

Hippo GetTogether, 21 June 2013

Jeroen Reijn | @jreijn | #hgt2013

Hippo GetTogether 2013

follow the Hippo trail

Page 2: Hippo GetTogether: The architecture behind Hippos relevance platform

About me

• Architect @ Hippo

• DevOps guy

• Blogger @ http://blog.jeroenreijn.com

Page 3: Hippo GetTogether: The architecture behind Hippos relevance platform

Relevance?

Page 4: Hippo GetTogether: The architecture behind Hippos relevance platform

“The capability of a search engine or function to retrieve data appropriate to a user's needs.”

http://www.thefreedictionary.com/relevance

Page 5: Hippo GetTogether: The architecture behind Hippos relevance platform

Page 6: Hippo GetTogether: The architecture behind Hippos relevance platform

How we deliver relevant content @Hippo

Page 7: Hippo GetTogether: The architecture behind Hippos relevance platform

Registration

Visitor - entity making HTTP requests

Collector - records data about a visitor or his behavior

Example: location collector (GeoIPCollector)

Targeting Data - all data about a specific visitor

Example: IP address is located in Amsterdam

Page 8: Hippo GetTogether: The architecture behind Hippos relevance platform

MatchingCharacteristic - a type of fact about visitors

Example: "comes from a city", "experiences a type of weather"

Target Group - the specification of a Characteristic

Example: "comes from a European city", "comes from Amsterdam"

Persona - one or more target groups that describe a certain type of visitor

Example: "Jim, the European urban consumer", "Alice, the Pet owner"
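The vocabulary on these two slides can be pictured as a small data model. The following is a minimal Python sketch, not Hippo's actual API: the class names `TargetGroup` and `Persona`, the `matches` and `score` methods, the `area_type` characteristic, and the rule that a persona's score is the fraction of matching target groups are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TargetGroup:
    """A characteristic narrowed to concrete values, e.g. 'comes from Amsterdam'."""
    characteristic: str          # hypothetical id of the characteristic, e.g. "city"
    accepted_values: frozenset   # values that count as a match

    def matches(self, targeting_data: dict) -> bool:
        return targeting_data.get(self.characteristic) in self.accepted_values


@dataclass
class Persona:
    """One or more target groups that together describe a type of visitor."""
    name: str
    target_groups: list = field(default_factory=list)

    def score(self, targeting_data: dict) -> float:
        # Assumed scoring rule: fraction of target groups the visitor matches.
        if not self.target_groups:
            return 0.0
        hits = sum(1 for group in self.target_groups if group.matches(targeting_data))
        return hits / len(self.target_groups)


european_city = TargetGroup("city", frozenset({"Amsterdam", "Paris", "Berlin"}))
urban = TargetGroup("area_type", frozenset({"urban"}))
jim = Persona("Jim, the European urban consumer", [european_city, urban])

# Targeting data as collectors might have recorded it for one visitor.
visitor_data = {"city": "Amsterdam", "area_type": "urban"}
jim_score = jim.score(visitor_data)
```

Keeping target groups as plain value sets makes a persona cheap to evaluate on every request, which matters when scoring happens inside the request cycle.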

Page 9: Hippo GetTogether: The architecture behind Hippos relevance platform

What do we store?

Request log

Targeting data

Statistics

Averages, e.g. how many visitors became which persona

Page 10: Hippo GetTogether: The architecture behind Hippos relevance platform

BIG DATA !!

Page 11: Hippo GetTogether: The architecture behind Hippos relevance platform

Real-time analysis

Page 12: Hippo GetTogether: The architecture behind Hippos relevance platform

Architecture

Page 13: Hippo GetTogether: The architecture behind Hippos relevance platform

[Diagram: an app server running the Hippo Delivery Tier and the Hippo Repository, backed by an RDBMS; output formats XML, JSON and (X)HTML]

Page 14: Hippo GetTogether: The architecture behind Hippos relevance platform

Delivery Tier

[Diagram: Request → URL Matching → Fetch content → Compose output → Response]

Page 15: Hippo GetTogether: The architecture behind Hippos relevance platform

Delivery Tier

[Diagram: Request, URL Matching, Targeting Data Collection, Scoring, Fetch content, Compose output, Response]
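The stages named on this slide can be read as a chain of small functions. The sketch below is only an illustration under stated assumptions: every function name, the shape of the request and visitor dictionaries, the "persona is interested in one channel" scoring rule, and the order in which collection and scoring run relative to fetching content are hypothetical, not taken from the Hippo code base.

```python
def match_url(request):
    # Map the request path to a (hypothetical) page id.
    return request["path"].strip("/") or "home"


def collect_targeting_data(request, visitor):
    # Stand-in collector: remember which channels the visitor has seen.
    channels = visitor.setdefault("channels", [])
    if request["channel"] not in channels:
        channels.append(request["channel"])
    return visitor


def score_personas(visitor, personas):
    # personas maps a persona name to the channel it is interested in (assumed rule).
    return {
        name: 1.0 if channel in visitor["channels"] else 0.0
        for name, channel in personas.items()
    }


def fetch_content(page, scores):
    # Pick the content variant of the best-scoring persona, else the default variant.
    best = max(scores, key=scores.get) if scores else None
    variant = best if best is not None and scores[best] > 0 else "default"
    return {"page": page, "variant": variant}


def compose_output(content):
    return "<html><!-- {page}:{variant} --></html>".format(**content)


def handle(request, visitor, personas):
    page = match_url(request)
    visitor = collect_targeting_data(request, visitor)
    scores = score_personas(visitor, personas)
    return compose_output(fetch_content(page, scores))


response = handle(
    {"path": "/news", "channel": "English Website"},
    {},
    {"english-reader": "English Website"},
)
```

Because collection and scoring sit inside the request cycle in this reading, the store behind them has to answer with low latency, which is the constraint the storage discussion later in the deck revolves around.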

Page 16: Hippo GetTogether: The architecture behind Hippos relevance platform

Scaling

Page 17: Hippo GetTogether: The architecture behind Hippos relevance platform

Scaling out

[Diagram: two app servers, each running a Hippo Delivery Tier and a Hippo Repository, sharing one RDBMS]

Page 18: Hippo GetTogether: The architecture behind Hippos relevance platform

Scaling out

[Diagram: two app servers, each running a Delivery Tier and a Repository, sharing one RDBMS and a Targeting Datastore]

Page 19: Hippo GetTogether: The architecture behind Hippos relevance platform

What kind of ‘storage’?

Page 20: Hippo GetTogether: The architecture behind Hippos relevance platform

Question?

Page 21: Hippo GetTogether: The architecture behind Hippos relevance platform

Distributed Cache?

Page 22: Hippo GetTogether: The architecture behind Hippos relevance platform

We have a winner!

Page 23: Hippo GetTogether: The architecture behind Hippos relevance platform

Requirements change!

Page 24: Hippo GetTogether: The architecture behind Hippos relevance platform

NoSQL to the rescue

Page 25: Hippo GetTogether: The architecture behind Hippos relevance platform

Suitable types

• Key-value store

• Document database

Page 26: Hippo GetTogether: The architecture behind Hippos relevance platform

Assessment Criteria

• Maturity

• Data model

• Consistency model

• Performance

• Replication

• Caching model

• Query model

• Monitoring

• Scalability

• Reliability

• Support

Page 27: Hippo GetTogether: The architecture behind Hippos relevance platform

Selection Criteria

• Performance

• Scalability

• Schema flexibility

• Simplicity

• Monitoring

• Support

Page 28: Hippo GetTogether: The architecture behind Hippos relevance platform

Performance !!

Performance !!!!

Page 29: Hippo GetTogether: The architecture behind Hippos relevance platform

Scalability

Page 30: Hippo GetTogether: The architecture behind Hippos relevance platform

Schema flexibility

Page 31: Hippo GetTogether: The architecture behind Hippos relevance platform

{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://localhost:8080/site/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://localhost:8080/site/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {
      "country": "",
      "city": "",
      "latitude": 0,
      "longitude": 0
    },
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}

Request log document
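To make the schema-flexibility point concrete, here is a small Python sketch that reads such a request log document. It is an illustration only: the document is abbreviated from the slide, the interpretation of `timestamp` as milliseconds since the Unix epoch is an assumption, and the `weather` collector key is hypothetical, used to show how a reader can tolerate fields that some documents do not carry.

```python
import json
from datetime import datetime, timezone

# Abbreviated copy of the request log document from the slide.
raw = """
{"visitorId": "7a1c7e75-8539-40", "pathInfo": "/news", "timestamp": 1371419505909,
 "collectorData": {"geo": {"country": "", "city": "", "latitude": 0, "longitude": 0},
                   "returningvisitor": false, "channel": "English Website"},
 "personaIdScores": [], "globalPersonaIdScores": []}
"""

entry = json.loads(raw)

# Assumption: the timestamp is expressed in milliseconds since the Unix epoch.
seen_at = datetime.fromtimestamp(entry["timestamp"] / 1000, tz=timezone.utc)

# Collectors can add or omit keys, so optional fields are read defensively.
collector_data = entry["collectorData"]
city = collector_data.get("geo", {}).get("city") or "unknown"
weather = collector_data.get("weather", "not collected")  # hypothetical collector
```

Since each collector writes its own sub-object under `collectorData`, adding a new collector does not require migrating documents that were stored earlier.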

Page 32: Hippo GetTogether: The architecture behind Hippos relevance platform

{
  "geo": {
    "collectorId": "geo",
    "city": "",
    "country": "",
    "latitude": 0,
    "longitude": 0
  },
  "channel": {
    "collectorId": "channel",
    "channels": [
      "English Website"
    ],
    "lastVisitedChannel": "English Website"
  }
}

Visitor document

Page 33: Hippo GetTogether: The architecture behind Hippos relevance platform

Simplicity

Page 34: Hippo GetTogether: The architecture behind Hippos relevance platform

Monitoring

Page 35: Hippo GetTogether: The architecture behind Hippos relevance platform

Support

Page 36: Hippo GetTogether: The architecture behind Hippos relevance platform

Couchbase

Page 37: Hippo GetTogether: The architecture behind Hippos relevance platform

Why Couchbase?

• Drop-in replacement for memcached

• Read/Write-through cache

• High throughput

• Easy scalability

• Schema flexibility

• Low latency

Page 38: Hippo GetTogether: The architecture behind Hippos relevance platform

Couchbase

• Open Source

• Document-oriented

• Easily scalable

• Consistent High Performance

• Apache license

Page 39: Hippo GetTogether: The architecture behind Hippos relevance platform

Performance

• Object managed cache

• Write Queue to disk

• Avoids Cold Cache
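The three bullets describe a general pattern: reads and writes are served from memory, writes are queued for persistence, and persisted data can be loaded back after a restart. The Python sketch below illustrates that pattern only; it is not how Couchbase is implemented, and the class `WriteBehindCache` with its `flush` and `warm_up` methods is hypothetical.

```python
from collections import deque


class WriteBehindCache:
    def __init__(self):
        self.memory = {}            # object cache that serves reads
        self.disk = {}              # stand-in for persistent storage
        self.write_queue = deque()  # keys waiting to be persisted

    def set(self, key, value):
        # The write is visible as soon as it is in memory; disk follows later.
        self.memory[key] = value
        self.write_queue.append(key)

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        # Cache miss: fall back to disk and keep the value in memory.
        value = self.disk.get(key)
        if value is not None:
            self.memory[key] = value
        return value

    def flush(self, max_items=None):
        # Drain (part of) the queue; a real server would do this in the background.
        count = 0
        while self.write_queue and (max_items is None or count < max_items):
            key = self.write_queue.popleft()
            self.disk[key] = self.memory[key]
            count += 1
        return count

    def warm_up(self):
        # After a restart, reload persisted items so the cache does not start cold.
        self.memory.update(self.disk)


cache = WriteBehindCache()
cache.set("visitor::1", {"lastVisitedChannel": "English Website"})
persisted = cache.flush()
```

The trade-off of this pattern is a short window in which an acknowledged write exists only in memory, which is why replication to other nodes is usually paired with it.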

Page 40: Hippo GetTogether: The architecture behind Hippos relevance platform

Source: http://www.slideshare.net/Couchbase/benchmarking-couchbase Copyright © Altoros Systems, Inc.

Page 41: Hippo GetTogether: The architecture behind Hippos relevance platform

Easily scalable

• Auto sharding

• Cross cluster replication (XDCR)

• Master - Master replication
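Auto sharding can be illustrated with a two-step lookup: a key is hashed to a fixed partition, and a separate map assigns partitions to nodes. The sketch below is a simplification under stated assumptions: it uses a plain CRC32 modulo 1024 partitions (Couchbase calls its partitions vBuckets), which is not claimed to be Couchbase's exact hash function, and the round-robin map and node names are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count for this sketch


def vbucket_for(key: str) -> int:
    # Step 1: the key always hashes to the same partition, whatever the cluster size.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_vbucket_map(nodes):
    # Step 2: partitions are assigned to nodes (round-robin here for simplicity).
    return [nodes[i % len(nodes)] for i in range(NUM_VBUCKETS)]


def node_for(key, vbucket_map):
    return vbucket_map[vbucket_for(key)]


two_nodes = build_vbucket_map(["node-a", "node-b"])
three_nodes = build_vbucket_map(["node-a", "node-b", "node-c"])

owner_before = node_for("visitor::7a1c7e75", two_nodes)
owner_after = node_for("visitor::7a1c7e75", three_nodes)
```

Adding a node changes only the partition-to-node map, so clients keep hashing keys the same way and the cluster moves whole partitions rather than rehashing individual documents.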

Page 42: Hippo GetTogether: The architecture behind Hippos relevance platform

Flexible data model

• Native JSON support

• Incremental Map Reduce

• Gives power to the developer

Page 43: Hippo GetTogether: The architecture behind Hippos relevance platform

How we run Couchbase @Hippo

Page 44: Hippo GetTogether: The architecture behind Hippos relevance platform

Load Balancer

Database cluster

Hippo Delivery Tier

Couchbase cluster

• Request log data

• Targeting data

• Statistics data

Page 45: Hippo GetTogether: The architecture behind Hippos relevance platform

Query capabilities

• Querying via views


• Secondary indexes via views

• Views based on Map - Reduce

• Lacks some advanced query capabilities
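A view pairs a map function, which emits key/value rows per document, with an optional reduce function that aggregates rows sharing a key. Couchbase views are written as JavaScript functions on the server; the Python sketch below only imitates the idea in-process. The helper `run_view`, the functions `map_by_path` and `reduce_count`, and the sample documents are hypothetical.

```python
from collections import defaultdict

# A few request log documents, reduced to the fields used here.
request_log = [
    {"visitorId": "v1", "pathInfo": "/news"},
    {"visitorId": "v2", "pathInfo": "/news"},
    {"visitorId": "v1", "pathInfo": "/about"},
]


def map_by_path(doc):
    # Emit one (key, value) row per document, comparable to emit(doc.pathInfo, 1).
    yield doc["pathInfo"], 1


def reduce_count(values):
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Rows are returned sorted by key, grouped and reduced per key.
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


hits_per_path = run_view(request_log, map_by_path, reduce_count)
```

This works well for counts and sums keyed on a known field; ad hoc questions such as free-text or distance-based lookups do not fit the model, which is the gap the last bullet points at.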

Page 46: Hippo GetTogether: The architecture behind Hippos relevance platform

Elasticsearch

• Apache Lucene

• Designed to be distributed

• Schema free

• Apache license

• RESTful API

Page 47: Hippo GetTogether: The architecture behind Hippos relevance platform

Added value of ES

• Full text search

• Faceted search

• Geo spatial search

• All in (near) real-time
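The three search styles can be combined in a single request body. The sketch below builds such a body as a plain Python dictionary using Elasticsearch's query DSL in its current form (`bool` with `filter`, and `aggs` for facet-style counts); versions from around the time of this talk used facets for the same purpose. The field names `pageTitle` and `location` are hypothetical and assume an index mapping with a text field and a `geo_point` field, and `build_search_body` is an illustrative helper, not part of any client library.

```python
import json


def build_search_body(text, lat, lon, distance="50km"):
    return {
        "query": {
            "bool": {
                # Full text search on a (hypothetical) analyzed text field.
                "must": {"match": {"pageTitle": text}},
                # Geo spatial search: only hits within the given distance.
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                },
            }
        },
        # Faceted search: count matching documents per channel.
        "aggs": {
            "by_channel": {"terms": {"field": "collectorData.channel"}}
        },
    }


body = build_search_body("news", 52.37, 4.89)
payload = json.dumps(body)  # JSON string to send to the search endpoint
```

Building the body as data keeps the sketch independent of any particular client version; the same dictionary can be posted over the RESTful API.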

Page 48: Hippo GetTogether: The architecture behind Hippos relevance platform

Replicating to ES

[Diagram: the Hippo Delivery Tier uses the Java API to write and read; the Couchbase Server Cluster replicates to the Elasticsearch Server Cluster over XDCR through the Couchbase ES Transport plugin]

Page 49: Hippo GetTogether: The architecture behind Hippos relevance platform

What’s Next?

Page 50: Hippo GetTogether: The architecture behind Hippos relevance platform

What’s Next?

Page 51: Hippo GetTogether: The architecture behind Hippos relevance platform

Advanced analytics

Page 52: Hippo GetTogether: The architecture behind Hippos relevance platform

Demo time!

Page 53: Hippo GetTogether: The architecture behind Hippos relevance platform

Thank you!

Questions?

[email protected] | @jreijn

ps. We’re hiring!