Building a relevance platform with Couchbase and Elasticsearch

Uploaded by jeroen-reijn, posted on 26-Jan-2015

DESCRIPTION

These slides are from my Goto Amsterdam presentation, in which I went into detail about how we are building a high-performance relevance platform at Hippo with Couchbase and Elasticsearch. The talk also covered why we chose Couchbase for storage and how Elasticsearch can be used for search and analytics, and I shared how we integrated and leveraged both products full-circle from within our Hippo CMS product.

TRANSCRIPT

Page 1: Building a relevance platform with Couchbase and Elasticsearch

OneHippo @ Goto

follow the Hippo trail

Building a relevance platform with Couchbase and Elasticsearch

@jreijn | Hippo

#gotoams, June 18

Page 2

About me

• Architect @ Hippo

• DevOps guy

• Blogger @ http://blog.jeroenreijn.com

Page 3

About Hippo

Page 4

Relevance?

Page 5

“The capability of a search engine or function to retrieve data appropriate to a user's needs.”

http://www.thefreedictionary.com/relevance

Page 6

Page 7

How we deliver relevant content @Hippo

Page 8

Registration

Visitor - entity making HTTP requests

Collector - records data about a visitor or their behavior

Example: location collector (GeoIPCollector)

Targeting Data - all data about a specific visitor

Example: IP address is located in Amsterdam
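The registration step can be sketched in Java, assuming a hypothetical `Collector` interface (not Hippo's actual API) in which each collector inspects a request and contributes one entry to the visitor's targeting data. The `GeoIPCollector` below uses a hard-coded lookup table standing in for a real GeoIP database; `VisitorRequest` and `Registration` are illustrative names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical request view: only the fields a collector needs.
record VisitorRequest(String visitorId, String remoteAddr, String pathInfo) {}

// Hypothetical collector contract: derive one piece of targeting data from a request.
interface Collector {
    String id();
    Object collect(VisitorRequest request);
}

// Stand-in for a GeoIP-based collector; a real one would query a GeoIP database.
class GeoIPCollector implements Collector {
    private final Map<String, String> cityByIp;

    GeoIPCollector(Map<String, String> cityByIp) {
        this.cityByIp = cityByIp;
    }

    public String id() { return "geo"; }

    public Object collect(VisitorRequest request) {
        // Unknown addresses map to an empty city, as in the request log example.
        return cityByIp.getOrDefault(request.remoteAddr(), "");
    }
}

// Runs every registered collector and stores the results as the visitor's targeting data.
class Registration {
    private final List<Collector> collectors;
    private final Map<String, Map<String, Object>> targetingData = new HashMap<>();

    Registration(List<Collector> collectors) { this.collectors = collectors; }

    Map<String, Object> register(VisitorRequest request) {
        Map<String, Object> data =
                targetingData.computeIfAbsent(request.visitorId(), id -> new HashMap<>());
        for (Collector c : collectors) {
            data.put(c.id(), c.collect(request)); // one entry per collector id
        }
        return data;
    }
}
```

Keying the targeting data by collector id keeps collectors independent: adding a collector adds an entry without touching the others.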

Page 9

Matching

Characteristic - a type of fact about visitors

Example: "comes from a city", "experiences a type of weather"

Target Group - the specification of a Characteristic

Example: "comes from a European city", "comes from Amsterdam"

Persona - one or more target groups that describe a certain type of visitor

Example: "Jim, the European urban consumer", "Alice, the pet owner"
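Characteristics, target groups and personas can be modelled as predicates over the targeting data. This is a hypothetical model: a target group is a named predicate, and a persona's score is assumed here to be the fraction of its target groups the visitor matches. The talk does not specify Hippo's actual scoring formula.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// A target group specifies a characteristic, e.g. "comes from Amsterdam".
record TargetGroup(String name, Predicate<Map<String, Object>> matches) {}

// A persona bundles one or more target groups describing a type of visitor.
record Persona(String id, List<TargetGroup> targetGroups) {

    // Assumed scoring rule: fraction of this persona's target groups the visitor matches.
    double score(Map<String, Object> targetingData) {
        if (targetGroups.isEmpty()) {
            return 0.0;
        }
        long hits = targetGroups.stream()
                .filter(g -> g.matches().test(targetingData))
                .count();
        return (double) hits / targetGroups.size();
    }
}
```

For example, a persona built from "comes from Amsterdam" and "is a returning visitor" scores 0.5 for a first-time visitor from Amsterdam.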

Page 10

What do we store?

Request log

Targeting data

Statistics

Averages, e.g. how many visitors became which persona
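One such statistic, the share of distinct visitors per persona, can be derived from the request log. A sketch, assuming a hypothetical minimal `LogEntry` that records the persona a visitor was assigned on each request, and counting only each visitor's most recent assignment:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal log entry: which persona a visitor was assigned on a request.
record LogEntry(String visitorId, String personaId) {}

class PersonaStatistics {

    // Share of distinct visitors per persona, based on each visitor's latest entry.
    static Map<String, Double> personaShare(List<LogEntry> log) {
        Map<String, String> latest = new HashMap<>();
        for (LogEntry e : log) {
            latest.put(e.visitorId(), e.personaId()); // later entries overwrite earlier ones
        }
        Map<String, Double> share = new TreeMap<>();
        for (String persona : latest.values()) {
            share.merge(persona, 1.0 / latest.size(), Double::sum);
        }
        return share;
    }
}
```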

Page 11

Real-time analysis

Page 12

Architecture

Page 13

[diagram: RDBMS, Hippo Repository, Hippo Delivery Tier, App server; output formats XML, JSON, (X)HTML]

Page 14

Delivery Tier

[diagram: Request → URL Matching → Fetch content → Compose output → Response]

Page 15

Delivery Tier

[diagram: the Page 14 pipeline (Request, URL Matching, Fetch content, Compose output, Response) with two added stages: Targeting Data Collection and Scoring]
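A toy version of this relevance-aware pipeline, for illustration only. The stage names follow the slide; running Scoring before Fetch content is an assumption (so the fetched variant can depend on the persona), and the hard-coded scoring rule, the `cityLookup` function and the variant table are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

// Toy relevance-aware delivery pipeline; stage order and data are assumptions.
class DeliveryPipeline {
    private final Map<String, String> pageByPath;            // URL matching table
    private final Function<String, String> cityLookup;       // hypothetical GeoIP lookup
    private final Map<String, Map<String, String>> variants; // page -> persona -> content

    DeliveryPipeline(Map<String, String> pageByPath,
                     Function<String, String> cityLookup,
                     Map<String, Map<String, String>> variants) {
        this.pageByPath = pageByPath;
        this.cityLookup = cityLookup;
        this.variants = variants;
    }

    String handle(String pathInfo, String remoteAddr) {
        // URL Matching: map the request path to a page.
        String page = pageByPath.getOrDefault(pathInfo, "notfound");
        // Targeting Data Collection: derive a fact about the visitor.
        String city = cityLookup.apply(remoteAddr);
        // Scoring: toy rule picking a persona from the collected data.
        String persona = "Amsterdam".equals(city) ? "urban" : "default";
        // Fetch content: prefer the persona-specific variant, else a generic one.
        String content = variants.getOrDefault(page, Map.of())
                .getOrDefault(persona, "generic " + page);
        // Compose output.
        return "<p>" + content + "</p>";
    }
}
```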

Page 16

Scaling

Page 17

[diagram: RDBMS; two app servers, each with a Hippo Delivery Tier and a Hippo Repository]

Scaling out

Page 18

[diagram: RDBMS; two app servers, each with a Delivery Tier and a Repository; a Targeting Datastore]

Scaling out

Page 19

What kind of ‘storage’?

Page 20

Distributed Cache?

Page 21

We have a winner!

Page 22

Requirements change!

Page 23

NoSQL to the rescue

Page 24

Suitable types

• Key-value store

• Document database

Page 25

Assessment Criteria

• Maturity

• Data model

• Consistency model

• Performance

• Replication

• Caching model

• Query model

• Monitoring

• Scalability

• Reliability

• Support

Page 26

Selection Criteria

• Performance!

• Scalability

• Schema flexibility

• Simplicity

• Monitoring

• Support

Page 27

Performance!!

Page 28

Scalability

Page 29

Schema flexibility

Page 30

{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://localhost:8080/site/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://localhost:8080/site/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {
      "country": "",
      "city": "",
      "latitude": 0,
      "longitude": 0
    },
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}

Request log document

Page 31

{
  "geo": {
    "collectorId": "geo",
    "city": "",
    "country": "",
    "latitude": 0,
    "longitude": 0
  },
  "channel": {
    "collectorId": "channel",
    "channels": [ "English Website" ],
    "lastVisitedChannel": "English Website"
  }
}

Visitor document
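Because the visitor document is keyed by collector id, each collector can own and update its own sub-document. A sketch of how the `channel` entry might be maintained on each request, assuming it keeps the distinct channels in first-visit order plus the most recent one. The field names come from the document above; the update logic and the `ChannelData` type are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors the "channel" entry of the visitor document.
record ChannelData(String collectorId, List<String> channels, String lastVisitedChannel) {

    static ChannelData empty() {
        return new ChannelData("channel", List.of(), null);
    }

    // Returns an updated copy: records the channel once and marks it as last visited.
    ChannelData visit(String channel) {
        List<String> updated = new ArrayList<>(channels);
        if (!updated.contains(channel)) {
            updated.add(channel);
        }
        return new ChannelData(collectorId, List.copyOf(updated), channel);
    }
}
```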

Page 32

Simplicity

Page 33

Monitoring

Page 34

Support

Page 35

Couchbase

Page 36

Why Couchbase?

• Drop-in replacement for memcached

• Read/Write-through cache

• High throughput

• Easy scalability

• Schema flexibility

• Low latency
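With a read/write-through cache, the application talks only to the cache, and the cache itself loads misses from, and persists writes to, the backing store. A minimal, non-thread-safe sketch of that pattern with in-memory maps standing in for both layers; it illustrates the caching pattern, not the Couchbase client API, and `ThroughCache` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Read/write-through cache: callers never touch the backing store directly.
// Not thread-safe; for illustration only.
class ThroughCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Map<K, V> store; // stand-in for persistent storage
    int storeReads = 0;            // number of reads that had to go to the store

    ThroughCache(Map<K, V> store) { this.store = store; }

    // Read-through: on a miss, load from the store and populate the cache.
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = store.get(key);
            storeReads++;
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // Write-through: update the store and the cache in the same call.
    void put(K key, V value) {
        store.put(key, value);
        cache.put(key, value);
    }
}
```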

Page 37

Couchbase

• Open Source

• Document-oriented

• Easily scalable

• Consistent High Performance

Page 38

Performance

• Object-managed cache

• Write Queue to disk

• Avoids Cold Cache
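With a write queue, a write is acknowledged once it is in memory and is persisted to disk asynchronously; because persisted data can be loaded back into memory after a restart, the cache does not start cold. A single-threaded sketch of that write-behind idea, with maps standing in for memory and disk. `WriteBehindStore`, `flush()` and `restart()` are hypothetical names, and `flush()` stands in for the background disk writer.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Write-behind sketch: writes land in memory immediately; persistence happens later.
class WriteBehindStore {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> disk = new HashMap<>();             // stand-in for disk
    private final Map<String, String> writeQueue = new LinkedHashMap<>(); // pending writes

    // Acknowledged once memory is updated; repeated writes to a key collapse into one entry.
    void set(String key, String value) {
        memory.put(key, value);
        writeQueue.put(key, value);
    }

    String get(String key) { return memory.get(key); }

    // Drains the queue to "disk" and returns how many entries were written.
    int flush() {
        int written = writeQueue.size();
        disk.putAll(writeQueue);
        writeQueue.clear();
        return written;
    }

    boolean persisted(String key) { return disk.containsKey(key); }

    // Simulates a restart: memory and unflushed writes are lost, then memory is warmed from disk.
    void restart() {
        memory.clear();
        writeQueue.clear();
        memory.putAll(disk);
    }
}
```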

Page 39

Easily scalable

• Auto sharding

• Cross datacenter replication (XDCR)

• Master-master replication
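Auto sharding in Couchbase works by hashing each document key to one of a fixed number of vBuckets (1024 by default on Linux), while a cluster map assigns vBuckets to nodes, so rebalancing moves whole vBuckets rather than individual keys. A simplified sketch, assuming CRC32 of the key modulo the vBucket count and a hypothetical round-robin vBucket-to-node map. The official clients post-process the CRC32 value before taking the modulus, so these ids will not match a real cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Simplified key -> vBucket -> node mapping in the spirit of Couchbase auto sharding.
class VBucketMap {
    static final int NUM_VBUCKETS = 1024;
    private final List<String> nodes;

    VBucketMap(List<String> nodes) { this.nodes = nodes; }

    // Hash the key with CRC32 and reduce it to a vBucket id (simplified).
    static int vbucketFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % NUM_VBUCKETS);
    }

    // Hypothetical cluster map: vBuckets assigned to nodes round-robin.
    String nodeFor(String key) {
        return nodes.get(vbucketFor(key) % nodes.size());
    }
}
```

Because the key always hashes to the same vBucket, every delivery tier node can locate a visitor's document without coordination; only the vBucket-to-node map changes when the cluster grows.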

Page 40

Flexible data model

• Native JSON support

• Incremental Map Reduce

• Gives power to the developer

Page 41

How we run Couchbase @Hippo

Page 42

[diagram: Load Balancer, Hippo Delivery Tier, Database cluster, Couchbase cluster]

Couchbase cluster:

• Request log data

• Targeting data

• Statistics data

Page 43

Query capabilities

• Querying via views

• Secondary indexes via views

• Views based on MapReduce

• Lacks some advanced query capabilities
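Couchbase views are defined by a map function (written in JavaScript) that emits key/value rows per document, optionally combined with a reduce such as the built-in `_count`, and the index is updated incrementally so only changed documents are re-mapped. The Java sketch below emulates that model in memory to count request-log entries per persona. It is not the Couchbase view engine or its API; `CountView` and the `persona` field are hypothetical, and this sketch recomputes the reduce on every query for simplicity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// In-memory emulation of an incrementally maintained map/reduce view.
class CountView {
    private final Function<Map<String, String>, List<String>> map; // emits keys per document
    private final Map<String, List<String>> rowsByDocId = new HashMap<>();

    CountView(Function<Map<String, String>, List<String>> map) { this.map = map; }

    // Incremental update: only the changed document is re-mapped.
    void upsert(String docId, Map<String, String> doc) {
        rowsByDocId.put(docId, map.apply(doc));
    }

    void delete(String docId) { rowsByDocId.remove(docId); }

    // Reduce step equivalent to a _count reduce grouped by key.
    Map<String, Integer> countByKey() {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> keys : rowsByDocId.values()) {
            for (String key : keys) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Queries are limited to what some view emits as its key, which is one reason the deck notes missing advanced query capabilities.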

Page 44

Elasticsearch

• Apache Lucene

• Designed to be distributed

• Schema free

• Apache 2 licensed

• RESTful API

Page 45

Added value of ES

• Full text search

• Faceted search

• Geospatial search

• All in (near) real-time
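A single Elasticsearch request can combine the three features above: a full text `match` query, a `geo_distance` filter, and a `terms` aggregation for facets. Elasticsearch versions of that era exposed faceting through the `facets` API, which was later replaced by aggregations; this sketch uses the aggregation syntax. The index name `requestlog`, a `geo_point` field `collectorData.geo.location`, and `collectorData.channel` being mapped as a keyword field are all assumptions. The code only builds the request with the JDK's `java.net.http` client and does not send it; user input is not JSON-escaped, so treat it as illustration only.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) a search request; index and field names are hypothetical.
class RequestLogSearch {

    static String body(String text, double lat, double lon, String distance) {
        // Note: no JSON escaping of 'text'; illustration only.
        return """
            {
              "query": {
                "bool": {
                  "must": { "match": { "pathInfo": "%s" } },
                  "filter": {
                    "geo_distance": {
                      "distance": "%s",
                      "collectorData.geo.location": { "lat": %s, "lon": %s }
                    }
                  }
                }
              },
              "aggs": {
                "by_channel": { "terms": { "field": "collectorData.channel" } }
              }
            }
            """.formatted(text, distance, lat, lon);
    }

    static HttpRequest request(String baseUrl, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/requestlog/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Sending it would be a matter of passing the request to `HttpClient.newHttpClient().send(...)` against a running cluster.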

Page 46

[diagram: Hippo Delivery Tier using the Java API (Write, Read); Couchbase Server Cluster replicating to Elasticsearch Server Cluster via XDCR and the Couchbase ES Transport plugin]

Replicating to ES

Page 47

Demo time!

Page 48

What’s Next?

Page 49

Advanced analytics

Page 50

Thank you!

Questions?

[email protected]

@jreijn

P.S. We’re hiring!