how companies use nosql & couchbase - nosql now 2014
DESCRIPTION
My presentation from the NoSQL Now 2014 conference.
Abstract: NoSQL databases, including Couchbase, are increasingly being selected as the backend technology for web and mobile apps. Document databases in particular are well suited to a large number of different use cases as an operational datastore. This session provides a brief overview of Couchbase Server, a document database, and its underlying distributed architecture. In addition, Dipti will present some common use cases of Couchbase with a drill-down into three specific customer use cases:
• PayPal – A multi data center session store
• LivePerson – A scalable, real time analytics system
• Orbitz – A highly available cache solution
TRANSCRIPT
How Companies use NoSQL and Couchbase
Dipti Borkar
Sr. Director, Solutions Engineering
What is Couchbase
Overview
Couchbase offers a full range of data management solutions
• High Availability Cache
• Key Value
• Document
• Mobile device
NoSQL Database Considerations
• Easy Scalability: Grow cluster without application changes, without downtime when needed
• Consistent High Performance: Always awesome experience for your application users
• Always On 24x7x365: The sun never sets on the Internet, your application needs the database to always serve data
• Flexible Data Model: Keep developers productive and allow fast and easy addition of new features
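Couchbase solution “The basics”
Single node – Couchbase Write Operation
[Diagram: the App Server sends Doc 1 to the Couchbase Server Node. The doc is placed in the Managed Cache, then passed to the Disk Queue (on its way to Disk) and to the Replication Queue (on its way to the other node).]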
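The write path in the diagram can be sketched as a toy model. This is a minimal illustration, not Couchbase code: the `Node` class, its method names and the `"ack"` return value are all hypothetical, and plain dictionaries stand in for the managed cache and the disk.

```python
from collections import deque


class Node:
    """Toy model of one server node's write path (illustrative only)."""

    def __init__(self):
        self.cache = {}                   # managed cache (RAM)
        self.disk = {}                    # stand-in for persistent storage
        self.disk_queue = deque()         # keys waiting to be persisted
        self.replication_queue = deque()  # keys waiting to go to another node

    def write(self, key, doc):
        # The doc lands in the managed cache first, then is queued
        # for persistence and for replication.
        self.cache[key] = doc
        self.disk_queue.append(key)
        self.replication_queue.append(key)
        return "ack"

    def flush_disk_queue(self):
        # Drain the disk queue, persisting the cached value of each key.
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.cache[key]

    def drain_replication_queue(self, replica):
        # Ship queued docs to the other node's cache.
        while self.replication_queue:
            key = self.replication_queue.popleft()
            replica.cache[key] = self.cache[key]


node, other = Node(), Node()
status = node.write("doc1", {"name": "Doc 1"})
node.flush_disk_queue()
node.drain_replication_queue(other)
```

In this sketch the two queues are drained by explicit calls; a real server would do that work in the background.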
Single node – Couchbase Read Operation
[Diagram: the App Server issues Get Doc 1 to the Couchbase Server Node, which returns Doc 1 from the Managed Cache. Other components shown: Disk Queue, Disk, Replication Queue (to other node).]
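A read served from the managed cache can be sketched in the same spirit. The `ReadNode` class and its `disk_reads` counter are hypothetical names for illustration; the fallback to disk on a cache miss is an assumption about behaviour the slide does not spell out.

```python
class ReadNode:
    """Toy model of one server node's read path (illustrative only)."""

    def __init__(self, disk):
        self.cache = {}         # managed cache (RAM)
        self.disk = dict(disk)  # stand-in for persistent storage
        self.disk_reads = 0     # counts how often the cache missed

    def get(self, key):
        # Serve from the managed cache when the doc is resident.
        if key in self.cache:
            return self.cache[key]
        # Assumed fallback: load from disk and keep the doc in the cache.
        self.disk_reads += 1
        doc = self.disk[key]
        self.cache[key] = doc
        return doc


read_node = ReadNode({"doc1": {"name": "Doc 1"}})
first = read_node.get("doc1")   # cache miss, loaded from disk
second = read_node.get("doc1")  # served from the managed cache
```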
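Auto Sharding and Cluster Map
[Diagram: a hash function on the KEY maps each doc to a logical partition (vB1 to vB6). The Cluster Map assigns the logical partitions to physical servers A, B, C. When more scalability is required, a node (D) is added and a New Cluster Map assigns the partitions across A, B, C, D.]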
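The two-step lookup (key to logical partition, partition to physical server) can be sketched as follows. This is a simplified illustration: the CRC32-modulo hash, the six partitions taken from the diagram and the round-robin assignment in `build_cluster_map` are assumptions, not the exact scheme the server uses.

```python
import zlib

NUM_VBUCKETS = 6  # the diagram shows vB1..vB6; a real cluster uses far more


def vbucket_for(key: str) -> int:
    # Step 1: hash the key to a logical partition. The result depends
    # only on the key, never on how many servers exist.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(servers):
    # Step 2: assign each logical partition to a physical server
    # (hypothetical round-robin assignment).
    return {vb: servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)}


def server_for(key: str, cluster_map) -> str:
    return cluster_map[vbucket_for(key)]


cluster_map = build_cluster_map(["A", "B", "C"])
new_cluster_map = build_cluster_map(["A", "B", "C", "D"])  # after adding node D
```

Adding a node changes only the cluster map. The partition computed for a key stays the same, which is why the application does not need to change.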
Couchbase Server Cluster
Basic Operation
User Configured Replica Count = 1
[Diagram: App Server 1 and App Server 2 each use the Couchbase Client Library with a Cluster Map to read/write/update docs on Servers 1 to 3. Active docs: Server 1 holds Docs 5, 2, 9; Server 2 holds Docs 4, 7, 8; Server 3 holds Docs 1, 3, 6. Replica docs: Server 1 holds Docs 4, 1, 8; Server 2 holds Docs 6, 3, 2; Server 3 holds Docs 7, 9, 5.]
• Docs distributed evenly across servers
• Each server stores both active and replica docs
  – Only one server active at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
  – App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time
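The placement properties in the bullets above (even distribution, one active copy and one replica on a different server) can be sketched with a simple rule. The round-robin rule in `place_docs` is hypothetical and does not reproduce the exact layout on the slide; it assumes at least two servers.

```python
from collections import Counter


def place_docs(doc_ids, servers):
    """Give each doc one active server and one replica on another server."""
    placement = {}
    for i, doc_id in enumerate(doc_ids):
        active = servers[i % len(servers)]
        # The next server in the list holds the replica, so the two copies
        # never share a server as long as there are at least two servers.
        replica = servers[(i + 1) % len(servers)]
        placement[doc_id] = {"active": active, "replica": replica}
    return placement


servers = ["server1", "server2", "server3"]
placement = place_docs([f"doc{n}" for n in range(1, 10)], servers)
active_counts = Counter(entry["active"] for entry in placement.values())
replica_counts = Counter(entry["replica"] for entry in placement.values())
```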
Add Nodes to Cluster
User Configured Replica Count = 1
[Diagram: Servers 4 and 5 (each with Active and Replica sets) join the Couchbase Server Cluster of Servers 1 to 3. App Servers 1 and 2 continue to read/write/update through the Couchbase Client Library and its Cluster Map.]
• Two servers added with one-click operation
• Docs automatically rebalance across cluster
  – Even distribution of docs
  – Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger number of servers
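One way to get both an even distribution and minimum movement is to let every existing server keep as many partitions as its new quota allows and move only the surplus. The `rebalance` function below is a hypothetical sketch of that idea over a partition-to-server map, not the algorithm the product uses.

```python
def rebalance(cluster_map, servers):
    """Return an even partition->server map, moving as few partitions as possible."""
    base, extra = divmod(len(cluster_map), len(servers))
    # Quota per server: the first `extra` servers take one more partition.
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}
    counts = {s: 0 for s in servers}
    new_map, pending = {}, []
    for vb, owner in sorted(cluster_map.items()):
        if owner in counts and counts[owner] < target[owner]:
            new_map[vb] = owner  # stays where it is
            counts[owner] += 1
        else:
            pending.append(vb)   # surplus: must move
    for vb in pending:
        dest = next(s for s in servers if counts[s] < target[s])
        new_map[vb] = dest
        counts[dest] += 1
    return new_map


old_servers = ["server1", "server2", "server3"]
old_map = {vb: old_servers[vb % 3] for vb in range(12)}
new_map = rebalance(old_map, old_servers + ["server4", "server5"])
moved = [vb for vb in old_map if old_map[vb] != new_map[vb]]
```

With 12 partitions going from three servers to five, the two new servers need four partitions between them, and the sketch moves exactly those four.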
Fail Over Node
User Configured Replica Count = 1
[Diagram: a five-server Couchbase Server Cluster accessed by App Servers 1 and 2 through the Couchbase Client Library and its Cluster Map. Server 3, whose active docs are Docs 1, 3, 6, fails.]
• App servers accessing docs
• Requests to Server 3 fail
• Cluster detects server failed
  – Promotes replicas of docs to active
  – Updates cluster map
• Requests for docs now go to appropriate server
• Typically rebalance would follow
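The promotion step can be sketched as a pure function over a cluster map. The map layout (`active` and `replica` per doc) and the `fail_over` name are hypothetical; the sample entries follow the doc placement shown in the earlier cluster diagrams.

```python
def fail_over(cluster_map, failed):
    """Return a new cluster map with `failed` removed and replicas promoted."""
    new_map = {}
    for doc_id, entry in cluster_map.items():
        if entry["active"] == failed:
            # Promote the replica to active. No replica exists for this doc
            # until a later rebalance recreates one.
            new_map[doc_id] = {"active": entry["replica"], "replica": None}
        elif entry["replica"] == failed:
            # The active copy is unaffected, but its replica is gone.
            new_map[doc_id] = {"active": entry["active"], "replica": None}
        else:
            new_map[doc_id] = dict(entry)
    return new_map


cluster_map = {
    "doc1": {"active": "server3", "replica": "server1"},
    "doc3": {"active": "server3", "replica": "server2"},
    "doc5": {"active": "server1", "replica": "server3"},
    "doc4": {"active": "server2", "replica": "server1"},
}
after = fail_over(cluster_map, "server3")
```

Docs that lose a copy are left with `replica: None` here, which is one reason a rebalance would typically follow.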
Indexing and Querying
User Configured Replica Count = 1
[Diagram: a Query from App Servers 1 and 2 goes through the Couchbase Client Library and its Cluster Map to Servers 1 to 3 of the Couchbase Server Cluster. Each server holds active and replica docs in RAM and on DISK.]
• Indexing work is distributed amongst nodes
• Large data set possible
• Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes
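The per-node index and the combined query result can be sketched as a scatter-gather over sorted lists. The `build_index` and `query` functions, the `age` field and the sample values are hypothetical; real views are defined differently, but the merge step is the same idea.

```python
import heapq


def build_index(docs, field):
    """Per-node index: sorted (field value, doc id) pairs for local docs only."""
    return sorted(
        (doc[field], doc_id) for doc_id, doc in docs.items() if field in doc
    )


def query(node_indexes, low, high):
    """Ask every node for its matching range, then merge the sorted partial results."""
    partials = [
        [(value, doc_id) for value, doc_id in index if low <= value <= high]
        for index in node_indexes
    ]
    return list(heapq.merge(*partials))


nodes = [
    {"doc5": {"age": 31}, "doc2": {"age": 25}},  # docs stored on server 1
    {"doc4": {"age": 40}, "doc7": {"age": 28}},  # docs stored on server 2
    {"doc1": {"age": 19}, "doc3": {"age": 35}},  # docs stored on server 3
]
indexes = [build_index(docs, "age") for docs in nodes]
result = query(indexes, 25, 35)
```

Each node indexes only its own docs, so index building parallelizes, and the query layer only has to merge already-sorted partial results.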
Cross Data Center Replication (XDCR)
[Diagram: two Couchbase Server Clusters, one in the NYC data center and one in the SF data center. Each cluster has active Servers 1 to 3 holding docs in RAM and on DISK, and JSON documents are replicated between the clusters.]
Use Cases
High-Availability Caching (Use Case 1)
[Diagram: User Requests reach the Application Layer, which sends Read-Write Requests to the Couchbase Distributed Cache; Cache Misses and Write Requests go to the RDBMS.]
Data cached in Couchbase?
• Application objects
• Popular search query results
• Session information
• Heavily accessed web landing pages
Application characteristic
• Speed up RDBMS
• Consistently low response times for document / key lookups
• High-availability 24x7x365
• Replacement for entire caching tier
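The request flow for this use case (reads served by the cache, cache misses and writes going to the RDBMS) can be sketched as a cache-aside wrapper. The `CachedStore` class is hypothetical, and plain dictionaries stand in for both the relational database and the distributed cache.

```python
class CachedStore:
    """Toy cache-aside layer in front of a relational database."""

    def __init__(self, rdbms):
        self.rdbms = rdbms    # dict standing in for the RDBMS
        self.cache = {}       # dict standing in for the distributed cache
        self.rdbms_reads = 0  # how many reads reached the RDBMS

    def read(self, key):
        # Reads are served from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fetch from the RDBMS and populate the cache.
        self.rdbms_reads += 1
        value = self.rdbms[key]
        self.cache[key] = value
        return value

    def write(self, key, value):
        # Writes go to the RDBMS and refresh the cached copy.
        self.rdbms[key] = value
        self.cache[key] = value


store = CachedStore({"landing_page": "<html>v1</html>"})
store.read("landing_page")                      # miss, goes to the RDBMS
store.read("landing_page")                      # hit, served from the cache
store.write("landing_page", "<html>v2</html>")
latest = store.read("landing_page")             # hit, sees the new value
```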
Couchbase, Inc. Confidential
High-Availability Caching (Use Case 1)
Why NoSQL and Couchbase?
• Low latency in sub-milliseconds with consistently high read / write throughput using built-in cache
• Always-on operations even for database upgrades and maintenance with zero downtime
Session Store (Use Case 2)
Application characteristic
• Extremely fast access to session data using unique session ID
• Easy scalability to handle fast growing number of users and user-generated data
• Always-on functionality for global user base
Data stored in Couchbase?
• Session values or cookies (stored as key-value pairs)
• Examples include: items in a shopping cart, flights selected, search results, etc.
Why NoSQL and Couchbase?
• Low latency in sub-milliseconds with consistently high read / write throughput for session data via the built-in object-level cache
• Linear throughput scalability to grow the database as user and data volume grow
• Always-on operations, in particular high availability using Couchbase replication and failover
• Intra-cluster and cross-cluster (XDCR) replication for globally distributed active-active platform
Globally Distributed User Profile Store (Use Case 3)
[Mock-up: a web page at http://www.ProfileStore.com with a login form (ID, PASS) and the greeting “Welcome back Laura! You have 3 items in your shopping cart waiting for you.”]
Application characteristic
• Extremely fast access to individual profiles
• Always online system as multiple applications access user profiles
• Flexibility to add and update user attributes
• Easy scalability to handle fast growing number of users
Data stored in Couchbase?
• User profile with unique ID
• User setting / preferences
• User’s network
• User application state
Why NoSQL and Couchbase?
• Low latency and high throughput for very quick lookups for millions of concurrent users using built-in cache
• Intra-cluster and cross-cluster (XDCR) replication for high availability and disaster recovery
• Active-active geo-distributed system to handle globally distributed user base
• Online admin operations eliminate system downtime
Data Aggregation (Use Case 4)
Application characteristic
• Flexibility to store any kind of content
• Flexibility to handle schema changes
• Full-text search across data set
• High speed data ingestion
• Scales horizontally as more content gets added to the system
Data stored in Couchbase?
• Social media feeds: Twitter, Facebook, LinkedIn
• Blogs, news, press articles
• Data service feeds: Hoovers, Reuters
• Data from other systems
Why NoSQL and Couchbase?
• JSON provides schema flexibility to store all types of content and metadata
• Fast access to individual documents via built-in cache, high write throughput
• Indexing and querying provides real-time analytics capabilities across dataset
• Integration with ElasticSearch for full-text search
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
Content and Metadata Store (Use Case 5)
Content and Metadata
[Image with keyword metadata: Nature, Field, Summer, Farm, Sky, Environment, Landscaped, Grass, Green, Blue, Oilseed, Rape, Agriculture, Scenics, Land, Spring, Non-Urban Scene, Environmental, Conservation, Sun, Meadow, Horizon, Season, Cloud, Landscapes, Travel Locations, Pasture, Cultivated Land, Stratosphere, cloudy day, Oilseed Rape, Rural Scene, Vibrant Color, No People, Beauty In Nature, Gold, Color Image, Beauty, Idyllic, Multicolored, Yellow, Colors, Cloudscape, Outdoors, Plant, Sunlight, Horizon Over Land]
Application characteristic
• Flexibility to store any kind of content
• Fast access to content metadata (most accessed objects) and content
• Full-text search across data set
• Scales horizontally as more content gets added to the system
Data stored in Couchbase?
• Content metadata
• Content: Articles, text
• Landing pages for website
• Digital content: eBooks, magazine, research material
Why NoSQL and Couchbase?
• Fast access to metadata and content via object-managed cache
• JSON provides schema flexibility to store all types of content and metadata
• Indexing and querying provides real-time analytics capabilities across dataset
• Integration with ElasticSearch for full-text search
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
Customer Case Studies
User Profile, Ad Targeting & Real-Time Analytics
• Company
  – Global leader in online payments
  – 132m active accounts, 193 markets, 25 currencies
• Scalability and Performance Requirements
  – 300m to 1bn documents with 3TB to 10TB
  – Billions of requests and sub-200ms response times access to JSON documents
  – Read/write mix 50/50 with 5ms latency
• Existing Database Infrastructure
  – Multiple tiers: separate caching and durable store
  – MySQL, Oracle, Terracotta, Coherence
• Pain
  – Real-time access to identity mapping: eBay ID, PayPal ID, Social ID, 3rd Party ID, Email
  – Performance: ad needs to be served in 200ms
  – Cost: multiple tiers for caching and durability
  – Highly available: across large clusters and across data centers
• Couchbase Benefits
  – Performance: reduced latency with 5ms access times
  – Cost: consolidation of database and cache layers
  – Cross data center availability
Why Couchbase?
§ Data volume
  • Online system; 300M – 1B documents @ 10k value size; 3-10TB total storage
§ Data access
  • Distributed caching
  • Persistence
§ Data structure
  • Flexible & schemaless
§ Read/write
  • 50% read / 50% write
  • Low latency < 10 msec
§ Partitioning
§ Replication
§ Auto healing
§ Availability and scalability
  • Resilient
  • Multi data center – DR/BCP
  • Linearly scalable
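The data volume figures are consistent with each other, as a quick calculation shows. The sketch assumes “10k value size” means 10,000 bytes per document and uses decimal terabytes; it ignores replicas, metadata and indexes, which the slide does not quantify.

```python
DOC_SIZE_BYTES = 10 * 1000    # "10k value size", read as 10,000 bytes (assumption)
BYTES_PER_TB = 1000 ** 4      # decimal terabyte


def total_storage_tb(num_docs, doc_size_bytes=DOC_SIZE_BYTES):
    """Raw storage for `num_docs` documents, excluding replicas and overhead."""
    return num_docs * doc_size_bytes / BYTES_PER_TB


low_tb = total_storage_tb(300_000_000)     # 300M documents
high_tb = total_storage_tb(1_000_000_000)  # 1B documents
```

This gives 3 TB at the low end and 10 TB at the high end, matching the 3-10TB range on the slide.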
Use cases at PayPal
• Ad tech targeting
• Cookie infrastructure
• Real time analytics
Cookie architecture
[Diagram: three tiers. Front Tier: Interaction Channels, Application Cookie Libraries. Mid Tier: Data Service and CookieService, with Key Value, Cache Interface and Couchbase Client. Data Tier: Couchbase DC A and Couchbase DC B linked by XDCR.]
Document Model
DEPLOYMENT MODEL
[Diagram: data centers A, B and C each run a Cookie Service. The clusters are linked by XDCR in an ACTIVE / ACTIVE / PASSIVE arrangement, labelled AVAILABILITY, REDUNDANCY and DISASTER RECOVERY, with WRITE and READ arrows.]
High Performance Caching
• Company
  – Leading online travel company
• Scalability and Performance Requirements
  – 11 clusters / 100 nodes
  – Over 3TB of data
  – 149,000 ops/sec
• Existing Database Infrastructure
  – Relational database technology, Terracotta
• Pain
  – Scalability/capacity planning: cannot be planned; dependent on external factors
  – Scalability: complex and time-consuming scale-out
  – Performance: caching too complex; weeks of planning / hours of downtime
  – Cost: multiple tiers of hardware for database and caching
• Couchbase Benefits
  – Scalability: over 70 nodes with simple scale-out in minutes, not hours
  – Performance: improved response times by up to 47% with consistent 3ms to 4ms response
  – Cost: consolidate caching and database tiers; fewer machines, less power, cooling, footprint; dramatic savings
  – Dynamic schema change: dramatically reduced downtime
High Availability Cache
• 11 clusters (4 mirrored), 100 nodes
• > 3 TB of data
• ~430m objects (146m in largest)
• Total ops/sec ~75k (*149k with HA)
Use Case #1
• Content: HTML, Image Links
• HA caches: XDCR
[Diagram: example of HA, DC 1 vs. DC 2.]
Real time analytics
• Company
  – Leading cloud company; allows enterprises to connect in real-time with their customers via chat, voice, and content delivery
• Scalability and Performance Requirements
  – 13TB/month
  – 20m engagements/month
  – 1.8bn sessions/month
• Existing Database Infrastructure
  – MySQL
• Pain
  – Scalability
  – Performance: batch analytics and real-time access to customer profiles
  – Cross data center replication: 4 data centers
• Couchbase Benefits
  – Scalability
  – Performance: mixed read/write with very high throughput
  – Document store: ease of development
Use Case: 3rd party data aggregation with analytics
Real time Analytics for LivePerson's customers
[Screenshot: LiveEngage DASHBOARD]
LivePerson: Leading customer engagement platform
Requirements
• High throughput, really fast
• Linear scale
• Searchable (Views and M/R)
• Supports both K/V & Document store
• Cross data center replication
• “Always on”, resilient solution
The Problem
VOLUME
• 13 TB per month
• ~1 PB in total
• 1.8 B visits per month
Architecture
[Diagram: an Application server (Tomcat) with the Couchbase Java SDK exposes a REST API. Two Couchbase clusters, each with M/R views, are linked by XDCR. Storm Topologies access the clusters through the Couchbase Java SDK.]
Data flow
[Diagram components: Visitor, Kafka, Stream Event Processing (Visitor Feed – Storm Topology), Couchbase, Visitor Monitoring Service, Visitor Feed API, Customer Representative.]
(1) Visitor browsing
(2) Visitor events
(3) Analyze relevant events and persist
(4) Write event to user document
(5) Get visitors list (every 3 sec)
(6) Return relevant visitors
(7) Return relevant visitors
Document Structure
{
"accountId": "64302875",
"id": 121640710013,
"rtSessionId": "643028754295878498",
"eventSequence": 5104,
"ipAddress": {
"fieldValue": "194.39.63.10",
"seq": 1
},
"browser": {
"fieldValue": "Chrome 27.0.1453.116",
"seq": 1
},
"state": {
"fieldValue": "LEFT_SITE",
"seq": 5104
}
......................................
}
• Multi-tenant DB
• Basic visitor information
• Sequence use due to Kafka
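One plausible reading of the per-field `seq` values is that they let a consumer discard stale or repeated events arriving from Kafka. The sketch below illustrates that reading only; the `apply_event` function, the rule for updating `eventSequence` and the `"BROWSING"` state value are assumptions, not taken from the slides.

```python
def apply_event(document, field, value, seq):
    """Update `field` only when `seq` is newer than the stored sequence."""
    current = document.get(field)
    if current is not None and current["seq"] >= seq:
        return False  # stale or duplicate event, ignored
    document[field] = {"fieldValue": value, "seq": seq}
    # Assumed rule: the document-level counter tracks the highest sequence seen.
    document["eventSequence"] = max(document.get("eventSequence", 0), seq)
    return True


visitor = {
    "accountId": "64302875",
    "id": 121640710013,
    "eventSequence": 1,
    "browser": {"fieldValue": "Chrome 27.0.1453.116", "seq": 1},
}
applied = apply_event(visitor, "state", "LEFT_SITE", 5104)
# "BROWSING" is a made-up state value used to show a late, out-of-order event.
stale = apply_event(visitor, "state", "BROWSING", 5000)
```

Under this rule a late event with a lower sequence number cannot overwrite the newer `LEFT_SITE` state.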
Questions?
Thank you!