Couchbase Connect 2016


Post on 10-Feb-2017





Going all in: From single use-case to many
Michael Kehoe
Staff Site Reliability Engineer, LinkedIn


Overview
  The LinkedIn Story
  Couchbase Use-Cases
  Development & Operations
  Conclusions
  Questions



$ whoami
  Michael Kehoe
  Staff Site Reliability Engineer (SRE)
  Production-SRE team
  Funny accent = Australian

$ whatis SRE
  Site Reliability Engineering
  Operations for the production application environment
  Responsibilities include:
    Architecture design
    Capacity planning
    Operations
    Tooling

Notes:
  Site Reliability Engineering: a term coined by Ben Treynor from Google
  Hybrid of:
    Sysadmin
    Network Engineer
    Architect
    Troubleshooter
    Software Engineer
    Ninjas
  Digital economy

$ whatis CBVT
  Couchbase Virtual Team
  ~10 SREs
  2 Software Engineers
  Sponsored by an SRE Director
  Members spend 5-90% of their time supporting Couchbase
  Encourage as many people as possible to contribute

What do we do?
  Operational work on Couchbase clusters
  Evangelize the use of Couchbase within LinkedIn
  Develop tools for the Couchbase ecosystem

Notes:
  10 SREs, with a tech lead
  Sponsored by an SRE Director
  Input from Software Engineers on development


The LinkedIn Story
  Founded in 2002, LinkedIn has grown into the world's largest professional social media network
  30 offices in 24 countries, available in 24 languages
  More than 450 million members worldwide



The LinkedIn Story
  Growth in products:
    Profiles
    Groups
    Recruiter
    Sales Navigator
  Growth in Internet traffic:
    Billions of page-hits per day
    100k+ QPS to production services


The LinkedIn Story: In-Memory Storage Needs
  LinkedIn started as an Oracle shop
  Hyper-growth = scaling challenges
    Read-scaling becomes important
  Applicable use-cases:
    Simple cache store
    Pre-warmed
    Read-through
    Potential for Source of Truth (SoT) store

Notes:
  LinkedIn started as an Oracle shop
    To date, we still run a significant number of Oracle databases
    Oracle is fine for writes; scaling reads becomes challenging
  Hyper-growth == scaling challenges
    Scaling writes isn't a common problem in most cases
    Scaling reads to 100k+ QPS is challenging
    Failures in read-scaling infrastructure can take down back-end systems


The LinkedIn Story: Enter Couchbase
  Until 2012, we were only using Memcached as a non-SoT in-memory store
  Drawbacks:
    Difficult to pre-warm
    No partitioning/sharding (had to write our own)
    Cold-cache restarts
    Difficult to move data across hosts, clusters, and data centers

Notes:
  Drawbacks of Memcached:
    Difficult to pre-warm; not easy to copy data
    No native sharding for clusters; had to write our own
    Restarting Memcached servers caused problems
    Couldn't copy data across for new DCs, expanding clusters, etc.
  Mid-2012, started testing Couchbase
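The talk does not show the home-grown sharding layer, so the following is only a minimal sketch of what client-side sharding commonly looks like: hash the key, take the remainder by the number of hosts. The host names and key format are hypothetical. It also illustrates why expanding such a cluster is painful: adding one host changes the remainder for most keys.

```python
import zlib

def pick_host(key, hosts):
    """Map a key to one host by hashing it and taking the remainder."""
    return hosts[zlib.crc32(key.encode("utf-8")) % len(hosts)]

def moved_fraction(keys, old_hosts, new_hosts):
    """Fraction of keys that land on a different host after a topology change."""
    moved = sum(1 for k in keys if pick_host(k, old_hosts) != pick_host(k, new_hosts))
    return moved / len(keys)

# Hypothetical four-node cache tier that grows to five nodes.
hosts = ["cache01", "cache02", "cache03", "cache04"]
keys = [f"member:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, hosts, hosts + ["cache05"])
print(f"{fraction:.0%} of keys map to a different host after adding one node")
```

Every remapped key is effectively a cache miss until it is reloaded, which is one way to read the "cold cache" and "difficult to move data" drawbacks above.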


The LinkedIn Story: Enter Couchbase
  Evaluated replacement systems for Memcached: Mongo, Redis, and others
  Couchbase had distinct advantages:
    Simple replacement for Memcached
    Built-in replication and cluster expansion
    Automatic partitioning
    Low latency
    Async writes to disk
    Building tooling is simple

Notes:
  Couchbase had distinct advantages:
    Simple replacement for Memcached; Java Spring made this simpler
    Built-in replication and cluster expansion, which significantly reduces ops workload
    Automatic partitioning, so it is no longer a concern
    Low latency; reads from disk are still very fast
    Async writes to disk; we can write a lot of data at once without it being a problem
    Lots of APIs that make tooling relatively simple
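To make "automatic partitioning" concrete, here is a simplified sketch of the general idea behind a fixed partition table: keys hash to a fixed set of partitions (Couchbase calls them vBuckets), and only the partition-to-node map changes when the cluster grows. The partition count, the hash, and the rebalancing rule below are assumptions for illustration; the real mapping is managed by the server and the client libraries.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count for this sketch

def vbucket_for(key):
    """A key always hashes to the same partition, whatever the cluster size."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS

def build_map(nodes):
    """Initial partition-to-node map, assigned round-robin (illustrative only)."""
    return [nodes[v % len(nodes)] for v in range(NUM_VBUCKETS)]

def add_node(vbucket_map, new_node):
    """Give the new node an even share of partitions; leave the rest in place."""
    node_count = len(set(vbucket_map)) + 1
    share = -(-len(vbucket_map) // node_count)  # ceiling division
    kept = {}
    new_map = []
    for owner in vbucket_map:
        if kept.get(owner, 0) < share:
            kept[owner] = kept.get(owner, 0) + 1
            new_map.append(owner)
        else:
            new_map.append(new_node)
    return new_map

def node_for(key, vbucket_map):
    return vbucket_map[vbucket_for(key)]

old_map = build_map(["node1", "node2", "node3", "node4"])
new_map = add_node(old_map, "node5")
moved = sum(1 for before, after in zip(old_map, new_map) if before != after)
print(f"{moved} of {NUM_VBUCKETS} partitions changed owner")
```

Only the partitions handed to the new node move, so the application never has to re-shard its keys.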


The LinkedIn Story: Enter Couchbase
  Today we run Couchbase in our Corporate, Staging and Production environments
  Production/Staging statistics:
    148 buckets
    2821 hosts
    10M+ QPS
  Largest clusters:
    By hosts: 72 hosts
    By documents: 1.4B documents
    By QPS: 2.5M QPS


Use-Cases: Summary
  Today's use-cases:
    Simple read-through cache
    Ephemeral counter store
    Temporary de-duping store
    SoT data-store for internal tooling

Use-Cases: Simple read-through cache
  Drop-in replacement for Memcached
  Read-scaling
  Protecting the backend database from large amounts of traffic
  E.g. 3rd party ingestion credential cache
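A minimal sketch of the read-through pattern, assuming an in-process dictionary in place of a Couchbase bucket and a hypothetical `load_credentials` function in place of the backend database. On a miss the cache calls the loader and stores the result with a TTL; on a hit the backend is not touched.

```python
import time

class ReadThroughCache:
    """Serve reads from the cache; fall back to the loader only on a miss."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the backend and cache the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

backend_calls = []

def load_credentials(key):
    """Hypothetical stand-in for a query against the backend database."""
    backend_calls.append(key)
    return {"partner": key, "token": "example-token"}

cache = ReadThroughCache(load_credentials, ttl_seconds=60.0)
cache.get("partner-42")
cache.get("partner-42")
print(f"backend calls: {len(backend_calls)}, hits: {cache.hits}, misses: {cache.misses}")
```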

Use-Cases: Counter Store
  In certain places, we simply need to increment counters from multiple systems and store them
  E.g. Anti-abuse/Anti-scraping systems (Fuse)

Notes:
  (Placeholder: Fuse architecture diagram)
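The Fuse design is not described in the talk, so this is only a sketch of the general counter-store idea: many callers increment a shared counter atomically, and the counter expires after a window. A lock-protected dictionary stands in for the shared store; the key format, window, and limit are hypothetical.

```python
import threading
import time

class CounterStore:
    """Expiring counters with an atomic increment, as a stand-in for a shared store."""

    def __init__(self, clock=time.monotonic):
        self._lock = threading.Lock()
        self._counters = {}  # key -> (expires_at, count)
        self._clock = clock

    def incr(self, key, delta=1, ttl_seconds=60.0):
        with self._lock:
            now = self._clock()
            expires_at, count = self._counters.get(key, (0.0, 0))
            if expires_at <= now:
                # No counter yet, or the window has elapsed: start a new one.
                expires_at, count = now + ttl_seconds, 0
            count += delta
            self._counters[key] = (expires_at, count)
            return count

def over_limit(store, client_id, limit=100):
    """Count one request and report whether the client exceeded the limit."""
    return store.incr(f"requests:{client_id}") > limit

store = CounterStore()
flags = [over_limit(store, "client-7", limit=2) for _ in range(3)]
print(flags)
```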

Use-Cases: Temporary De-duping store
  Need to de-dup data over a large application cluster
  E.g. Email systems: ensure we don't send the same email twice

Notes:
  We have a deduplication filter in stork that you can take advantage of to make sure we don't send duplicates of your email. This is highly recommended for any email using Kafka (Kafka can potentially deliver your email to our system twice).
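The talk does not show how the filter is built. A common approach, sketched here under that assumption, is to fingerprint each message and record the fingerprint with a TTL; a message is sent only if its fingerprint was not already present. The field names are hypothetical, and a dictionary stands in for the shared store, where the check-and-record step would need to be a single atomic insert-if-absent operation.

```python
import hashlib
import time

class DedupFilter:
    """Remember message fingerprints for a limited time to suppress duplicates."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}  # fingerprint -> expires_at

    def first_time(self, recipient, template, payload):
        """Return True the first time a message is seen within the TTL window."""
        raw = f"{recipient}|{template}|{payload}".encode("utf-8")
        fingerprint = hashlib.sha256(raw).hexdigest()
        now = self._clock()
        expires_at = self._seen.get(fingerprint)
        if expires_at is not None and expires_at > now:
            return False
        self._seen[fingerprint] = now + self._ttl
        return True

dedup = DedupFilter(ttl_seconds=600.0)
message = ("member@example.com", "weekly-digest", "2016-11-07")
decisions = [dedup.first_time(*message) for _ in range(2)]
print(decisions)
```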


Use-Cases: SoT Store for Internal Tools
  For non-member-facing tools, we use Couchbase as a SoT store
  Benefits:
    Schema-less
    Short setup time
    Couchbase Python Client works easily in our environment
    Use views for simple map-reduce
  Example uses:
    Nurse: Autoremediation system
    TrafficshiftIn: Global traffic automation system
    Availability: Storing and tracking LinkedIn availability data

Notes:
  Outside internal tooling, we don't use Couchbase as a SoT store, as Espresso is our primary key-value store
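Couchbase views themselves are defined in JavaScript on the server; the Python below only illustrates the map-reduce idea behind them. The document shape (`type`, `service`, `available`) is hypothetical and loosely modeled on the availability-tracking example above: the map step emits a key and value per matching document, and the reduce step aggregates the values per key.

```python
from collections import defaultdict

def map_doc(doc):
    """Map step: emit (key, value) pairs for documents of interest."""
    if doc.get("type") == "availability":
        yield doc["service"], doc["available"]

def reduce_values(values):
    """Reduce step: average the emitted values for one key."""
    return sum(values) / len(values)

def run_view(docs):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in sorted(grouped.items())}

docs = [
    {"type": "availability", "service": "feed", "available": 1},
    {"type": "availability", "service": "feed", "available": 0},
    {"type": "availability", "service": "search", "available": 1},
    {"type": "owner", "service": "feed", "team": "example-team"},
]
result = run_view(docs)
print(result)
```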


The LinkedIn Story: Couchbase Ecosystem

Developing around Couchbase
  Java: li-couchbase-client
    Wrapper around the standard Java Couchbase client
    Custom metrics emission
    Uses the Spring interface
    Stores data as Java serialized objects
  Python: couchbase-python-client

Operational Tooling
  In order to efficiently use Couchbase as SREs, we need the following:
    Provisioning
    Installation
    Monitoring & Alerting
    Infrastructure Visibility

Operational Tooling: Provisioning
  Provisioning flow:
    Seek estimated usage statistics for the cluster
      Size of data to be stored
      QPS
      Redundancy needs
    Calculate cluster sizing
      Currently done with a template
      Couchbase has a simple calculator available online: hardware for cluster(s)
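The sizing template is not shown in the talk. As a rough illustration of how the three inputs (data size, QPS, redundancy) can turn into a node count, here is a back-of-the-envelope sketch. The formula, the headroom factor, and every number in the example are assumptions, not LinkedIn's template or Couchbase's calculator, and it assumes the whole data set should fit in memory.

```python
import math

def estimate_nodes(data_gb, replicas, peak_qps, ram_per_node_gb,
                   qps_per_node, headroom=0.25):
    """Estimate node count from memory and throughput needs (illustrative only)."""
    # Each replica is a full extra copy of the data.
    total_gb = data_gb * (1 + replicas)
    # Keep a fraction of each node's RAM and throughput in reserve.
    usable_gb = ram_per_node_gb * (1 - headroom)
    usable_qps = qps_per_node * (1 - headroom)
    nodes_for_memory = math.ceil(total_gb / usable_gb)
    nodes_for_qps = math.ceil(peak_qps / usable_qps)
    # At least one node per copy of the data, so replicas live on separate hosts.
    return max(nodes_for_memory, nodes_for_qps, replicas + 1)

nodes = estimate_nodes(data_gb=200, replicas=1, peak_qps=150_000,
                       ram_per_node_gb=64, qps_per_node=50_000)
print(f"estimated nodes: {nodes}")
```

In this example the memory requirement, not QPS, drives the result, which is why both dimensions are computed and the larger one wins.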

Operational Tooling: Installation
  Process:
    Enter cluster metadata into our management system (Range)
    Use Salt States to install and configure the cluster
    See Issa Fattah's post for more information:
  Benefits:
    Ability to perform state enforcement
    Using Salt Pillars to encrypt cluster/bucket passwords end-to-end

Operational Tooling: Monitoring & Alerting
  We run a daemon on each Couchbase server that collects metrics every minute via Couchbase APIs
  Use cluster metadata from Range to build dashboards with our own system, InGraphs

See: Monitoring production deployments: 4pm - Great America 1
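The collection daemon itself is not shown. The sketch below illustrates one way such a collector could read bucket statistics over the Couchbase REST interface. The URL path and the response layout (`op` / `samples` holding a list of recent values per metric) reflect the bucket stats endpoint as I understand it and should be checked against the documentation for the server version in use; authentication, scheduling, and emission to a metrics system are left out.

```python
import json
import urllib.request

def stats_url(host, bucket, port=8091):
    """Build the bucket stats URL (path assumed from the Couchbase REST API)."""
    return f"http://{host}:{port}/pools/default/buckets/{bucket}/stats"

def fetch_stats(host, bucket, opener=urllib.request.urlopen):
    """Fetch and decode the stats document; credentials are omitted in this sketch."""
    with opener(stats_url(host, bucket), timeout=10) as response:
        return json.load(response)

def latest_samples(stats, names):
    """Pick the most recent value for each requested metric, skipping missing ones."""
    samples = stats.get("op", {}).get("samples", {})
    return {name: samples[name][-1] for name in names if samples.get(name)}

# Canned response in the assumed shape, so the example runs without a cluster.
example_stats = {"op": {"samples": {"ops": [1200, 1350, 1280], "mem_used": [512, 520, 530]}}}
metrics = latest_samples(example_stats, ["ops", "mem_used", "not_a_metric"])
print(metrics)
```

A daemon would call `fetch_stats` once a minute per bucket and forward the selected values to the dashboarding system.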

Operational Tooling: Monitoring & Alerting

Operational Tooling: Management
  We want to see a world-view of all the clusters we run
  Having bucket/cluster/server-level statistics is useful
  Having a global view of who owns and operates each cluster/bucket is useful

Operational Tooling: Management

Conclusions
  Couchbase was a natural fit into our existing infrastructure
  Building an ecosystem around Couchbase was important to us and has helped Couchbase be successful at LinkedIn
  Expanding use of Couchbase:
    In the past year we've grown the number of buckets by over 50%
    Starting to use Views in production
    Moving Couchbase into LinkedIn's standard deployment infrastructure

Thank You
  Questions?

2014 LinkedIn Corporation. All Rights Reserved.