Evolving to Operate at Scale: A Mind Map at PayPal – Couchbase Connect 2016

Evolving to operate at scale: A Mind Map:
©2016 PayPal Inc. Confidential and proprietary.
John Kanagaraj, Sr. Member of Technical Staff, PayPal Core Data Platform

Upload: couchbase

Post on 15-Feb-2017



Agenda
• Setting the Stage
• The Mind Map: An overview
• Diving into the Details
• Q & A

Speaker Qualifications
• Currently Sr. Database/Data Architect @ PayPal
• Has been working with Oracle Databases and UNIX for 28 years
• Working on various NoSQL technologies for the past 3 years
• Has worked on many sharded applications – both Oracle and NoSQL
• Author, Technical editor, Oracle ACE, Frequent speaker
• Loves to mentor new speakers and authors!
• http://www.linkedin.com/in/johnkanagaraj


Audience Survey

• Using Couchbase in Production?
• Using other NoSQL technologies in Production?
• Considering Couchbase (or other NoSQL)?
• In the process of (or already in) the Cloud?


Couchbase Landscape at PayPal
• PayPal uses a polyglot of database technologies
  • Oracle RDBMS
  • NoSQL – Couchbase, Cassandra, Aerospike, MongoDB, …
• Couchbase usage @ PayPal
  • One of the earliest NoSQL technologies adopted (2013)
  • Used primarily for low latency caching, cookie store, temporary token store, etc.
  • Mostly Couchbase 4.1.x, planned to move to 4.5.x+ next year
  • Ten Couchbase cluster families, 250+ servers
  • See prior Couchbase Connect presentations here, here and here
• Database team organized by Architecture, Engineering, Delivery and Operations across multiple geographies, supporting polyglot technologies


Challenges at Scale
• Pushing the limits
  • Connections
  • Memory
  • Interconnect
  • CPU
  • DDL on busy tables
  • RAC reconfiguration
  • Redo rate
  • I/O latencies
  • SAN Storage limits
  • Replication latencies
• Solutions
  • Read Scale out (replication)
  • Microservices
  • Connection multiplexing
  • Custom caching
  • Sharding
  • ..... And now .....
  • Active-Active Scale out using NoSQL


Operationalizing NoSQL: Mind Map – A high level view

A mind map is a hierarchical diagram used to visually organize information by showing relationships among pieces of the whole… Major ideas are connected directly to the central concept, and other ideas branch out from those.


The Mind Map – The Whole View!
• The “Whole View” has a lot of detail!
• Organized around five themes:
  1. Drivers – What’s driving this change?
  2. Proof of Concept – Look before you leap!
  3. Pre-production – Getting set before you start (if you get time!)
  4. Production – Organizing your operations
  5. Tackling the “Known Unknowns” (and the “unknown unknowns”!)
• This is based on my journey – It could be different for you and your organization
• Couchbase is easy to install and use
• But introducing and operationalizing a new technology in a large and dynamic org is challenging!


Decisions, Decisions - (RDBMS vs NoSQL): a 4 x 4 matrix

Challenges of Traditional RDBMS
1. ACID overheads inhibit scalability
2. Lack of native sharding
3. Need for “schema before write”
4. Higher base cost for setup and scale up

Advantages of NoSQL
1. Highly (and quickly) scalable at lower cost
2. Low latency, scalable K-V read/write
3. Flexible data model *1
4. Open source and Enterprise model

Advantages of Traditional RDBMS
1. ACID is essential in many cases!
2. Complex data model support
3. Mature technology and ecosystem
4. Wider skill availability

Challenges of NoSQL
1. Limited ACID and Transactions
2. CAP Theorem is real! *2
3. Technology evolving, maturing but fragmented landscape
4. Skillsets not widely available (yet!)

*1 – Establishing an initial data model is easy, evolving it is harder
*2 – This caveat applies to all distributed databases, whether RDBMS or NoSQL. Out of box though, NoSQL databases are distributed


Drivers
• All organizations are perpetually challenged on multiple fronts
• Many industries are being disrupted (esp. by Silicon Valley!)
• Cost is always a challenge
• New ways of working – Cloud, DevOps, Agile programming, etc.
• Many options and new technologies are available
• Easy to adopt, but challenging to operate in large, diverse orgs
• Reality: Need to maintain the old, while building the new (polyglot persistence)
• Need management direction, understanding and support
• Each technology has its +/-
• Is there a single solution?


Look before you leap – Conducting a PoC
• Do you have the time and resources for a PoC?
• Options available – 3rd party managed. List here
• Internal run – Opportunity to learn
• Vendor support – Choose wisely!
• Choose the right use case
  • Simple to develop/deploy
  • Sufficiently important to org
• Cloud based deployments
  • Easy to use – In built templates for compute
  • Option to “fold in your cards” and walk away
• Use Caution esp. when deciding on performance tests
• Look for learnings, not hard numbers


Before you begin – You wanna cloud?
• Cloud vendors are aplenty, but the top three are well known
  • Amazon Web Services
  • Microsoft Azure
  • Google Cloud Platform
• Represents a new way of working
• Provides “pay as you go” model for
  • IaaS – Infrastructure
  • PaaS – Platform
  • SaaS – Software
• Vendor provided “Managed DB” services
  • Fully offload DB services and operations
  • Multi-region, Multi-Availability zones
• Set a cadence for migration
• Applications will need refactoring for cloud based deployments


Couchbase to the rescue!
• Ease of use
  • Couchbase is easy to learn, set up and use
  • Scaleout is simple – Just add servers
  • Many use cases will not need full Transactions/ACID
  • Single node failure is much less impactful
  • Couchbase interface easier with N1QL and Full document support
• Support
  • Mature Enterprise Support with Couchbase
  • Large, growing user base both in the US and Internationally
  • Healthy community support
• Deployment
  • Multi datacenter, distributed processing and locality via XDCR
  • Journey to the cloud is much easier (with a wide variety of choices)


On Premises - Hard (and Soft) Choices
• On-prem gives you control
• Data privacy regulations may restrict you to on-prem services
• In-house clouds – OpenStack, VMware, Oracle, etc.
• Choosing your SKU
  • Based on technology
  • VM or BM (Bare Metal)
  • What is the driving vector?
  • Dev/QA vs Prod
• Containerization options available
• Procurement:
  • One-off (longer lead times, but more optimized, buy-as-you-go)
  • “Rack and Roll” – Fixed, pre-purchased racks, buy-ahead, optimized for time-to-market


PPT: The Three-legged stool of IT
People, Process, Technology
• People
  • Recruit or Retrain
  • Organize and Lead
  • Motivate and Retain
  • Manage the culture
• Process
  • Recognize the need
  • Create or Adapt
  • Clear Roles and Responsibilities
  • Measure, report and adjust
• Technology
  • Making the right choice
  • Manage the product lifecycle
  • Adapt to changes
  • Cost management


Training and Learning

• Formal/Informal Training
  • Instructor led from Couchbase (Virtual and Classroom)
  • Self paced from Couchbase.com
  • Onsite
  • Various paths (Dev/Mobile Dev/DBA)
  • Conferences and Meetups
• Learn from others’ experience
  • Use cases organized around many vectors
  • Couchbase Blogs (a great source of information)

• Getting Started – a Roadmap

Training is optional, (Ongoing) Learning is Essential!


People - Your most important (and very precious) resource!

• Hiring (and retaining) the right set of people is key to your success!

• Training is formal, but is just the start – Learning is perpetual

• Develop standards and evolve them
  • Design: Making the right choices
  • Developers: Choice of language, Client side interaction frameworks (Spring)
  • Operation – Whole bunch of SOPs
• Review use cases, designs and code with your vendor SAs


Running a Production environment – Part 1
• Monitoring and Alerting needs to be solid from Day 1!!!
  • Use out-of-box alerting as much as possible
  • May need integration to Enterprise “single pane of glass”
  • Error/INFO logging on both Server and Client – Key to troubleshooting
• Capacity modeling
  • Initial and Ongoing
  • Knowing which metric to record and trend is key
• Metrics analysis
  • “W-o-W” (week-over-week) and “Y-o-Y” (year-over-year)
  • Map to business metrics
  • Review capacity periodically
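The “W-o-W” and “Y-o-Y” comparisons above come down to a percentage change against an earlier period. A minimal sketch, assuming one sample per week; the metric (weekly peak operations per second), the sample values and the function names are hypothetical and not taken from the talk:

```python
from typing import List, Optional, Sequence


def pct_change(current: float, previous: float) -> Optional[float]:
    """Percentage change from `previous` to `current`; None when previous is 0."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


def period_over_period(series: Sequence[float], lag: int) -> List[Optional[float]]:
    """Compare every point with the point `lag` positions earlier."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [pct_change(series[i], series[i - lag]) for i in range(lag, len(series))]


# Hypothetical weekly peak operations per second for one cluster.
weekly_peak_ops = [100_000, 104_000, 110_000, 121_000]

wow = period_over_period(weekly_peak_ops, lag=1)   # week-over-week
yoy = period_over_period(weekly_peak_ops, lag=52)  # empty until a year of data exists

for week, change in enumerate(wow, start=2):
    label = "n/a" if change is None else f"{change:+.1f}%"
    print(f"week {week}: {label} vs previous week")
```

Trending the same series against a business metric (for example logins or payments per week) is what ties the capacity numbers back to demand.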


Running a Production environment – Part 2
• Secure from start
  • Access logging is key
  • RBAC still maturing in NoSQL
  • Admin vs. Dev: Don’t let Production become Wild West
• Change management: necessary friction
• Production operations
  • Layer your support
  • 24x7 for L1, Oncall for L2/L3
  • Automation is key
  • Create Standard Operating Procedures (SOPs)
• Understand and learn from failures
• Backup but also test Restore
• Change Data Capture/ETL is hard!
• Data Management and Governance


Running a Production environment – Part 3
• Establish, measure and report SLAs – Performance and Failure
• Fleet management and Infrastructure life cycle management is necessary
• Data Center Strategy – Active/Active or DR only?
• Vendor and Technology management
  • License management
  • Vendor TAM, PM and SM relationships
  • Quarterly Business Reviews
  • Help create technology roadmaps and features
• Community participation
  • Stronger Together!
  • Learn from each other
  • Participate and contribute
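For the first bullet on this slide (SLAs for performance and failure), one possible way to turn raw measurements into reportable numbers is a latency percentile plus an availability percentage. This is a sketch under stated assumptions: the nearest-rank percentile method, the function names and the sample values are illustrative choices, not anything the talk prescribes.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def availability(total_minutes: float, downtime_minutes: float) -> float:
    """Availability over a reporting window, as a percentage."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# Hypothetical read latencies in milliseconds collected on the client side.
latencies_ms = [1.2, 0.9, 1.1, 4.8, 1.0, 1.3, 0.8, 1.1, 9.5, 1.0]
p99_ms = percentile(latencies_ms, 99)

# Hypothetical 30-day window with 43.2 minutes of downtime.
monthly_availability = availability(30 * 24 * 60, 43.2)

print(f"p99 latency: {p99_ms} ms, availability: {monthly_availability:.2f}%")
```

Reporting a percentile rather than an average keeps the slow tail visible, which is usually where SLA breaches show up first.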


Operationalizing Couchbase

• Alerting and Monitoring
  • Use Out of Box alerting for important alerts and Admin Console for most metrics
  • Command line and REST API access to metrics
  • Custom alerts via scripts – Connection count, Disk Queue, CPU, Active Resident Ratio, etc.
  • Other monitoring frameworks (e.g. Nagios)
  • Consider integration to your org’s SPOG (Single-Pane-Of-Glass) for monitoring
• Capacity management
  • Understanding, recording and reporting on various capacity vectors
  • Periodic review and capacity additions
• Security – Build in from the start
• Production operations – Virtuous loop of monitoring, responding and fixing
• Involve Couchbase resources (requires support contract)

Making Couchbase work
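The “custom alerts via scripts” bullet above could look roughly like the following. This is a minimal sketch, not PayPal's tooling: it assumes the metrics have already been fetched (via command line or REST) into a plain dictionary, and the metric key names and thresholds are placeholders to be checked against what your Couchbase version actually reports.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Rule:
    metric: str                        # key expected in the samples dictionary
    breached: Callable[[float], bool]  # returns True when the value should alert
    message: str


# Hypothetical metric names and thresholds; tune both per cluster.
RULES = [
    Rule("curr_connections", lambda v: v > 8000, "connection count high"),
    Rule("disk_write_queue", lambda v: v > 1_000_000, "disk queue backing up"),
    Rule("cpu_utilization_rate", lambda v: v > 85.0, "CPU high"),
    Rule("vb_active_resident_items_ratio", lambda v: v < 30.0, "active resident ratio low"),
]


def evaluate(samples: Dict[str, float], rules: Sequence[Rule] = RULES) -> List[str]:
    """Return one alert line per breached rule; a missing metric also alerts."""
    alerts = []
    for rule in rules:
        value = samples.get(rule.metric)
        if value is None:
            alerts.append(f"{rule.metric}: no data")
        elif rule.breached(value):
            alerts.append(f"{rule.metric}={value}: {rule.message}")
    return alerts


# Hypothetical snapshot for one node.
snapshot = {
    "curr_connections": 9000,
    "disk_write_queue": 10,
    "cpu_utilization_rate": 40.0,
    "vb_active_resident_items_ratio": 95.0,
}
for line in evaluate(snapshot):
    print("ALERT", line)
```

Treating a missing metric as an alert is a deliberate choice here: a collector that silently stops reporting is itself a monitoring failure. The output lines can be forwarded to whichever framework (e.g. Nagios) or single pane of glass the organization already uses.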


The “Known Unknowns” – Know the hidden enemy
• There are many unknowns
• “Complex systems fail in complex ways” – New ways of failing
• Embrace concept of “Fail Fast”
• Giving up Data Consistency is hard – CAP Theorem is real
• Data management responsibilities shift
• DevOps culture needs to be adopted
• New Concept: “Treat Infrastructure as Code” (E.g. DCOS, Chef/Puppet/Ansible Automation, Scripts in GitHub)
• Legacy is very hard to get rid of, and will stick around forever in hidden ways
• Teams WILL get silo’ed if you don’t recognize and “spread the wealth”


Putting it all together
• Couchbase is Easy – But introducing and operationalizing NoSQL can be difficult
• Requires new way of thinking and working
• Hopefully, this Mind Map is a framework that may be useful
• “Forewarned is Forearmed”
• Q & A and Wrap up
• Connect with me on LinkedIn - https://www.linkedin.com/in/johnkanagaraj