building reliable cloud storage with riak and cloudstack - andy gross, chief architect (basho)

Riak and Riak CS. Andy Gross <@argv0>, Chief Architect, Basho Technologies. Silicon Valley Cloud Computing Group, April 2, 2013.

Upload: buildacloud

Post on 08-May-2015


DESCRIPTION

About Basho: Basho makes and distributes Riak CS. Built on Riak, Basho's open source, scalable datastore used by thousands in production, Riak CS is made for companies that need large file storage that can't go down. About the speaker: Andy Gross, Basho's Chief Architect, will take you on a tour of Riak CS, talk about how and why Basho built it, and describe the architecture that underpins it. He'll also highlight various use cases featuring Fortune 500 companies that rely on Riak CS.

TRANSCRIPT

Page 1: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Riak and Riak CS

Andy Gross <@argv0>

Chief Architect, Basho Technologies

Silicon Valley Cloud Computing Group

April 2, 2013

Page 2: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Basho

120+ employees, offices in SF, MA, London, Japan

Founded in 2008, open sourced Riak in 2009

Sponsors of the Riak open source database (Apache 2)

Sell Enterprise features (multi-DC replication), support, training.

Riak CS (S3-compat storage) released in March 2012

Page 3: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Now Open Source (Apache 2)

Cloud storage software backed by Riak

S3 API

Formerly closed-source

Per-tenant reporting

Pluggable authentication

Detailed stats

DTrace support

Multi-datacenter replication (Enterprise)

Preliminary integration with CloudStack

Page 4: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

REDACTED

Page 5: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

what is a cloud service?

operationally simple

horizontally scalable

globally distributed

highly available

no SPOFs

fault tolerant

Page 6: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

you can’t outsource these properties

operationally simple

horizontally scalable

globally distributed

highly available

no SPOFs

fault tolerant

Page 7: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

“use pacemaker” = wrong answer

Page 8: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

“use mysql best practices for redundancy” = wrong answer

Page 9: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

“just plug it into a SAN” = wrong answer

Page 10: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

all cloud services need reliable, distributed state storage

Page 11: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

storage is the most important and hardest part

Page 12: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Riak CS uses Riak

Page 13: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

What is Riak?

Page 14: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Key-Value store (plus extras)

Distributed, horizontally scalable

Eventually consistent

Fault-tolerant

Highly-available

Inspired by Amazon’s Dynamo

Page 15: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Key-Value

Simple operations - get, put, delete

Value is mostly opaque (some metadata)

Extras

MapReduce

Secondary Indexes

Full-text search (optional)

Page 16: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Distributed & Horizontally Scalable

Default configuration is in a cluster

Load and data are spread evenly via consistent hashing

Scalable: Add more nodes to get more X

Page 17: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Fault-Tolerant

Symmetry: All nodes participate equally

Decentralized: no central control, no SPOF

All data is replicated 3x by default

Cluster transparently survives...

node failure

network partitions

Built on Erlang/OTP (designed for FT)

Page 18: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Highly-Available

Any node can serve client requests

Fallbacks (sloppy quorums) are used when nodes are down

Always accepts write requests

Accepts read requests as long as R of the N replica nodes are alive

Per-request quorums
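
The per-request read quorum above can be illustrated with a small sketch. It is not Riak's code: the `quorum_get` function, the `QuorumNotMet` exception, and the use of a plain integer version in place of a vector clock are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (version, value); an integer version is a stand-in for a vector clock (assumption)
Reply = Tuple[int, str]


class QuorumNotMet(Exception):
    """Raised when fewer than r replicas replied (hypothetical name)."""


def quorum_get(replicas: List[Optional[Reply]], r: int) -> str:
    """Return the newest value among the first r replies; None marks a down replica."""
    replies = [rep for rep in replicas if rep is not None]
    if len(replies) < r:
        raise QuorumNotMet(f"needed {r} replies, got {len(replies)}")
    # A coordinator answers once r replies are in, so only those are considered.
    newest = max(replies[:r], key=lambda rep: rep[0])
    return newest[1]
```

Because `r` is an argument of each call, a caller can ask for `r=1` on a latency-sensitive read and `r=2` where a fresher answer matters, against the same N=3 replicas.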

Page 19: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Inspired by Amazon’s Dynamo

Masterless, peer-coordinated replication

Consistent hashing

Eventually consistent

Quorum reads and writes

Anti-entropy: read repair, hinted handoff

Page 20: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

[Diagram: a large object enters through a tier of Riak CS nodes, each exposing the S3 API and the Reporting API, which sit in front of a cluster of Riak nodes]

1. User uploads an object

2. Riak CS breaks object into 1 MB chunks

3. Riak CS streams chunks to Riak nodes

4. Riak replicates and stores chunks
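
Step 2 of the upload flow can be sketched in a few lines. This is a minimal illustration, not Riak CS's implementation: the `split_into_chunks` name is hypothetical, and only the 1 MB chunk size comes from the slide.

```python
from typing import Iterator

CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size named on the slide


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of data; the last one may be shorter than chunk_size."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]
```

Each yielded chunk would then be written to Riak as an ordinary key-value object, which is what lets a large upload spread across the cluster instead of landing on one node.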

Page 21: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Principles

Always-writable

Incrementally scalable

Symmetrical

Decentralized

Focus on SLAs, tail latency

Page 22: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Techniques

Consistent Hashing

Vector Clocks

Read Repair

Anti-Entropy

Hinted Handoff

Gossip Protocol

Page 23: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Consistent Hashing

Invented by Danny Lewin and others @ MIT/Akamai

Minimizes remapping of keys when number of hash slots changes

Originally applied to CDNs, used in Dynamo for replica placement

Enables incremental scalability, even spread

Minimizes hot spots
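
The remapping property can be seen in a small sketch of a textbook consistent-hash ring. It is an illustration under assumptions, not Riak's ring: the `HashRing` class, the `points_per_node` parameter, and the choice of SHA-1 for placing points are all picked for the example.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Textbook consistent-hash ring with several points per node (hypothetical class)."""

    def __init__(self, nodes: List[str], points_per_node: int = 64) -> None:
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        self._points_per_node = points_per_node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self._points_per_node):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def lookup(self, key: str) -> str:
        """Return the node owning the first point clockwise from the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Adding a node only inserts new points on the ring, so the only keys that change owner are the ones the new node takes over; every other key stays where it was.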

Page 24: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)
Page 25: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Vector Clocks

Introduced by Mattern et al. in 1988

Extends Lamport’s timestamps (1978)

Each value in Dynamo tagged with vector clock

Allows detection of stale values, logical siblings
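
How a vector clock separates stale values from siblings can be shown with a short sketch. The representation (a dict from actor name to counter) and the function names `increment`, `descends`, and `compare` are assumptions for illustration, not Riak's API.

```python
from typing import Dict

VClock = Dict[str, int]  # actor name -> number of updates seen from that actor


def increment(clock: VClock, actor: str) -> VClock:
    """Return a copy of clock with actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: VClock, b: VClock) -> bool:
    """True when a has seen every update recorded in b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    """Classify a relative to b: 'equal', 'newer', 'stale', or 'siblings'."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "newer"
    if descends(b, a):
        return "stale"
    return "siblings"  # concurrent updates: neither clock descends from the other
```

Two writers that both start from the same clock and update independently produce clocks where neither descends from the other, which is the sibling case the slide refers to.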

Page 26: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Read Repair

Update stale versions opportunistically on reads (instead of writes)

Pushes system toward consistency, after returning value to client

Reflects focus on a cheap, always-available write path
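
A minimal sketch of read repair follows. It makes two simplifying assumptions that are not in the slides: versions are plain integers rather than vector clocks, and the repair runs inline instead of after the reply has gone back to the client. The function name `get_with_read_repair` is hypothetical.

```python
from typing import Dict, List, Tuple

Versioned = Tuple[int, str]  # (version, value); integer version is a stand-in for a vector clock


def get_with_read_repair(replicas: Dict[str, Versioned]) -> Tuple[str, List[str]]:
    """Return the newest value and the names of the replicas that were repaired."""
    newest = max(replicas.values(), key=lambda item: item[0])
    stale = sorted(name for name, item in replicas.items() if item[0] < newest[0])
    for name in stale:
        # Overwrite the stale copy so the next read finds the replicas in agreement.
        replicas[name] = newest
    return newest[1], stale
```

The write path does no extra work here; the cost of reconciling replicas is paid only when a read happens to notice a difference.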

Page 27: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Hinted Handoff

Any node can accept writes for other nodes if they’re down

All messages include a destination

Data accepted by node other than destination is handed off when node recovers

As long as a single node is alive the cluster can accept a write
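
The handoff cycle can be sketched with an in-memory model. Everything in it is an assumption for illustration: the `Cluster` class, its `put` and `recover` methods, and the rule that the fallback is simply the first node that is up.

```python
from typing import Dict, List, Set, Tuple

Hint = Tuple[str, str, str]  # (intended destination, key, value)


class Cluster:
    """Toy in-memory cluster used to illustrate hinted handoff (hypothetical class)."""

    def __init__(self, nodes: List[str]) -> None:
        self.data: Dict[str, Dict[str, str]] = {n: {} for n in nodes}
        self.hints: Dict[str, List[Hint]] = {n: [] for n in nodes}
        self.down: Set[str] = set()

    def put(self, destination: str, key: str, value: str) -> str:
        """Store at destination, or at a fallback with a hint; return the accepting node."""
        if destination not in self.down:
            self.data[destination][key] = value
            return destination
        # Raises StopIteration if every node is down: no node left to accept the write.
        fallback = next(n for n in self.data if n not in self.down)
        self.hints[fallback].append((destination, key, value))
        return fallback

    def recover(self, node: str) -> None:
        """Mark node as up and hand back every hinted write that was meant for it."""
        self.down.discard(node)
        for holder in list(self.hints):
            remaining: List[Hint] = []
            for destination, key, value in self.hints[holder]:
                if destination == node:
                    self.data[node][key] = value
                else:
                    remaining.append((destination, key, value))
            self.hints[holder] = remaining
```

Because each hint carries its destination, the fallback never has to decide where data belongs; it only holds the write until the intended owner is back.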

Page 28: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Anti-Entropy

Replicas maintain a Merkle Tree of keys and their versions/hashes

Trees periodically exchanged with peer vnodes

Merkle tree enables cheap comparison

Only values with different hashes are exchanged

Pushes system toward consistency
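
The cheap comparison can be sketched with a deliberately shallow, two-level hash tree: one root over a fixed number of segment hashes. The segment count, the function names, and the use of SHA-256 are assumptions for illustration, and the final step compares values directly where a real system would compare per-key hashes.

```python
import hashlib
from typing import Dict, List, Set

SEGMENTS = 8  # number of leaves in this toy tree (assumption)


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _segment_of(key: str) -> int:
    """Assign a key to one leaf segment by hashing the key."""
    return int(_digest(key), 16) % SEGMENTS


def leaf_hashes(store: Dict[str, str]) -> List[str]:
    """One hash per segment, covering every key/value pair that falls in it."""
    buckets: List[List[str]] = [[] for _ in range(SEGMENTS)]
    for key in sorted(store):
        buckets[_segment_of(key)].append(f"{key}={_digest(store[key])}")
    return [_digest("|".join(bucket)) for bucket in buckets]


def root_hash(leaves: List[str]) -> str:
    return _digest("".join(leaves))


def keys_to_exchange(local: Dict[str, str], remote: Dict[str, str]) -> Set[str]:
    """Return the keys whose values differ, descending only into mismatched segments."""
    local_leaves, remote_leaves = leaf_hashes(local), leaf_hashes(remote)
    if root_hash(local_leaves) == root_hash(remote_leaves):
        return set()  # replicas agree; a single root comparison settles it
    differing = {i for i in range(SEGMENTS) if local_leaves[i] != remote_leaves[i]}
    candidates = {k for k in set(local) | set(remote) if _segment_of(k) in differing}
    return {k for k in candidates if local.get(k) != remote.get(k)}
```

When two replicas already agree, the exchange ends after comparing one root hash, which is why the check is cheap enough to run periodically.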

Page 29: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Gossip Protocol

Decentralized approach to managing global state

Trades off atomicity of state changes for a decentralized approach

Volume of gossip can overwhelm networks without care

Page 30: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Hinted Handoff

• Node fails

• Requests go to fallback

• Node comes back

• “Handoff” - data returns to recovered node

• Normal operations resume

hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)

[Diagram: the ring, with the failed node marked X]

Page 31: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Anatomy of a Request

[Diagram: the client sends get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) to Riak. The Get Handler (FSM) on the coordinating node computes hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) == 10, 11, 12 and sends the get to those positions on the ring (positions 6 through 16 of the cluster are shown). With R=2, the replies carry versions v1 and v2.]

Page 32: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Read Repair

[Diagram: the client sends get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) to Riak. With R=2, the Get Handler (FSM) on the coordinating node receives v1 and v2 from the ring (positions 6 through 16 of the cluster are shown), v2 is returned to the client, and the replica still holding v1 is updated to v2.]

Page 33: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Riak Architecture

Erlang/OTP Runtime

Client APIs: HTTP, Protocol Buffers, Erlang local client

Request Coordination: get, put, delete, map-reduce

Riak Core: membership, consistent hashing, handoff, node-liveness, gossip, buckets

Riak KV: vnode master, vnodes, storage backend, JS Runtime

Page 34: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

riak is a solid foundation for building cloud services

Page 35: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Coming Soon:

Riak CS 1.4 (Q2)

Swift API

Keystone Integration

S3 Features

COPY Object

Object Versioning

Riak CS 1.5 (Q3)

Server side encryption

More S3 features

Enhanced CloudStack and OpenStack integration

Riak

Page 36: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

Coming Later (2014)

Erasure coding

Reduced redundancy storage

Native indexing/search

Page 37: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

RICON East - May 13-14, NYC

A distributed systems conference for developers

Speakers from Comcast, State Farm, UC Berkeley, Harvard, and many more

Use discount code SVCloud20 for 20% off tickets

http://ricon.io/east.html

Page 38: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

thanks!/questions?

download riakcs: http://docs.basho.com/riakcs/latest/riakcs-downloads/

hack riakcs: http://github.com/basho/riak_cs

work at basho: http://bashojobs.theresumator.com

follow basho on twitter: http://twitter.com/basho