David Cramer: Building to Scale

Posted on 17-May-2015

TRANSCRIPT

BUILDING TO SCALE

David Cramer
twitter.com/zeeg

Tuesday, February 26, 13

The things we build will not, and cannot, last

Who am I?

What do we mean by scale?

DISQUS: Massive traffic with a long tail

Sentry: Counters and event aggregation

tenXer: More stats than we can count

Does one size fit all?

Practical Storage

Postgres is the foundation of DISQUS

MySQL powers the tenXer graph store

Sentry is built on SQL

Databases are not the problem

Compromise

Scaling is about Predictability

Augment SQL with [technology]

Simple solutions using Redis (I like Redis)

Counters

Counters are everywhere

Counters in SQL

    UPDATE table SET counter = counter + 1;

Counters in Redis

    > INCR counter

    >>> redis.incr('counter')

Counters in Sentry

event ID 1 → Redis INCR ↘
event ID 2 → Redis INCR → SQL UPDATE
event ID 3 → Redis INCR ↗

Counters in Sentry

‣ INCR event_id in Redis
‣ Queue a buffer-incr task with a 5-10s explicit delay
‣ Task does an atomic GET event_id and DEL event_id (Redis pipeline)
‣ No-op if GET is not > 0
‣ One SQL UPDATE per unique event per delay window
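The steps above can be sketched in a few lines. This is a hedged illustration, not Sentry's actual code: a tiny in-memory stand-in plays the role of Redis (implementing only INCR, GET, and DEL) so the example runs without a server, and a list stands in for the eventual SQL UPDATE.

```python
from collections import defaultdict

class FakeRedis:
    """In-memory stand-in for the three Redis commands the pattern needs."""
    def __init__(self):
        self.data = defaultdict(int)
    def incr(self, key):
        self.data[key] += 1
        return self.data[key]
    def get(self, key):
        return self.data.get(key, 0)
    def delete(self, key):
        return self.data.pop(key, 0)

redis = FakeRedis()
sql_updates = []  # stands in for UPDATE ... SET count = count + %s

def on_event(event_id):
    # Hot path: one cheap INCR; the flush task is queued with a 5-10s delay.
    redis.incr('events:%s' % event_id)

def flush_counter(event_id):
    # Delayed task: read and clear the pending count. With real redis-py
    # the GET/DEL pair would run atomically in a pipeline.
    count = redis.get('events:%s' % event_id)
    redis.delete('events:%s' % event_id)
    if count:  # no-op if nothing is pending
        sql_updates.append((event_id, count))

for _ in range(5):
    on_event('ad93a')
flush_counter('ad93a')   # one UPDATE for five increments
flush_counter('ad93a')   # nothing pending: a dummy (no-op) task
```

Five events collapse into a single row update, which is exactly the row-lock-contention win; the second flush call shows the downside the next slide lists, a task that does nothing.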

Counters in Sentry (cont.)

Pros
‣ Solves database row lock contention
‣ Redis nodes are horizontally scalable
‣ Easy to implement

Cons
‣ Too many dummy (no-op) tasks

Alternative Counters

event ID 1 → Redis ZINCRBY ↘
event ID 2 → Redis ZINCRBY → SQL UPDATE
event ID 3 → Redis ZINCRBY ↗

Sorted Sets in Redis

    > ZINCRBY events ad93a 1
    {ad93a: 1}

    > ZINCRBY events ad93a 1
    {ad93a: 2}

    > ZINCRBY events d2ow3 1
    {ad93a: 2, d2ow3: 1}

Alternative Counters

‣ ZINCRBY events event_id in Redis
‣ Cron-driven buffer flush
‣ ZRANGE events to get pending updates
‣ Fire an individual task per update
‣ Atomic ZSCORE events event_id and ZREM events event_id to get and flush the count
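The cron-flush variant can be sketched the same way. This is an illustrative sketch, not the talk's code: a plain dict stands in for the `events` sorted set (member → score), and the per-member pop plays the role of the atomic ZSCORE + ZREM pipeline.

```python
pending = {}      # stands in for the 'events' sorted set: member -> score
sql_updates = []  # stands in for one SQL UPDATE per pending member

def on_event(event_id):
    # ZINCRBY events <event_id> 1
    pending[event_id] = pending.get(event_id, 0) + 1

def flush():
    # Cron job: ZRANGE events to list pending members, then for each one
    # an atomic ZSCORE + ZREM (a pipeline in real Redis) reads and clears it.
    for event_id in list(pending):
        count = pending.pop(event_id)
        if count:
            sql_updates.append((event_id, count))

for _ in range(3):
    on_event('ad93a')
on_event('d2ow3')
flush()  # only members with pending counts are flushed: no dummy tasks
```

Because the cron job only sees members that actually accumulated increments, the no-op tasks from the previous design disappear, at the cost of funneling every pending update through one Redis key.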

Alternative Counters (cont.)

Pros
‣ Removes (most) no-op tasks
‣ Works without a complex queue, since jobs need no delay

Cons
‣ A single Redis key stores all pending updates

Activity Streams

Streams are everywhere

Streams in SQL

    class Activity:
        SET_RESOLVED = 1
        SET_REGRESSION = 6

        TYPE = (
            (SET_RESOLVED, 'set_resolved'),
            (SET_REGRESSION, 'set_regression'),
        )

        event = ForeignKey(Event)
        type = IntegerField(choices=TYPE)
        user = ForeignKey(User, null=True)
        datetime = DateTimeField()
        data = JSONField(null=True)

Streams in SQL (cont.)

    >>> Activity(event, SET_RESOLVED, user, now)
    "David marked this event as resolved."

    >>> Activity(event, SET_REGRESSION, datetime=now)
    "The system marked this event as a regression."

    >>> Activity(type=DEPLOY_START, datetime=now)
    "A deploy started."

    >>> Activity(type=SET_RESOLVED, datetime=now)
    "All events were marked as resolved."

Stream == View == Cache

Views as a Cache

    TIMELINE = []
    MAX = 500

    def on_event_creation(event):
        global TIMELINE
        TIMELINE.insert(0, event)
        TIMELINE = TIMELINE[:MAX]

    def get_latest_events(num=100):
        return TIMELINE[:num]

Views in Redis

    class Timeline(object):
        def __init__(self):
            self.db = Redis()

        def add(self, event):
            score = float(event.date.strftime('%s.%m'))
            self.db.zadd('timeline', event.id, score)

        def list(self, offset=0, limit=-1):
            return self.db.zrevrange('timeline', offset, limit)

Views in Redis (cont.)

    MAX_SIZE = 10000

    def add(self, event):
        score = float(event.date.strftime('%s.%m'))

        # increment the key and trim the data to avoid
        # data bloat in a single key
        with self.db.pipeline() as pipe:
            pipe.zadd(self.key, event.id, score)
            pipe.zremrangebyrank(self.key, MAX_SIZE, -1)
            pipe.execute()

Queuing

Introducing Celery

RabbitMQ or Redis

Asynchronous Tasks

    # Register the task
    @task(exchange="event_creation")
    def on_event_creation(event_id):
        counter.incr('events', event_id)

    # Delay execution
    on_event_creation.delay(event.id)

Fanout

    @task(exchange="counters")
    def incr_counter(key, id=None):
        counter.incr(key, id)

    @task(exchange="event_creation")
    def on_event_creation(event_id):
        incr_counter.delay('events', event_id)
        incr_counter.delay('global')

    # Delay execution
    on_event_creation.delay(event.id)

Object Caching

Object Cache Prerequisites

‣ Your database can't handle the read-load

‣ Your data changes infrequently

‣ You can handle slightly worse performance

Distributing Load with Memcache

Memcache 1: Event IDs 01, 04, 07, 10, 13
Memcache 2: Event IDs 02, 05, 08, 11, 14
Memcache 3: Event IDs 03, 06, 09, 12, 15
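The layout above falls out of simple modulo partitioning. A minimal sketch, under the assumption that the partition function is modulo on the numeric ID; a real client would hash the string cache key instead, but the round-robin effect is the same.

```python
NODES = ['Memcache 1', 'Memcache 2', 'Memcache 3']

def node_for(event_id):
    # Deterministic: the same ID always maps to the same node, so any
    # web worker can find a cached object without coordination.
    return NODES[(event_id - 1) % len(NODES)]

# Event IDs 1, 4, 7, ... all land on the first node, matching the table.
```

The catch with plain modulo is that adding a node remaps almost every key; the preallocated-shard scheme later in the talk is one way around that.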

Querying the Object Cache

    def make_key(model, id):
        return '{}:{}'.format(model.__name__, id)

    def get_by_ids(model, id_list):
        keys = [make_key(model, id) for id in id_list]

        res = cache.get_multi(keys)

        pending = set()
        for id, value in res.iteritems():
            if value is None:
                pending.add(id)

        if pending:
            mres = model.objects.in_bulk(pending)

            cache.set_multi({make_key(model, o.id): o for o in mres.itervalues()})

            res.update(mres)

        return res

Pushing State

    def save(self):
        cache.set(make_key(type(self), self.id), self)

    def delete(self):
        cache.delete(make_key(type(self), self.id))

Redis for Persistence

Redis 1: Event IDs 01, 04, 07, 10, 13
Redis 2: Event IDs 02, 05, 08, 11, 14
Redis 3: Event IDs 03, 06, 09, 12, 15

Routing with Nydus

    # create a cluster of Redis connections which
    # partition reads/writes by (hash(key) % size)

    from nydus.db import create_cluster

    redis = create_cluster({
        'engine': 'nydus.db.backends.redis.Redis',
        'router': 'nydus.db...redis.PartitionRouter',
        'hosts': {n: {'db': n} for n in xrange(10)},
    })

github.com/disqus/nydus

Planning for the Future

One of the largest problems for Disqus is network-wide moderation

Be Mindful of Features

Sentry's Team Dashboard

‣ Data limited to a single team
‣ Simple views which could be materialized
‣ Only entry point for "data for team"

Sentry's Stream View

‣ Data limited to a single project
‣ Each project could map to a different DB
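Because every stream query is scoped to one project, the project ID alone can pick the database. A hypothetical router sketch (the modulo mapping and `db…` aliases here are illustrative, not from the talk):

```python
NUM_DBS = 4  # assumed number of database aliases

def db_for_project(project_id):
    # Deterministic: a given project always reads and writes one DB,
    # so its stream view never needs a cross-database query.
    return 'db%d' % (project_id % NUM_DBS)
```

Any stable mapping works (a lookup table lets you move hot projects individually); the point is that the feature's single entry point makes the routing decision trivial.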

Preallocate Shards

Before: redis-1 holds all ten preallocated shards
  redis-1: DB0 DB1 DB2 DB3 DB4 DB5 DB6 DB7 DB8 DB9

After: half the shards move to a new machine
  redis-1: DB0 DB1 DB2 DB3 DB4
  redis-2: DB5 DB6 DB7 DB8 DB9

When a physical machine becomes overloaded, migrate a chunk of shards to another machine.
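The migration above can be sketched as two layers of indirection. A hedged illustration, assuming keys hash into a fixed set of ten virtual shards while a separate shard-to-machine map is the only thing that changes; the names and the crc32 hash are illustrative, not from the talk.

```python
import zlib

NUM_SHARDS = 10
# Every shard starts on one machine; only this map ever changes.
shard_to_host = {n: 'redis-1' for n in range(NUM_SHARDS)}

def shard_for(key):
    # Stable hash (zlib.crc32, unlike hash(), is deterministic across runs),
    # so a key's shard never changes even as machines come and go.
    return zlib.crc32(key.encode('utf-8')) % NUM_SHARDS

def host_for(key):
    return shard_to_host[shard_for(key)]

def migrate(shards, new_host):
    # Overloaded machine: move a chunk of shards; keys never re-hash.
    for n in shards:
        shard_to_host[n] = new_host

migrate(range(5, 10), 'redis-2')  # DB5-DB9 now live on the new machine
```

Because the key-to-shard mapping is fixed up front, a migration only rewrites map entries and copies shard data, rather than re-hashing and reshuffling every key in the cluster.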

Takeaways

Enhance your database
Don't replace it

Queue Everything

Learn to say no (to features)

Complex problems do not require complex solutions

QUESTIONS?
