PRACTICAL CELERY

Upload: cameron-maske

Post on 04-Jul-2015

Category:

Technology


DESCRIPTION

Learn the ins and outs of running background tasks with the popular Python module Celery. We'll hit the ground running, covering everything from running your first task to scaling your stack to run millions each day.

TRANSCRIPT

Page 1: Practical Celery

PRACTICAL CELERY

Page 2: Practical Celery

CAMERON MASKE
twitter: @cameronmaske

email: [email protected]

web: http://cameronmaske.com

Page 3: Practical Celery

WHAT WE'LL COVER...

Page 4: Practical Celery

WHAT IS CELERY?
HOW DOES IT WORK?

Page 5: Practical Celery

USING CELERY, BEST PRACTICES AND SCALING.

Page 6: Practical Celery

SURVEY

Page 7: Practical Celery

CELERY
ASYNCHRONOUS DISTRIBUTED TASK QUEUE

Page 8: Practical Celery

OUT OF THE REQUEST/RESPONSE CYCLE.
Example: Sending emails asynchronously.

Page 9: Practical Celery

TASKS IN THE BACKGROUND.
Example: Computationally heavy jobs.
Example: Interacting with external APIs.

Page 10: Practical Celery

PERIODIC JOBS.

Page 11: Practical Celery

HISTORY
Python.
Released (0.1) in 2009.
Currently on 3.1, with 3.2 in alpha.
Developed by Ask Solem (@asksol)

Page 12: Practical Celery

ARCHITECTURE

Page 13: Practical Celery

PRODUCER
Produces a task for the queue.

Page 14: Practical Celery

BROKER
Stores the task backlog.
Answers: what work remains to be done?
RabbitMQ, Redis, SQLAlchemy, Django's ORM, MongoDB...

Page 15: Practical Celery

WORKER
Consumes and executes tasks.
Distributed.

Page 16: Practical Celery

RESULTS BACKEND.
Stores the results from our tasks.
Redis, SQLAlchemy, Django's ORM, MongoDB...
Optional!
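
The four roles can be pictured with a toy model that uses only the Python standard library. This is a sketch under stated assumptions, not how Celery is implemented: an in-process `queue.Queue` stands in for the broker, a dict stands in for the results backend, and the names `produce` and `worker` are made up for illustration.

```python
import queue
import threading
import uuid

broker = queue.Queue()   # stands in for the broker: holds the task backlog
results = {}             # stands in for the (optional) results backend


def add(x, y):
    return x + y


def produce(func, *args):
    """Producer: put a task message on the queue and return its id."""
    task_id = str(uuid.uuid4())
    broker.put((task_id, func, args))
    return task_id


def worker():
    """Worker: consume and execute messages until a None sentinel arrives."""
    while True:
        message = broker.get()
        if message is None:
            break
        task_id, func, args = message
        results[task_id] = func(*args)


thread = threading.Thread(target=worker)
thread.start()
task_id = produce(add, 4, 4)
broker.put(None)         # tell the toy worker to stop
thread.join()
print(results[task_id])
```

In a real deployment the producer, broker and workers are separate processes, usually on separate machines, which is what makes the queue distributed.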

Page 17: Practical Celery

EXAMPLE

Page 18: Practical Celery

from celery import Celery

app = Celery('tasks', backend='amqp', broker='amqp://guest@localhost//')

@app.task
def add(x, y):
    return x + y

Page 19: Practical Celery

>>> result = add.delay(4, 4)
>>> result.state
'SUCCESS'
>>> result.id
'4cc7438e-afd4-4f8f-a2f3-f46567e7ca77'
>>> result.get()
8

http://celery.readthedocs.org/en/latest/reference/celery.result.html

Page 20: Practical Celery

PICK YOUR [email protected]

@app.task
def add(x, y):
    return x + y

add(2, 4)

class AddTask(app.Task):
    def run(self, x, y):
        return x + y

AddTask().run(2, 4)

Page 21: Practical Celery

# Async
add.delay(2, 4)
add.apply_async(args=(2, 4), expires=30)

# Eager!
result = add.apply(args=(2, 4))  # Executes locally.

# Or...
add(2, 4)  # Does not return a celery result!

Page 22: Practical Celery

INTEGRATING WITH DJANGO.

Page 23: Practical Celery

BEWARE OF DJANGO-CELERY.

Page 24: Practical Celery

http://docs.celeryproject.org/en/master/django/first-steps-with-django.html

- project/
  - config/__init__.py
  - config/settings.py
  - config/urls.py
  - manage.py

Page 25: Practical Celery

# project/config/celery.py

from __future__ import absolute_import

import os

from celery import Celery

from django.conf import settings

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'config.settings')

app = Celery('app')

# Using a string here means the worker will not have to
# pickle the object when using Windows.
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))

Page 26: Practical Celery

# project/config/__init__.py
from __future__ import absolute_import

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app

__all__ = ['celery_app']

Page 27: Practical Celery

celery -A project worker -l info

Page 28: Practical Celery

TESTING
# settings.py
import sys

if 'test' in sys.argv:
    CELERY_EAGER_PROPAGATES_EXCEPTIONS = True
    CELERY_ALWAYS_EAGER = True
    BROKER_BACKEND = 'memory'

Page 29: Practical Celery

PATTERNS AND BEST PRACTICES.

Page 30: Practical Celery

NEVER PASS OBJECTS AS ARGUMENTS.

Page 31: Practical Celery

# Bad
@app.task()
def send_reminder(reminder):
    reminder.send_email()

# Good
@app.task()
def send_reminder(pk):
    try:
        reminder = Reminder.objects.get(pk=pk)
    except Reminder.DoesNotExist:
        return
    reminder.send_email()
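
One reason for this rule is that task arguments are serialized when the task is queued, so an object passed as an argument is a snapshot that can be out of date by the time a worker runs the task. The sketch below imitates that with the standard library only; the `reminders` dict is a hypothetical stand-in for a database table and `json` stands in for the message serializer.

```python
import json

# Hypothetical in-memory "table" standing in for Reminder rows.
reminders = {1: {"pk": 1, "email": "old@example.com"}}

# Enqueue time: the message body is serialized and waits in the broker.
message_with_object = json.dumps({"reminder": reminders[1]})
message_with_pk = json.dumps({"pk": 1})

# The row changes while the task is still waiting in the queue.
reminders[1]["email"] = "new@example.com"

# Execution time: the worker deserializes the message.
stale = json.loads(message_with_object)["reminder"]
fresh = reminders.get(json.loads(message_with_pk)["pk"])

print(stale["email"])
print(fresh["email"])
```

Passing the primary key also keeps the message small and avoids having to serialize model instances at all.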

Page 32: Practical Celery

KEEP TASKS GRANULAR.
CAN PROCESS MORE IN PARALLEL.

Page 33: Practical Celery

AVOID LAUNCHING SYNCHRONOUS SUBTASKS

Page 34: Practical Celery

# Bad
@app.task
def update_page_info(url):
    # Each .get() blocks this worker while it waits on another task.
    page = fetch_page.delay(url).get()
    info = parse_page.delay(url, page).get()
    store_page_info.delay(url, info)

@app.task
def fetch_page(url):
    return myhttplib.get(url)

@app.task
def parse_page(url, page):
    return myparser.parse_document(page)

@app.task
def store_page_info(url, info):
    return PageInfo.objects.create(url=url, info=info)

Page 35: Practical Celery

# Good
def update_page_info(url):
    # Each task's return value becomes the first argument of the next task.
    chain = fetch_page.s(url) | parse_page.s() | store_page_info.s(url)
    chain()

@app.task()
def fetch_page(url):
    return myhttplib.get(url)

@app.task()
def parse_page(page):
    return myparser.parse_document(page)

@app.task(ignore_result=True)
def store_page_info(info, url):
    PageInfo.objects.create(url=url, info=info)
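
The argument order in a chain can be confusing at first: the return value of one step is passed as the first positional argument of the next, ahead of any arguments bound in the signature. The plain-Python sketch below mimics only that data flow. It involves no Celery, runs everything synchronously, and `run_chain` plus the placeholder function bodies are invented for illustration.

```python
from functools import partial


def fetch_page(url):
    return "<html>{0}</html>".format(url)   # placeholder for an HTTP fetch


def parse_page(page):
    return {"length": len(page)}            # placeholder for real parsing


def store_page_info(info, url):
    return (url, info)                      # placeholder for a database write


def run_chain(first_call, *steps):
    """Call the first step, then feed each result into the next step."""
    result = first_call()
    for step in steps:
        result = step(result)
    return result


url = "http://example.com"
stored = run_chain(
    partial(fetch_page, url),                  # like fetch_page.s(url)
    parse_page,                                # like parse_page.s()
    lambda info: store_page_info(info, url),   # like store_page_info.s(url)
)
print(stored)
```

This is why `store_page_info` takes `info` before `url` in the chained version: `info` arrives from the previous step, while `url` was bound when the signature was created.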

http://celery.readthedocs.org/en/latest/userguide/canvas.html

Page 36: Practical Celery

PERIODIC TASKS.
http://celery.readthedocs.org/en/latest/userguide/periodic-tasks.html

Page 37: Practical Celery

from datetime import timedelta

@app.periodic_task(run_every=timedelta(minutes=5))
def run_every_five():
    pass

Page 38: Practical Celery

from datetime import timedelta

class RunEveryFive(app.PeriodicTask):
    run_every = timedelta(minutes=5)

    def run(self):
        pass

Page 39: Practical Celery

from datetime import timedelta

@app.task()
def run_every_five():
    pass

CELERYBEAT_SCHEDULE = {
    'run-every-five': {
        'task': 'tasks.run_every_five',
        'schedule': timedelta(minutes=5),
    },
}

Page 40: Practical Celery

CRON STYLE.
from celery.schedules import crontab

crontab(minute=0, hour='*/3')  # Every 3 hours, on the hour.
crontab(day_of_week='sunday')  # Every minute on Sundays.
crontab(0, 0, 0, month_of_year='*/3')  # Midnight on Sundays in the first month of every quarter.

Page 41: Practical Celery

from datetime import datetime

@app.periodic_task(run_every=crontab(minute=0, hour=1))
def schedule_emails():
    # Fan out: one small task per user instead of one long-running loop.
    user_ids = User.objects.values_list('id', flat=True)
    for user_id in user_ids:
        send_daily_email.delay(user_id)

@app.task()
def send_daily_email(user_id):
    user = User.objects.get(id=user_id)
    try:
        # If today's email already exists, do nothing.
        today = datetime.now()
        Email.objects.get(
            user=user, date__year=today.year,
            date__month=today.month, date__day=today.day)
    except Email.DoesNotExist:
        email = Email(user=user, body="Hey, don't forget to LOGIN PLEASE!")
        email.send()
        email.save()

Page 42: Practical Celery

CELERY BEAT A.K.A. THE SCHEDULER.

celery -A project beat

Page 43: Practical Celery

NEVER RUN A BEAT + WORKER ON A SINGLE CELERY PROCESS.
# Really bad idea....
celery -A project worker -B

Page 44: Practical Celery

FREQUENTLY RUNNING PERIODIC TASKS.

BEWARE OF "TASK STACKING"

Page 45: Practical Celery

Scheduled task runs every 5 minutes.
Each run takes 30 minutes.
Scheduled tasks stack up in the queue.
Bad stuff.
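
The arithmetic behind task stacking can be checked with a small minute-by-minute simulation. It is a simplified sketch that assumes a single worker handling one task at a time and a fixed task duration; `simulate` is a name made up for this illustration.

```python
def simulate(total_minutes, interval=5, duration=30):
    """Return how many scheduled runs are still waiting after total_minutes."""
    waiting = 0        # runs queued but not yet started
    busy_until = 0     # minute at which the single worker becomes free
    for minute in range(total_minutes):
        if minute % interval == 0:
            waiting += 1                    # the scheduler queues another run
        if waiting and minute >= busy_until:
            waiting -= 1                    # the worker picks up the oldest run
            busy_until = minute + duration
    return waiting


print(simulate(120))               # backlog after two hours
print(simulate(120, duration=5))   # same schedule, but runs finish in time
```

With these numbers, 24 runs are queued in two hours and only 4 are started, so 20 are left waiting and the backlog keeps growing for as long as the schedule runs.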

Page 46: Practical Celery

EXPIRES!
from datetime import timedelta
from time import sleep

@app.periodic_task(expires=5*60, run_every=timedelta(minutes=5))
def schedule_task():
    for _ in range(30):
        one_minute_task.delay()

@app.task(expires=5*60)
def one_minute_task():
    sleep(60)
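
The effect of `expires` can be sketched with a toy worker loop: a message that has waited longer than its expiry is dropped instead of executed, so stale runs do not pile up behind a slow one. This models the idea only. Times are in minutes here (Celery's `expires` is in seconds), and `run_worker` is a hypothetical helper, not a Celery function.

```python
def run_worker(enqueue_times, expires, duration):
    """Process messages in order; skip any that expired before being started."""
    clock = 0
    executed, discarded = [], []
    for enqueued_at in enqueue_times:
        clock = max(clock, enqueued_at)      # wait for the message to arrive
        if clock - enqueued_at > expires:
            discarded.append(enqueued_at)    # too old: drop without running
            continue
        executed.append(enqueued_at)
        clock += duration                    # the worker is busy for `duration`
    return executed, discarded


# One run queued every 5 minutes for an hour; each run takes 30 minutes.
executed, discarded = run_worker(range(0, 60, 5), expires=5, duration=30)
print(executed)
print(len(discarded))
```

Of the 12 runs queued in that hour, only 3 execute and the other 9 are discarded, so the queue drains instead of stacking.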

Page 47: Practical Celery

THINGS GO WRONG IN TASKS!

Page 48: Practical Celery

RETRY

Page 49: Practical Celery

@app.task(bind=True, max_retries=10)
def gather_data(self):
    try:
        data = api.get_data()
        # etc, etc, ...
    except api.RateLimited as e:
        # self.retry() re-queues the task, then raises Retry to stop this run.
        raise self.retry(exc=e, countdown=e.cooldown)
    except api.IsDown:
        return
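
The retry policy itself (try again after a cooldown, give up after `max_retries`) can be shown without a broker. The sketch below is a simplification: it blocks in a loop, whereas Celery re-queues the task and frees the worker in the meantime. `RateLimited`, `run_with_retries` and `flaky` are hypothetical names for this example.

```python
import time


class RateLimited(Exception):
    """Hypothetical error carrying a suggested wait time in seconds."""

    def __init__(self, cooldown):
        super().__init__("rate limited")
        self.cooldown = cooldown


def run_with_retries(func, max_retries=10, sleep=time.sleep):
    """Call func, waiting out rate limits, for at most max_retries retries."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimited as exc:
            if attempt == max_retries:
                raise                  # retries exhausted: give up
            sleep(exc.cooldown)        # wait before the next attempt


calls = []


def flaky():
    """Fail twice with a short cooldown, then succeed."""
    calls.append(1)
    if len(calls) < 3:
        raise RateLimited(cooldown=0.01)
    return "data"


print(run_with_retries(flaky))
```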

Page 50: Practical Celery

ERROR INSIGHT.

Page 51: Practical Celery

SENTRY.

Page 52: Practical Celery

STAGES

Page 53: Practical Celery

class DebugTask(app.Task):
    def after_return(self, status, retval, task_id, args, kwargs, einfo):
        print("I'm done!")

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        print("I failed :(")

    def on_retry(self, exc, task_id, args, kwargs, einfo):
        print("I'll try again!")

    def on_success(self, retval, task_id, args, kwargs):
        print("I did it!")

Page 54: Practical Celery

ABSTRACT
class AbstractTask(app.Task):
    abstract = True

    def after_return(self, *args, **kwargs):
        print("All done!")

@app.task(base=AbstractTask)
def add(x, y):
    return x + y

Page 55: Practical Celery

INSTANTIATION
class DatabaseTask(app.Task):
    abstract = True
    _db = None

    @property
    def db(self):
        # Connect on first use, then reuse the connection.
        if self._db is None:
            self._db = Database.connect()
        return self._db
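
A task class is instantiated once per worker process rather than once per call, which is what makes this lazy property useful as a cache. The stand-alone sketch below shows the caching behaviour only. `Database` is a hypothetical stand-in for a real client, and `DatabaseTask` here is a plain class, not a Celery task.

```python
class Database:
    """Hypothetical stand-in for a real database client."""

    connections_opened = 0

    @classmethod
    def connect(cls):
        cls.connections_opened += 1
        return cls()


class DatabaseTask:
    _db = None

    @property
    def db(self):
        if self._db is None:
            self._db = Database.connect()   # runs only on first access
        return self._db

    def run(self):
        return self.db


task = DatabaseTask()   # one instance, reused for every call
first = task.run()
second = task.run()
print(first is second, Database.connections_opened)
```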

Page 56: Practical Celery

ENSURE A TASK IS EXECUTED ONE AT A TIME

Page 57: Practical Celery

from hashlib import md5

from celery import task
from celery.utils.log import get_task_logger
from django.core.cache import cache
from djangofeeds.models import Feed

logger = get_task_logger(__name__)

LOCK_EXPIRE = 60 * 5  # Lock expires in 5 minutes

@task(bind=True)
def import_feed(self, feed_url):
    # The cache key consists of the task name and the MD5 digest
    # of the feed URL.
    feed_url_hexdigest = md5(feed_url.encode('utf-8')).hexdigest()
    lock_id = '{0}-lock-{1}'.format(self.name, feed_url_hexdigest)

    # cache.add fails if the key already exists
    acquire_lock = lambda: cache.add(lock_id, 'true', LOCK_EXPIRE)
    # memcache delete is very slow, but we have to use it to take
    # advantage of using add() for atomic locking
    release_lock = lambda: cache.delete(lock_id)

    logger.debug('Importing feed: %s', feed_url)
    if acquire_lock():
        try:
            feed = Feed.objects.import_feed(feed_url)
        finally:
            release_lock()
        return feed.url

    logger.debug(
        'Feed %s is already being imported by another worker', feed_url)
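
The lock relies on the cache's `add()` setting a key only when it is missing. The toy cache below imitates that contract in memory so the acquire, refuse and release sequence can be followed step by step. It is a single-process sketch, not thread-safe, and `ToyCache` and the lock key are invented for illustration; across workers the guarantee has to come from the real cache backend.

```python
import time


class ToyCache:
    """Minimal in-memory cache with an add() that only sets missing keys."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, timeout):
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False                     # key exists and has not expired
        self._data[key] = (value, now + timeout)
        return True

    def delete(self, key):
        self._data.pop(key, None)


cache = ToyCache()
lock_id = "import_feed-lock-abc123"          # hypothetical lock key

first = cache.add(lock_id, "true", 300)      # first caller gets the lock
second = cache.add(lock_id, "true", 300)     # second caller is refused
cache.delete(lock_id)                        # first caller releases the lock
third = cache.add(lock_id, "true", 300)      # the lock can be taken again
print(first, second, third)
```

The timeout matters as much as the delete: if a worker dies before releasing the lock, the key expiring is the only thing that lets the task run again.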

Page 58: Practical Celery

IMPORTANT SETTINGS

Page 59: Practical Celery

# settings.py
CELERY_IGNORE_RESULT = True
CELERYD_TASK_SOFT_TIME_LIMIT = 500
CELERYD_TASK_TIME_LIMIT = 1000

Page 60: Practical Celery

# tasks.py
@app.task(ignore_result=True, soft_time_limit=60, time_limit=120)
def add(x, y):
    pass

Page 61: Practical Celery

# settings.py
CELERYD_MAX_TASKS_PER_CHILD = 500
CELERYD_PREFETCH_MULTIPLIER = 4
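
The prefetch multiplier is applied per concurrent worker process, so the number of messages one worker node may reserve at a time is the product of the two. A small worked example, with `reserved_messages` as a made-up helper name:

```python
def reserved_messages(concurrency, prefetch_multiplier=4):
    """Upper bound on messages one worker node reserves at a time."""
    return concurrency * prefetch_multiplier


print(reserved_messages(10))      # 10 processes, multiplier 4
print(reserved_messages(10, 1))   # 10 processes, multiplier 1
```

With long-running tasks a high multiplier can leave messages reserved by a busy node while other nodes sit idle, which is one reason to tune it.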

Page 62: Practical Celery

BROKER

Page 63: Practical Celery

SO MANY CHOICES!

RabbitMQ
Redis
SQLAlchemy
Django's ORM
MongoDB
Amazon SQS
CouchDB
Beanstalk
IronMQ

Page 64: Practical Celery

DJANGO ORM.
# settings.py
BROKER_URL = 'django://'
INSTALLED_APPS = (
    'kombu.transport.django',
)
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'

python manage.py syncdb

Page 65: Practical Celery

DON'T DO THIS FOR ANYTHING SERIOUS.

Page 66: Practical Celery

USE RABBITMQ

Page 67: Practical Celery

C OPTIMIZED LIBRARY
$ pip install librabbitmq

Page 68: Practical Celery

WORKERS

Page 69: Practical Celery

CONCURRENCY
celery -A project worker -c 10
celery -A project worker --autoscale=10,1

Page 70: Practical Celery

INCREASED CONCURRENCY CAN QUICKLY DRAIN CONNECTIONS ON YOUR DATABASE
Use a connection pooler (pgbouncer).

Page 71: Practical Celery

ROUTING

Page 72: Practical Celery

CELERY_ROUTES = {
    'email.tasks.send_mail': {
        'queue': 'priority',
    },
}

# or
send_mail.apply_async(queue="priority")

# Start a worker that only consumes from the 'priority' queue.
celery -A project worker -Q priority
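
The routing decision can be sketched as a lookup: an explicit queue on the call wins, then the routing table, then the default queue (named `celery` unless configured otherwise). This mirrors the behaviour in outline only; `queue_for` is a hypothetical helper, not part of Celery.

```python
CELERY_ROUTES = {
    "email.tasks.send_mail": {"queue": "priority"},
}
DEFAULT_QUEUE = "celery"   # Celery's default queue name


def queue_for(task_name, override=None):
    """Pick a queue: explicit override, then the routing table, then default."""
    if override is not None:
        return override
    return CELERY_ROUTES.get(task_name, {}).get("queue", DEFAULT_QUEUE)


print(queue_for("email.tasks.send_mail"))
print(queue_for("reports.tasks.build"))
print(queue_for("reports.tasks.build", override="priority"))
```

A worker only receives what is routed to the queues it consumes from, so the name passed to `-Q` has to match the queue name used in the route.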

Page 73: Practical Celery

DEDICATED WORKERS.

Page 74: Practical Celery

BOTTLENECKS

Page 75: Practical Celery

Identify
Fix
Repeat

Page 76: Practical Celery

Make tasks faster.
Reduce volume of tasks.

Page 77: Practical Celery

NEWRELIC

Page 78: Practical Celery
Page 79: Practical Celery

MONITORING IS VITAL.

Page 80: Practical Celery

RABBITMQ MANAGEMENT PLUGIN

Page 81: Practical Celery

RABBITMQ MANAGEMENT PLUGIN HAS A GREAT HTTP API!
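
As an example of what the HTTP API is useful for, the sketch below reads the depth of one queue from the plugin's `/api/queues/<vhost>/<queue>` endpoint. It assumes the plugin's default port (15672) and the default `guest` account on localhost; the helper names are invented, and the network call is defined but not executed here because it needs a running broker.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def queue_url(host, vhost, queue, port=15672):
    """Build the management API URL for one queue (the vhost is URL-encoded)."""
    return "http://{0}:{1}/api/queues/{2}/{3}".format(
        host, port, quote(vhost, safe=""), quote(queue, safe=""))


def queue_depth(payload):
    """Pull the message counters out of a decoded queue description."""
    return {
        "ready": payload.get("messages_ready", 0),
        "unacked": payload.get("messages_unacknowledged", 0),
    }


def fetch_queue_depth(host, vhost, queue, user="guest", password="guest"):
    """Query a live broker using HTTP basic auth. Not called in this sketch."""
    token = base64.b64encode(
        "{0}:{1}".format(user, password).encode("utf-8")).decode("ascii")
    request = Request(queue_url(host, vhost, queue),
                      headers={"Authorization": "Basic " + token})
    with urlopen(request) as response:
        return queue_depth(json.loads(response.read().decode("utf-8")))


print(queue_url("localhost", "/", "celery"))
sample = {"messages_ready": 12, "messages_unacknowledged": 3}
print(queue_depth(sample))
```

Polling these counters on a schedule and alerting when the ready count keeps climbing is a simple way to spot a backlog before it becomes a problem.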

Page 83: Practical Celery

CELERY FLOWER

Page 84: Practical Celery

QUESTIONS?