chipy dan griffin

OpDemand.com Concurrency in Python (and other languages) Chipy Dan Griffin

Post on 22-Jan-2016

Page 1: Chipy Dan Griffin

OpDemand.com

Concurrency in Python (and other languages)

Chipy Dan Griffin

Page 2: Chipy Dan Griffin

Why Am I Here?

1. Concurrency is actually really simple

2. Python has support for just about everything

3. See why other languages specialized

4. Realize they are mostly the same as Python

5. Share how OpDemand has solved problems

6. General tips on writing code that "scales"

Page 3: Chipy Dan Griffin

OpDemand

1-click cloud deploys

Dynamic configuration

Automatic and customizable app monitoring

Real-time log feedback

Complete audit trail

Easy collaboration with other users

EC2, Heroku and soon OpenStack!

Simple Cloud Management

Page 4: Chipy Dan Griffin

Two Reasons for Concurrency

1. I want to use the time that I spend waiting for IO or other events. Problems that are IO bound.

2. I want to do a lot of work as fast as I can. Problems that are CPU bound and can be parallelized.
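
As a minimal sketch of the first reason, the example below overlaps several simulated IO waits with a thread pool from the standard library. `fake_io` and its delays are hypothetical stand-ins for real network or disk calls; nothing here comes from OpDemand's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    """Hypothetical stand-in for a blocking network or disk call."""
    time.sleep(delay)  # the thread is idle here, so other threads can run
    return delay

delays = [0.2, 0.2, 0.2, 0.2]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=len(delays)) as pool:
    results = list(pool.map(fake_io, delays))
elapsed = time.monotonic() - start

# The waits overlap, so the elapsed time is close to the longest single
# wait rather than the sum of all of them.
print("results:", results, "elapsed: %.2fs" % elapsed)
```

For the second reason (CPU-bound work), threads in CPython would not give this speedup; separate processes are the usual answer.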

Page 5: Chipy Dan Griffin

Event loops (Twisted, Asyncore, etc.)

Every significant bit of work slows everything down

Still constrained to 1 process

Library compatibility is terrible

Callback hell: d.addCallback(lambda _: self)

Inline deferreds are better: a = yield db.find(id)

Let the Operating System tell you when you have work to do. Usually based on select, poll, kqueue.
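
To make "let the Operating System tell you" concrete, here is a small sketch using the standard library's `selectors` module, which wraps select, poll, epoll, and kqueue. It is not how Twisted is implemented; the socket pair and the `on_readable` callback are assumptions made so the example runs without a network.

```python
import selectors
import socket

selector = selectors.DefaultSelector()  # picks epoll, kqueue, poll or select
reader, writer = socket.socketpair()
reader.setblocking(False)

received = []

def on_readable(sock):
    """Callback run by the loop when the OS reports data is waiting."""
    received.append(sock.recv(1024))

# Ask the OS to watch the socket; the callback travels along as `data`.
selector.register(reader, selectors.EVENT_READ, data=on_readable)

writer.sendall(b"hello")  # simulate a peer sending us something

# The event loop: block until something is ready, then dispatch to callbacks.
while not received:
    for key, _events in selector.select(timeout=1.0):
        key.data(key.fileobj)

selector.unregister(reader)
selector.close()
reader.close()
writer.close()
```

Any slow work done inside `on_readable` would stall every other socket, which is the first drawback listed on this slide.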

Page 6: Chipy Dan Griffin

deferToThread

CouchDB-Python

d = threads.deferToThread(template_model.assemble, serv)

Use blocking libraries in twisted by deferring them to threads
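
The same pattern exists outside Twisted. As a rough analogue (not Twisted's API), the sketch below uses asyncio's `run_in_executor` to push a blocking call onto a thread so the event loop stays responsive. `blocking_assemble` is a hypothetical function standing in for a synchronous database client call.

```python
import asyncio
import time

def blocking_assemble(name):
    """Hypothetical blocking call, standing in for a synchronous DB client."""
    time.sleep(0.05)
    return "assembled:" + name

async def main():
    loop = asyncio.get_running_loop()
    # Run the blocking function in the default thread pool; the event loop
    # stays free to service other tasks until the result is ready.
    return await loop.run_in_executor(None, blocking_assemble, "serv")

assembled = asyncio.run(main())
print(assembled)
```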

Page 7: Chipy Dan Griffin

Processes

The root of “real” concurrency for Python systems

One process per core, plus 1 to distribute work and collect results

Fork - create a copy of current process and continue execution
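
A minimal, POSIX-only sketch of fork: the child is a copy of the parent, does some work, and hands the result back over a pipe. The helper names `run_in_child` and `sum_of_squares` are made up for illustration, and error handling in the child is deliberately omitted.

```python
import os

def run_in_child(func, *args):
    """Fork, run func(*args) in the child, and return its result as text."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: a copy of the parent that continues execution from here.
        os.close(read_fd)
        try:
            with os.fdopen(write_fd, "w") as pipe:
                pipe.write(str(func(*args)))
        finally:
            os._exit(0)  # exit without running the parent's cleanup handlers
    # Parent: collect the result, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        result = pipe.read()
    os.waitpid(pid, 0)
    return result

def sum_of_squares(n):
    return sum(i * i for i in range(n))

child_result = run_in_child(sum_of_squares, 1000)
print(child_result)
```

In practice the `multiprocessing` module wraps this bookkeeping (pipes, pickling, worker pools) so you rarely call `os.fork` directly.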

Page 8: Chipy Dan Griffin

The Celery Project

Parent process forks n workers

Relies on RabbitMQ and multiprocessing to handle concurrency

Celery is a perfect example

Page 9: Chipy Dan Griffin

Threads

Shared memory

Mutation with locks (hopefully)

Everyone knows about the GIL

Still useful in Python
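
A short sketch of "mutation with locks": several threads update one shared counter. The GIL does not make `+=` atomic, so the update is guarded with a `threading.Lock`. The `Counter` class and the thread and iteration counts are arbitrary choices for the example.

```python
import threading

class Counter:
    """Shared mutable state protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write is not atomic, so only one thread may do it at a time.
        with self._lock:
            self.value += 1

counter = Counter()

def add_many(times):
    for _ in range(times):
        counter.increment()

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000
```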

Page 10: Chipy Dan Griffin

A Quick Clojure Detour

Software Transactional Memory - SQL-like transactions for modifying data from different threads

Embracing mutation of shared data

Everything is based on threads; you can dosync, send, promise and deliver

Mostly immutable BUT you can change refs with ref-set inside transactions

Page 11: Chipy Dan Griffin

Why Does Erlang Exist?

Wraps all the concepts into 1 heavy duty package

Pins schedulers to different cores

Uses thread pools

Has transparent inter-process/server communication

Makes use of OS event loops

You would never want to write many common tasks in it

Page 12: Chipy Dan Griffin

How OpDemand Works

[Architecture diagram. Labels: Client, Node Proxy, Twisted (three boxes), RabbitMQ, Celery Monitor]

Page 13: Chipy Dan Griffin

What does Node do?

Reference SocketIO Implementation

Take service updates and log output from ZMQ and re-publish over SocketIO

Serve static content

Round robin HTTP requests between reactors

Replace with Python or Nginx soon hopefully
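
The round-robin step in the list above is simple enough to sketch in a few lines of Python. The backend addresses are hypothetical, since the slides do not give the real ones, and this shows only the selection logic, not the proxying.

```python
import itertools

# Hypothetical backend addresses; the slides do not give the real ones.
reactors = ["127.0.0.1:8001", "127.0.0.1:8002", "127.0.0.1:8003"]
next_reactor = itertools.cycle(reactors)

def pick_backend():
    """Return the backend that should receive the next request."""
    return next(next_reactor)

assigned = [pick_backend() for _ in range(5)]
print(assigned)
```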

Page 14: Chipy Dan Griffin

Explicitly Saving, Implicitly Publishing

d = defer.Deferred()
d.addCallback(self.transition_state, core_fsm.DEPLOYING)
d.addCallback(self._set_status_detail, 'deploy in progress')
d.addCallback(self._save_obj, **kwargs)
d.addCallback(self._start_interval, context, 'deploy')
d.addCallback(self._deploy, context, **kwargs)
d.addCallback(self._set_time, 'deploy')
d.addCallback(self._set_interval, context, 'deploy')
d.addCallback(self.transition_state, core_fsm.ACTIVE)
d.addCallback(self._set_status_detail, 'deploy operation successful')
d.addCallback(self._save_obj, **kwargs)
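
To show the calling convention this chain relies on, here is a toy stand-in written for illustration. `MiniDeferred` is not Twisted's `Deferred`; it only mimics one behavior: each callback receives the previous callback's return value as its first argument, followed by the extra arguments given to `addCallback`. Real Deferreds also handle errbacks and callbacks that return other Deferreds, which this sketch omits. `set_status` and `record_save` are hypothetical.

```python
class MiniDeferred:
    """Toy callback chain; NOT Twisted's Deferred, only the same calling shape."""

    def __init__(self):
        self._callbacks = []

    def addCallback(self, func, *args, **kwargs):
        self._callbacks.append((func, args, kwargs))
        return self

    def callback(self, result):
        # Feed each callback the previous callback's return value.
        for func, args, kwargs in self._callbacks:
            result = func(result, *args, **kwargs)
        return result

def set_status(obj, status):
    obj["status"] = status
    return obj  # returning obj keeps it flowing down the chain

def record_save(obj, log):
    log.append(dict(obj))  # hypothetical "save": snapshot the object
    return obj

saves = []
d = MiniDeferred()
d.addCallback(set_status, "deploying")
d.addCallback(record_save, saves)
d.addCallback(set_status, "active")
d.addCallback(record_save, saves)
final = d.callback({"name": "demo"})
print(final, saves)
```

Each explicit save step in a chain like this is a natural place to hang a side effect such as publishing.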

Page 15: Chipy Dan Griffin

Real-time Publishing

def save_obj(self, this, ctx, **kwargs):
    # here is where we save to couch
    saved_obj = self.db.save(this)
    if ctx and "service" in ctx:
        if settings.ZMQ_PUBLISHER:
            tag = 'service-%s' % ctx["service"]["_id"]
            settings.ZMQ_PUBLISHER.publish(
                view.to_json(saved_obj), tag=str(tag))

# Publish documents over ZMQ when they are saved

Page 16: Chipy Dan Griffin

Wrapping Celery in Twisted

A "polling" deferred using twisted.internet.task

def _do_poll():
    if celery_task.ready():
        raise StopIteration

task = cooperate(_do_poll())
return task.whenDone()

Essentially launch Celery tasks and poll for completion
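
The launch-and-poll idea can be sketched without Twisted or Celery. In this analogue, a `concurrent.futures` future plays the role of the Celery task (its `done()` is the counterpart of `ready()`), and an asyncio coroutine plays the role of the polling deferred. `slow_job`, the interval, and the sleep durations are assumptions for the example.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def slow_job(x):
    """Hypothetical long-running task executed outside the event loop."""
    time.sleep(0.05)
    return x * 2

async def poll_until_ready(future, interval=0.01):
    """Check future.done() periodically, yielding to the loop between checks."""
    while not future.done():
        await asyncio.sleep(interval)
    return future.result()

async def main():
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_job, 21)
        return await poll_until_ready(future)

polled_result = asyncio.run(main())
print(polled_result)  # 42
```

Polling trades a little latency (up to one interval) for not having to bridge two different callback systems.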

Page 17: Chipy Dan Griffin

A Common Interface for Celery Tasks

# Celery Task Definition
@aws_celery.task
def refresh(comp, config, creds):
    doctype = comp.get("doctype")
    if doctype == "server":
        i = Instance()
        return i.refresh(comp, config)

Celery transforms a component and its configuration

Page 18: Chipy Dan Griffin

Returning the Finished Product

# AWS Instance Code
def refresh(self, comp, config, **kwargs):
    boto = self.get_boto(comp, config)
    comp, config = self.sync(comp, config, boto)
    return comp, config

The Provider code returns the new Comp and Config

Page 19: Chipy Dan Griffin

Why bother with Celery?

Code from the first AWS provider using Twisted

# this is one path through this
d = threads.deferToThread(self.conn.get_all_images, [dc['image_id']])
d.addErrback(self._handle_error)
d.addCallback(self.__get_image)
d.addCallback(self.__create_reservation,
              self.__prepare_kwargs(context, kwargs, resolved))
d.addCallback(self.__construct_instances, context, resolved)
d.addCallback(self.__sync_instances, context)
d.addCallback(self._save_obj, **kwargs)
d.addCallback(self._poll_state, context, 'running', **kwargs)
if 'elastic_ip' in dc and dc['elastic_ip'] is not None:
    d.addCallback(self.__associate_address, context)
    d.addCallback(self._save_obj, **kwargs)
    d.addCallback(self.__poll_address, context, **kwargs)
    d.addCallback(self._save_obj, **kwargs)
if not context.config.get("server/instance_id"):
    d.addCallback(self._poll_signal, context, 22, **kwargs)
# transition the server to built state so it gets destroyed
# I cut like 20 more lines of code

Page 20: Chipy Dan Griffin

Using Celery

Much better

image_id = self._get_image_id(config)
images = conn.get_all_images([image_id])
if len(images) != 1:
    raise LookupError('Could not find AMI: %s' % image_id)
image = images[0]
kwargs = self._prepare_run_kwargs(config)
reservation = image.run(**kwargs)
instances = reservation.instances
boto = instances[0]
config['ec2-instance/id'] = boto.id
config['ec2-instance/region_name'] = boto.region.name
config['ec2-instance/zone_name'] = boto._placement.zone
return comp, config

Page 21: Chipy Dan Griffin

Using Pika

mq.create_async_subscriber("c2-service", "service", handle_service_updates)

def create_async_subscriber(exchange, queue, callback, amqtype="topic"):
    tw = TwistedHandler(exchange, queue, callback, amqtype=amqtype)
    connection = TwistedConnection(pika.ConnectionParameters(
        host=settings.RABBITMQ_HOST,
        port=settings.RABBITMQ_PORT,
        virtual_host=settings.RABBITMQ_VHOST),
        tw.on_connected)
    return tw

Modified from Pika repository (maybe HEAD works now?)

Subscribe with a Twisted handler

Page 22: Chipy Dan Griffin

OpDemand.com
