chipy dan griffin
OpDemand.com
Concurrency in Python (and other languages)
ChiPy
Dan Griffin
2
Why Am I Here?
1. Concurrency is actually really simple
2. Python has support for just about everything
3. See why other languages specialized
4. Realize they are mostly the same as Python
5. Share how OpDemand has solved problems
6. General tips on writing code that "scales"
3
OpDemand
1-click cloud deploys
Dynamic configuration
Automatic and customizable app monitoring
Real time log feedback
Complete audit trail
Easy collaboration with other users
EC2, Heroku and soon OpenStack!
Simple Cloud Management
4
Two Reasons for Concurrency
1. I want to use the time that I am spending waiting for IO or other events. Problems that are IO bound.
2. I want to do a lot of work as fast as I can. Problems that are CPU bound and can be parallelized.
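The first reason can be illustrated with a minimal sketch. It assumes `time.sleep` as a stand-in for a blocking IO call; `fake_io`, `run_sequential` and `run_threaded` are hypothetical names, not from the talk. The threaded version overlaps the waits, so the total time is close to one delay rather than the sum of all of them.

```python
import threading
import time


def fake_io(delay, results, index):
    # time.sleep stands in for a blocking network or disk call (assumption).
    time.sleep(delay)
    results[index] = index


def run_sequential(n, delay):
    """Run n fake IO calls one after another and return (results, seconds)."""
    results = [None] * n
    start = time.perf_counter()
    for i in range(n):
        fake_io(delay, results, i)
    return results, time.perf_counter() - start


def run_threaded(n, delay):
    """Run n fake IO calls in n threads and return (results, seconds)."""
    results = [None] * n
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start
```

CPU-bound work would not speed up this way in CPython; that case is what separate processes are for.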
5
Event loops (Twisted, Asyncore, etc.)
Every significant bit of work slows everything down
Still constrained to 1 process
Library compatibility is terrible
Callback hell:
d.addCallback(lambda _: self)
Inline deferreds are better:
a = yield db.find(id)
Let the Operating System tell you when you have work to do. Usually based on select, poll, kqueue.
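A rough sketch of the "let the operating system tell you" idea, using the standard library `selectors` module instead of Twisted or asyncore. `run_event_loop` is a hypothetical helper, and the socket pairs only simulate incoming connections. `DefaultSelector` picks the best mechanism the platform offers (epoll, kqueue, poll or select).

```python
import selectors
import socket


def run_event_loop(messages):
    """Send each message over its own socket pair; collect them in one loop."""
    sel = selectors.DefaultSelector()
    received = []
    sockets = []
    for msg in messages:
        writer, reader = socket.socketpair()
        reader.setblocking(False)
        # The callback to run when this socket becomes readable rides in `data`.
        sel.register(reader, selectors.EVENT_READ, data=received.append)
        writer.sendall(msg)
        sockets.extend([writer, reader])

    pending = len(messages)
    while pending:
        # Blocks until the OS reports at least one readable socket.
        for key, _events in sel.select(timeout=1.0):
            key.data(key.fileobj.recv(1024))
            sel.unregister(key.fileobj)
            pending -= 1

    sel.close()
    for sock in sockets:
        sock.close()
    return received
```

All the work happens in a single thread, which is why any slow callback stalls every other connection.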
6
deferToThread
CouchDB-Python
d = threads.deferToThread(template_model.assemble, serv)
Use blocking libraries in twisted by deferring them to threads
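The same pattern can be sketched without Twisted. This is a standard-library analogue, not `deferToThread` itself: `ThreadPoolExecutor.submit` returns a future, and `add_done_callback` plays roughly the role of `addCallback`. `blocking_assemble` and `assemble_in_threads` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_assemble(name):
    # Stand-in for a blocking library call (assumption).
    time.sleep(0.05)
    return "assembled " + name


def assemble_in_threads(names):
    """Run the blocking call in worker threads; gather results via callbacks."""
    collected = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for name in names:
            future = pool.submit(blocking_assemble, name)
            # Called with the finished future once the result is available.
            future.add_done_callback(lambda f: collected.append(f.result()))
    # Leaving the with-block waits for every submitted call to finish.
    return sorted(collected)
```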
7
Processes
The root of “real” concurrency for Python systems
Process per core + 1 to distribute work and collect results
Fork - create a copy of current process and continue execution
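A minimal sketch of fork, assuming a POSIX system (`os.fork` is not available on Windows). The child is a copy of the parent that continues from the same point, does its share of the work, and sends the result back over a pipe. `square_in_child` is a hypothetical name.

```python
import os


def square_in_child(n):
    """Fork a child, compute n * n there, and read the result in the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: a copy of the parent that continues from this point.
        os.close(read_fd)
        try:
            os.write(write_fd, str(n * n).encode())
        finally:
            # Exit without running the parent's cleanup handlers.
            os._exit(0)
    # Parent: close the write end so read() sees EOF when the child exits.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        data = reader.read()
    os.waitpid(pid, 0)  # reap the child
    return int(data)
```

In practice the `multiprocessing` module wraps this bookkeeping, and `os.cpu_count()` gives the core count for sizing a worker pool.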
8
The Celery Project
Parent process forks n workers
Relies on RabbitMQ and multiprocessing to handle concurrency
Celery is a perfect example
9
Threads
Shared memory
Mutation with locks (hopefully)
Everyone knows about the GIL
Still useful in Python
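A small sketch of mutating shared memory under a lock. `count_with_lock` is a hypothetical name. Even with the GIL, `counter["value"] += 1` is a read-modify-write sequence, so the lock is what guarantees the final count.

```python
import threading


def count_with_lock(num_threads, increments):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```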
10
A Quick Clojure Detour
Software Transactional Memory - SQL like transactions for modifying data from different threads
Embracing mutation of shared data
Everything is based on threads; you can dosync, send, promise and deliver
Mostly immutable BUT you can change refs with ref-set inside transactions
11
Why Does Erlang Exist?
Wraps all the concepts into 1 heavy duty package
Pins schedulers to different cores
Uses thread pools
Has transparent inter-process/server communication
Makes use of OS event loops
You would never want to write many common tasks in it
12
How OpDemand Works
[Architecture diagram. Labels: Client, Node Proxy, Twisted (three boxes), RabbitMQ, Celery, Monitor]
13
What does Node do?
Reference SocketIO Implementation
Take service updates and log output from ZMQ and re-publish over SocketIO
Serve static content
Round robin HTTP requests between reactors
Replace with Python or Nginx soon hopefully
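Round-robin distribution itself is small enough to sketch in Python. This only illustrates the selection policy, not the Node proxy; `make_round_robin` and the backend names are hypothetical.

```python
import itertools


def make_round_robin(backends):
    """Return a function that yields the next backend in rotation."""
    pool = itertools.cycle(backends)
    return lambda: next(pool)
```

Each call to the returned function hands back the next reactor in order, wrapping around at the end of the list.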
14
Explicitly Saving, Implicitly Publishing
d = defer.Deferred()
d.addCallback(self.transition_state, core_fsm.DEPLOYING)
d.addCallback(self._set_status_detail, 'deploy in progress')
d.addCallback(self._save_obj, **kwargs)
d.addCallback(self._start_interval, context, 'deploy')
d.addCallback(self._deploy, context, **kwargs)
d.addCallback(self._set_time, 'deploy')
d.addCallback(self._set_interval, context, 'deploy')
d.addCallback(self.transition_state, core_fsm.ACTIVE)
d.addCallback(self._set_status_detail, 'deploy operation successful')
d.addCallback(self._save_obj, **kwargs)
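A simplified stand-in for how such a chain passes a result along: each callback receives the previous callback's return value plus its own extra arguments. This sketch covers only the synchronous success path and is not Twisted's `Deferred`; every name in it is hypothetical.

```python
def run_chain(initial, callbacks):
    """Feed each callback the previous result plus its own extra arguments."""
    result = initial
    for func, args, kwargs in callbacks:
        result = func(result, *args, **kwargs)
    return result


def transition_state(obj, state):
    return dict(obj, state=state)


def set_status_detail(obj, detail):
    return dict(obj, status_detail=detail)


def save_obj(obj, store):
    # Saving is an explicit step in the chain; the list stands in for a database.
    store.append(dict(obj))
    return obj


def deploy(obj):
    store = []
    final = run_chain(obj, [
        (transition_state, ("DEPLOYING",), {}),
        (set_status_detail, ("deploy in progress",), {}),
        (save_obj, (store,), {}),
        (transition_state, ("ACTIVE",), {}),
        (set_status_detail, ("deploy operation successful",), {}),
        (save_obj, (store,), {}),
    ])
    return final, store
```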
15
Real-time Publishing
def save_obj(self, this, ctx, **kwargs):
    # here is where we save to couch
    saved_obj = self.db.save(this)
    if ctx and "service" in ctx:
        if settings.ZMQ_PUBLISHER:
            tag = 'service-%s' % ctx["service"]["_id"]
            settings.ZMQ_PUBLISHER.publish(
                view.to_json(saved_obj), tag=str(tag))
# Publish documents over ZMQ when they are saved
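The save-then-publish pattern can be tried out with in-memory stand-ins. The classes below are hypothetical replacements for CouchDB and the ZMQ publisher, and `json.dumps` is assumed in place of `view.to_json`; only the control flow mirrors the slide.

```python
import json


class InMemoryStore:
    """Hypothetical stand-in for the CouchDB handle."""

    def __init__(self):
        self.docs = {}

    def save(self, doc):
        saved = dict(doc)
        self.docs[saved["_id"]] = saved
        return saved


class InMemoryPublisher:
    """Hypothetical stand-in for the ZMQ publisher."""

    def __init__(self):
        self.messages = []

    def publish(self, payload, tag):
        self.messages.append((tag, payload))


def save_and_publish(store, publisher, doc, ctx):
    """Save the document, then publish it if a service context is present."""
    saved = store.save(doc)
    if publisher is not None and ctx and "service" in ctx:
        tag = "service-%s" % ctx["service"]["_id"]
        publisher.publish(json.dumps(saved, sort_keys=True), tag=tag)
    return saved
```

Callers only ever ask for a save; publishing happens as a side effect, which is what keeps subscribers up to date in real time.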
16
Wrapping Celery in Twisted
A "polling" deferred using twisted.internet.task
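def _do_poll():
    # Generator for cooperate(). The loop and the yield are assumed here;
    # the slide text preserves only the ready() check.
    while not celery_task.ready():
        yield  # hand control back to the reactor between checks
task = cooperate(_do_poll())
return task.whenDone()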
Essentially launch Celery tasks and poll for completion
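The launch-then-poll idea can be sketched with asyncio in place of Twisted and a plain flag in place of a Celery result. `poll_until_ready` and `demo` are hypothetical names, and this is an analogue rather than the code from the talk.

```python
import asyncio


async def poll_until_ready(is_ready, interval=0.01):
    """Check is_ready() repeatedly, yielding to the event loop between checks."""
    polls = 0
    while not is_ready():
        polls += 1
        await asyncio.sleep(interval)
    return polls


def demo():
    state = {"done": False}

    async def main():
        loop = asyncio.get_running_loop()
        # Simulate a background task finishing after 50 ms.
        loop.call_later(0.05, state.__setitem__, "done", True)
        return await poll_until_ready(lambda: state["done"])

    return asyncio.run(main())
```

The event loop stays free to do other work between checks, at the cost of up to one polling interval of extra latency.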
17
A Common Interface for Celery Tasks
# Celery Task Definition
@aws_celery.task
def refresh(comp, config, creds):
    doctype = comp.get("doctype")
    if doctype == "server":
        i = Instance()
        return i.refresh(comp, config)
Celery transforms a component and its configuration
18
Returning the Finished Product
# AWS Instance Code
def refresh(self, comp, config, **kwargs):
    boto = self.get_boto(comp, config)
    comp, config = self.sync(comp, config, boto)
    return comp, config
The Provider code returns the new Comp and Config
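The common interface on the last two slides can be sketched as a dispatch table: look up a provider by `doctype`, call its `refresh`, and return the new `(comp, config)` pair. The `PROVIDERS` registry and the body of `Instance.refresh` are assumptions made for illustration. The Celery decorator and the AWS calls are left out so the sketch runs on its own.

```python
class Instance:
    """Hypothetical provider; the real one syncs with AWS through boto."""

    def refresh(self, comp, config):
        # Return new objects instead of mutating the inputs.
        new_comp = dict(comp, status="refreshed")
        new_config = dict(config, synced=True)
        return new_comp, new_config


# Assumed registry mapping a doctype to its provider class.
PROVIDERS = {"server": Instance}


def refresh(comp, config):
    """Dispatch on doctype and return the updated (comp, config) pair."""
    doctype = comp.get("doctype")
    provider_cls = PROVIDERS.get(doctype)
    if provider_cls is None:
        raise LookupError("no provider for doctype: %r" % doctype)
    return provider_cls().refresh(comp, config)
```

Because every provider takes and returns the same pair, the calling code does not need to know which cloud it is talking to.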
19
Why bother with Celery
Code from the first AWS provider using Twisted
# this is one path through this
d = threads.deferToThread(self.conn.get_all_images, [dc['image_id']])
d.addErrback(self._handle_error)
d.addCallback(self.__get_image)
d.addCallback(self.__create_reservation,
              self.__prepare_kwargs(context, kwargs, resolved))
d.addCallback(self.__construct_instances, context, resolved)
d.addCallback(self.__sync_instances, context)
d.addCallback(self._save_obj, **kwargs)
d.addCallback(self._poll_state, context, 'running', **kwargs)
if 'elastic_ip' in dc and dc['elastic_ip'] is not None:
    d.addCallback(self.__associate_address, context)
    d.addCallback(self._save_obj, **kwargs)
    d.addCallback(self.__poll_address, context, **kwargs)
    d.addCallback(self._save_obj, **kwargs)
if not context.config.get("server/instance_id"):
    d.addCallback(self._poll_signal, context, 22, **kwargs)
# transition the server to built state so it gets destroyed
# I cut like 20 more lines of code
20
Using Celery
Much better
image_id = self._get_image_id(config)
images = conn.get_all_images([image_id])
if len(images) != 1:
    raise LookupError('Could not find AMI: %s' % image_id)
image = images[0]
kwargs = self._prepare_run_kwargs(config)
reservation = image.run(**kwargs)
instances = reservation.instances
boto = instances[0]
config['ec2-instance/id'] = boto.id
config['ec2-instance/region_name'] = boto.region.name
config['ec2-instance/zone_name'] = boto._placement.zone
return comp, config
21
Using Pika
mq.create_async_subscriber("c2-service", "service", handle_service_updates)

def create_async_subscriber(exchange, queue, callback, amqtype="topic"):
    tw = TwistedHandler(exchange, queue, callback, amqtype=amqtype)
    connection = TwistedConnection(
        pika.ConnectionParameters(
            host=settings.RABBITMQ_HOST,
            port=settings.RABBITMQ_PORT,
            virtual_host=settings.RABBITMQ_VHOST),
        tw.on_connected)
    return tw
Modified from Pika repository (maybe HEAD works now?)
Subscribe with a Twisted handler
22