Python Performance Profiling: The Guts and the Glory
DESCRIPTION
Your Python program is too slow, and you need to optimize it. Where do you start? With the right tools, you can optimize your code where it counts. We’ll explore the guts of the Python profiler “Yappi” to understand its features and limitations. We’ll learn how to find the maximum performance wins with minimum effort.
TRANSCRIPT
Python Profiling:
A. Jesse Jiryu Davis
@jessejiryudavis
MongoDB
The Glory &
The Guts
“PyMongo is slower compared to the JavaScript version”
MongoDB Node.js driver: 88,000 per second
PyMongo: 29,000 per second
“Why Is PyMongo Slower?”
From: [email protected]
To: [email protected]
CC: [email protected]
Hi Jesse,
Why is the Node MongoDB driver 3 times faster than PyMongo?
http://dzone.com/articles/mongodb-facts-over-80000
The Python Code
# Obtain a MongoDB collection.
import pymongo

client = pymongo.MongoClient('localhost')
db = client.random
collection = db.randomData
collection.delete_many({})  # clear out documents from earlier runs

n_documents = 80000
batch_size = 5000
batch = []

import time
start = time.time()
The Python Code
import random
from datetime import datetime

min_date = datetime(2012, 1, 1)
max_date = datetime(2013, 1, 1)
delta = (max_date - min_date).total_seconds()
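The `delta` computed on this slide is just the length of 2012 in seconds. Because 2012 is a leap year, that is 366 days of 86,400 seconds each, which this small standalone check confirms:

```python
from datetime import datetime

min_date = datetime(2012, 1, 1)
max_date = datetime(2013, 1, 1)

# Seconds between the two dates; 2012 is a leap year, so 366 days.
delta = (max_date - min_date).total_seconds()
print(delta)  # 31622400.0
```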
The Python Code
What?!
The Python Code
for i in range(n_documents):
    date = datetime.fromtimestamp(
        time.mktime(min_date.timetuple())
        + int(round(random.random() * delta)))

    value = random.random()
    document = {
        'created_on': date,
        'value': value}

    batch.append(document)
    if len(batch) == batch_size:
        collection.insert_many(batch)
        batch = []
duration = time.time() - start

print('inserted %d documents per second' % (
    n_documents / duration))
The Python Code
inserted 30,000 documents per second
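The date expression in the loop round-trips through `timetuple()`, `time.mktime()` and `datetime.fromtimestamp()`, all of which interpret the value as local time. For comparison only, here is a sketch that adds a random whole-second `timedelta` to `min_date` directly; it should give the same kind of value apart from daylight-saving shifts in the local time zone. `random_date` is a hypothetical helper, not part of the original script.

```python
import random
from datetime import datetime, timedelta

min_date = datetime(2012, 1, 1)
max_date = datetime(2013, 1, 1)
delta = (max_date - min_date).total_seconds()

def random_date():
    """Hypothetical helper: a random whole-second datetime in [min_date, max_date]."""
    offset = round(random.random() * delta)  # round() on a float returns an int
    return min_date + timedelta(seconds=offset)

print(random_date())
```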
The Node.js Code
(not shown)
The Question
Why is the Python script 3 times slower than the equivalent Node script?
Why Profile?
• Optimization is like debugging
• Hypothesis:
“The following change will yield a worthwhile improvement.”
• Experiment
• Repeat until fast enough
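The hypothesis/experiment loop can be made concrete with the standard library's `timeit`. This sketch times a "before" and an "after" version of some code and reports the ratio; `before`, `after` and the list-versus-set change are hypothetical stand-ins for whatever change is under test.

```python
import timeit

# Hypothetical data for the experiment.
data_list = list(range(10_000))
data_set = set(data_list)

def before():
    # Original version: a linear scan through a list.
    return 9_999 in data_list

def after():
    # Candidate change: a hash lookup in a set.
    return 9_999 in data_set

def speedup(old, new, number=1_000, repeat=5):
    """Best-of-repeat time of old divided by best-of-repeat time of new."""
    old_time = min(timeit.repeat(old, number=number, repeat=repeat))
    new_time = min(timeit.repeat(new, number=number, repeat=repeat))
    return old_time / new_time

ratio = speedup(before, after)
# A ratio well above 1 supports the hypothesis; otherwise discard it and repeat.
print(f"speedup: {ratio:.1f}x")
```

Taking the minimum of several repeats is a common way to reduce noise from other processes on the machine.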
Why Profile?
Profiling is a way to generate hypotheses.
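With the standard library's `cProfile`, generating a hypothesis means running the code under the profiler and seeing which functions dominate. A minimal sketch; `slow_part`, `fast_part` and `main` are hypothetical placeholders for real application code.

```python
import cProfile
import io
import pstats

def slow_part():
    # Deliberately quadratic work so it dominates the profile.
    return sum(i * j for i in range(300) for j in range(300))

def fast_part():
    return sum(range(1000))

def main():
    slow_part()
    fast_part()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Report the five most expensive entries, sorted by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)
```

The top entries suggest the hypothesis "speeding up `slow_part` is worthwhile"; the experiment step then checks it.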
Which Profiler?
• cProfile
• GreenletProfiler
• Yappi
Yappi
By Sümer Cip
Yappi
Compared to cProfile, it is:
• As fast
• Also measures functions
• Can measure CPU time, not just wall-clock time
• Can measure all threads
• Can export to callgrind
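The CPU-versus-wall distinction matters for a database driver, because time spent waiting for a server reply passes on the wall clock without using the CPU. This stdlib sketch shows the difference; it uses `time.process_time()`, which counts CPU time for the whole process, so it only illustrates the two kinds of clock and is not how Yappi implements its own.

```python
import time

def cpu_and_wall(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    return (time.perf_counter() - wall_start,
            time.process_time() - cpu_start)

# Sleeping, like waiting on a network reply, uses wall-clock time but little CPU.
wall, cpu = cpu_and_wall(lambda: time.sleep(0.2))
print(f"sleep: wall={wall:.3f}s cpu={cpu:.3f}s")

# A busy computation uses roughly as much CPU time as wall-clock time.
wall2, cpu2 = cpu_and_wall(lambda: sum(range(3_000_000)))
print(f"busy:  wall={wall2:.3f}s cpu={cpu2:.3f}s")
```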
Yappi
import time
import yappi

yappi.set_clock_type('cpu')
yappi.start(builtins=True)

start = time.time()

for i in range(n_documents):
    ...  # same code as before

duration = time.time() - start
yappi.stop()
stats = yappi.get_func_stats()
stats.save('callgrind.out', type='callgrind')
Same code as before
KCacheGrind
for index in range(n_documents):
    date = datetime.fromtimestamp(
        time.mktime(min_date.timetuple())
        + int(round(random.random() * delta)))

    value = random.random()
    document = {
        'created_on': date,
        'value': value}

    batch.append(document)
    if len(batch) == batch_size:
        collection.insert_many(batch)
        batch = []
The Python Code
one third of the time
for index in range(n_documents):
    date = datetime.now()

    value = random.random()
    document = {
        'created_on': date,
        'value': value}

    batch.append(document)
    if len(batch) == batch_size:
        collection.insert_many(batch)
        batch = []
The Python Code
• Before: 30,000 inserts per second
• After: 50,000 inserts per second
Why Profile?
• Generate hypotheses
• Estimate possible improvement
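Amdahl's law turns a profile reading like "one third of the time" into an estimate. If a fraction f of run time disappears entirely, the whole program can run at most 1 / (1 - f) times faster. With f = 1/3 that bound is 1.5x, or roughly 45,000 inserts per second from a 30,000 baseline. The sketch treats the profiled fraction as exact; the measured 50,000 overshoots this bound, a reminder that a profiler's percentages are themselves approximations.

```python
def amdahl_speedup(fraction_removed):
    """Upper bound on overall speedup if a fraction of run time drops to zero."""
    return 1.0 / (1.0 - fraction_removed)

before_rate = 30_000  # inserts per second, from the benchmark above
fraction = 1 / 3      # share of time in the date expression, per the profile

bound = amdahl_speedup(fraction)
print(f"at most {bound:.2f}x, about {before_rate * bound:,.0f} inserts/s")
```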
How Does Profiling Work?
int callback(PyObject *obj,
             PyFrameObject *frame,
             int what,
             PyObject *arg);
int start(void)
{
    /* Install callback; the second argument is handed back to it as obj. */
    PyEval_SetProfile(callback, NULL);
    return 0;
}
/* Simplified from CPython's ceval.c. */
PyObject *
PyEval_EvalFrameEx(PyFrameObject *frame)
{
    PyThreadState *tstate = PyThreadState_GET();
    PyObject *retval = NULL;

    if (tstate->c_profilefunc != NULL) {
        tstate->c_profilefunc(tstate->c_profileobj, frame,
                              PyTrace_CALL, Py_None);
    }

    /* ... execute bytecode in the frame
     * until return or exception... */

    if (tstate->c_profilefunc != NULL) {
        tstate->c_profilefunc(tstate->c_profileobj, frame,
                              PyTrace_RETURN, retval);
    }
    return retval;
}
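The same hook is reachable from Python through `sys.setprofile()`, which installs a function the interpreter calls with `(frame, event, arg)`. As in the C code, a `'call'` event passes `None` as `arg` and a `'return'` event passes the return value; C functions produce separate `'c_call'`/`'c_return'` events, which this sketch filters out. `greet` is a hypothetical function used only to trigger events.

```python
import sys

events = []

def profile_hook(frame, event, arg):
    # Keep only Python-level call/return events, ignoring C-function events.
    if event in ("call", "return"):
        events.append((event, frame.f_code.co_name, arg))

def greet(name):
    return "hello " + name

sys.setprofile(profile_hook)  # install the hook (Python-level PyEval_SetProfile)
greet("world")
sys.setprofile(None)          # uninstall it

for event in events:
    print(event)
```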
int callback(PyObject *obj,
             PyFrameObject *frame,
             int what,
             PyObject *arg)
{
    switch (what) {
    case PyTrace_CALL:
        {
            PyCodeObject *cobj = frame->f_code;
            PyObject *filename = cobj->co_filename;
            PyObject *funcname = cobj->co_name;

            /* ... record the function call ... */
        }
        break;

    /* ... other cases ... */

    }
    return 0;
}
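What "record the function call" might involve can be sketched in Python: key each call by `(co_filename, co_name)`, count it, and time it from its `'call'` event to its matching `'return'`. This is a toy, not how cProfile or Yappi store their data (they do this bookkeeping in C); the names `record`, `call_counts`, `total_time` and `fib` are hypothetical. Because times here are inclusive, recursive functions are counted once per active level, which overstates their totals.

```python
import sys
import time
from collections import defaultdict

call_counts = defaultdict(int)    # (filename, funcname) -> number of calls
total_time = defaultdict(float)   # (filename, funcname) -> inclusive seconds
start_times = {}                  # id(frame) -> start timestamp, for active calls

def record(frame, event, arg):
    code = frame.f_code
    key = (code.co_filename, code.co_name)
    if event == "call":
        call_counts[key] += 1
        start_times[id(frame)] = time.perf_counter()
    elif event == "return":
        started = start_times.pop(id(frame), None)
        if started is not None:
            total_time[key] += time.perf_counter() - started

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

sys.setprofile(record)
fib(10)
sys.setprofile(None)

for (filename, funcname), count in call_counts.items():
    print(funcname, count, f"{total_time[(filename, funcname)]:.6f}s")
```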