crushing the head of the snake by robert brewer pydata sv 2014

Crushing the Head of the Snake Robert Brewer Chief Architect Crunch.io

Upload: pydata

Post on 27-Jan-2015


Category:

Technology



DESCRIPTION

Big Data brings with it particular challenges in any language, mostly in performance. This talk will explain how to get immediate speedups in your Python code by exploiting both timeless programming techniques and fixes specific to Python. We will cover:

I. Amongst Our Weaponry
   1. How to Time and Profile Python
   2. Extracting loop invariants: constants, lookup tables, even methods!
   3. Caching: memoization and heavier things
II. Gunfight at the O.K. Corral in Morse Code
   1. Python functions vs C functions
   2. Vector operations: NumPy
   3. Reducing calls: loops, generators, recursion
III. The Semaphore Version of Wuthering Heights
   1. Using select instead of Queue
   2. Serialization overhead
   3. Parallelizing work

TRANSCRIPT

Page 1: Crushing the Head of the Snake by Robert Brewer PyData SV 2014

Crushing the Head of the Snake

Robert Brewer, Chief Architect

Crunch.io


How to Time

from timeit import Timer

>>> range(5)
[0, 1, 2, 3, 4]
>>> t = Timer("range(a)", "a = 1000000")
>>> t.timeit(1)
0.028472900390625
>>> t.timeit(100)
1.8600409030914307
>>> t.timeit(1000)
18.056041955947876


Comparing algorithms

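# timeit() defaults to number=1000000, so these two calls are equivalent:
#   Timer("range(1000)").timeit(1000000)
#   Timer("range(1000)").timeit()
>>> Timer("range(1000)").timeit()
11.392634868621826

>>> Timer("xrange(1000)").timeit()
0.20040297508239746

>>> Timer("list(xrange(1000))").timeit()
12.207480907440186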


Caveat: Overhead

>>> Timer().timeit(1000000)
0.029289960861206055


Caveat: Wall time not CPU time

>>> Timer("xrange(1000)").timeit()
0.20040297508239746
>>> Timer("xrange(1000)").repeat(3)
[0.20735883712768555, 0.1968221664428711, 0.18882489204406738]

Take the minimum.
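A minimal sketch of the same timing workflow, assuming Python 3 (the slides use Python 2, where `xrange` is lazy and `range` builds a list). The helper name `best_of` is hypothetical and not part of `timeit`.

```python
from timeit import Timer

def best_of(stmt, setup="pass", repeat=3, number=10000):
    """Return the fastest per-run wall time, in seconds, over `repeat` trials."""
    timer = Timer(stmt, setup)
    # The minimum is the trial least disturbed by other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number

lazy = best_of("range(1000)")          # Python 3 range is lazy, like xrange
eager = best_of("list(range(1000))")   # materializes all 1000 items
print("lazy: %.3g s, eager: %.3g s" % (lazy, eager))
```

Because `timeit` measures wall time, the absolute figures depend on the machine and on whatever else it is running; only the comparison between the two statements is meaningful.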


How to Profile

>>> import mod
>>> import cProfile
>>> cProfile.run("mod.b()", sort="cumulative")


(make changes to module)

>>> reload(mod)
>>> cProfile.run("mod.b()", sort="cumulative")


How to Profile

>>> cProfile.run("for i in xrange(3000): range(i).sort()", sort="cumulative")
6002 function calls in 0.093 seconds

Ordered by: cumulative time

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.019    0.019    0.093    0.093 <string>:1(<module>)
   3000    0.052    0.000    0.052    0.000 {list.sort}
   3000    0.022    0.000    0.022    0.000 {range}
      1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
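The same profile can be collected programmatically. This is a sketch assuming Python 3; the function name `work` is hypothetical and stands in for the statement profiled on the slide.

```python
import cProfile
import io
import pstats

def work():
    # Same shape as the slide's statement: build and sort many lists.
    for i in range(3000):
        list(range(i)).sort()

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Send the report to a string buffer instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # show at most 5 rows
report = buffer.getvalue()
print(report)
```

Capturing the report in a buffer makes it easy to save one report per code revision and compare them.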



Example: Standard Deviation

>>> import numpy
>>> n = 100
>>> a = numpy.array(xrange(n), dtype=float)
>>> a.std(ddof=1)
29.011491975882016


Example: Standard Deviation

>>> n = 4000000000
>>> a = numpy.array(xrange(n), dtype=float)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.


Example: Standard Deviation

>>> n = 4000000000
>>> arr = numpy.zeros(n, dtype=float)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError



Example: Standard Deviation

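Given an array $A$ broken into $n$ parts $a_1 \ldots a_n$, with part means $\bar{a}_i$, overall mean $\bar{A}$,

and local variance $V(a_i) = \sum_j (a_{ij} - \bar{a}_i)^2$:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} \left[ V(a_i) + 2\left(\sum_j a_{ij}\right)\left(\bar{a}_i - \bar{A}\right) + |a_i|\left(\bar{A}^2 - \bar{a}_i^2\right) \right]}{|A| - \mathrm{ddof}}}$$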
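As a check on the partitioned formula, here is a small sketch (assuming Python 3) that computes the standard deviation one part at a time and compares it with the standard library's `statistics.stdev`. The function name `partitioned_stddev` is hypothetical.

```python
import math
import statistics

def partitioned_stddev(parts, ddof=0):
    """Standard deviation of the concatenation of `parts`, one part at a time."""
    glength = sum(len(p) for p in parts)
    g = sum(sum(p) for p in parts) / glength      # overall mean
    final = 0.0
    for p in parts:
        t, n = sum(p), len(p)
        m = t / n                                  # mean of this part
        v = sum((x - m) ** 2 for x in p)           # local variance sum V(a_i)
        final += v + 2 * t * (m - g) + n * (g ** 2 - m ** 2)
    return math.sqrt(final / (glength - ddof))

data = [float(x) for x in range(100)]
parts = [data[i:i + 10] for i in range(0, 100, 10)]
result = partitioned_stddev(parts, ddof=1)
print(result, statistics.stdev(data))
```

Only three numbers per part (total, length, local variance sum) are needed for the final combination, which is what makes the calculation splittable across chunks that do not fit in memory together.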


Example: Standard Deviation

def run():
    points = 400000  # target scale: 4000000000
    segments = 100
    part_len = points / segments

    partitions = []
    for p in range(segments):
        part = range(part_len * p, part_len * (p + 1))
        partitions.append(part)

    return stddev(partitions, ddof=1)


Example: Standard Deviation

def stddev(partitions, ddof=0):
    final = 0.0
    for part in partitions:
        m = total(part) / length(part)

        # Find the mean of the entire group.
        gtotal = total([total(p) for p in partitions])
        glength = total([length(p) for p in partitions])
        g = gtotal / glength

        adj = ((2 * total(part) * (m - g)) +
               ((g ** 2 - m ** 2) * length(part)))
        final += varsum(part) + adj

    return math.sqrt(final / (glength - ddof))


Example: Standard Deviation

2052106 function calls in 71.025 seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.000    0.000   71.023   71.023 stddev.py:39(run)
      1    0.006    0.006   71.013   71.013 stddev.py:22(stddev)
 410400   63.406    0.000   70.490    0.000 stddev.py:4(total)
    100    0.341    0.003   69.178    0.692 stddev.py:15(varsum)
 410601    7.076    0.000    7.076    0.000 {range}
 410200    0.151    0.000    0.174    0.000 stddev.py:11(length)
 820700    0.042    0.000    0.042    0.000 {len}
    100    0.000    0.000    0.000    0.000 {list.append}
      1    0.000    0.000    0.000    0.000 {math.sqrt}


Example: Standard Deviation

400,000 rows in 71.025 seconds

Assuming no other effects of scale, it will take 197.3 hours (over 8 days) to calculate our 4 billion-row array.


Example: Standard Deviation

Can we calculate our 4 billion-row array in 1 minute?

That’s 400,000 in 6 ms.

All we need is an 11,837.5x speedup.
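The arithmetic behind these targets can be reproduced in a few lines. The variable names are illustrative; the figures come from the profile and the one-minute goal stated on the slides.

```python
rows_sample = 400000          # rows in the profiled run
rows_target = 4000000000      # rows in the full array
seconds_sample = 71.025       # measured time for the profiled run

scale = rows_target / rows_sample                 # how many times larger the target is
projected_hours = seconds_sample * scale / 3600   # linear extrapolation
budget_per_sample = 60.0 / scale                  # seconds allowed per 400,000 rows
required_speedup = seconds_sample / budget_per_sample

print(scale, projected_hours, budget_per_sample, required_speedup)
```

The extrapolation is linear in the row count, which matches the slides' stated assumption of no other effects of scale.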


Optimization



Amongst Our Weaponry

Extracting loop invariants


Extracting Loop Invariants

def varsum(arr):
    vs = 0
    for j in range(len(arr)):
        mean = (total(arr) / length(arr))
        vs += (arr[j] - mean) ** 2
    return vs


Extracting Loop Invariants

def varsum(arr):
    vs = 0
    mean = (total(arr) / length(arr))
    for j in range(len(arr)):
        vs += (arr[j] - mean) ** 2
    return vs
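To make the cost of the unhoisted invariant visible, the sketch below (assuming Python 3) counts how often `total` runs in each version. The names `varsum_slow`, `varsum_fast`, and the `calls` counter are hypothetical additions for illustration.

```python
calls = {"total": 0}

def total(arr):
    calls["total"] += 1      # count invocations to make the cost visible
    return sum(arr)

def varsum_slow(arr):
    vs = 0.0
    for x in arr:
        mean = total(arr) / len(arr)   # recomputed on every iteration: O(n^2)
        vs += (x - mean) ** 2
    return vs

def varsum_fast(arr):
    mean = total(arr) / len(arr)       # invariant hoisted out of the loop: O(n)
    vs = 0.0
    for x in arr:
        vs += (x - mean) ** 2
    return vs

data = [float(x) for x in range(1000)]

calls["total"] = 0
slow = varsum_slow(data)
slow_calls = calls["total"]

calls["total"] = 0
fast = varsum_fast(data)
fast_calls = calls["total"]

print(slow_calls, fast_calls)
```

Both versions return the same value; the only difference is that the mean is computed once per array instead of once per element.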


Extracting Loop Invariants

52606 calls in 1.944 seconds (36x)

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.000    0.000    1.942    1.942 stddev1.py:41(run)
      1    0.006    0.006    1.932    1.932 stddev1.py:23(stddev)
  10500    1.673    0.000    1.859    0.000 stddev1.py:4(total)
  10701    0.196    0.000    0.196    0.000 {range}
    100    0.062    0.001    0.081    0.001 stddev1.py:15(varsum)
  10300    0.003    0.000    0.003    0.000 stddev1.py:11(length)
  20900    0.001    0.000    0.001    0.000 {len}
    100    0.000    0.000    0.000    0.000 {list.append}
      1    0.000    0.000    0.000    0.000 {math.sqrt}

still 5.4 hrs


Extracting Loop Invariants

def stddev(partitions, ddof=0):
    final = 0.0

    for part in partitions:
        m = total(part) / length(part)

        # Find the mean of the entire group.
        gtotal = total([total(p) for p in partitions])
        glength = total([length(p) for p in partitions])
        g = gtotal / glength

        adj = ((2 * total(part) * (m - g)) +
               ((g ** 2 - m ** 2) * length(part)))
        final += varsum(part) + adj

    return math.sqrt(final / (glength - ddof))


Extracting Loop Invariants

def stddev(partitions, ddof=0):
    final = 0.0

    # Find the mean of the entire group.
    gtotal = total([total(p) for p in partitions])
    glength = total([length(p) for p in partitions])
    g = gtotal / glength

    for part in partitions:
        m = total(part) / length(part)

        adj = ((2 * total(part) * (m - g)) +
               ((g ** 2 - m ** 2) * length(part)))
        final += varsum(part) + adj

    return math.sqrt(final / (glength - ddof))


Extracting Loop Invariants

2512 function calls in 0.142 seconds (13x)

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.000    0.000    0.140    0.140 stddev1.py:42(run)
      1    0.000    0.000    0.136    0.136 stddev1.py:23(stddev)
    100    0.063    0.001    0.082    0.001 stddev1.py:15(varsum)
    402    0.064    0.000    0.071    0.000 stddev1.py:4(total)
    603    0.013    0.000    0.013    0.000 {range}
    400    0.000    0.000    0.000    0.000 stddev1.py:11(length)
    902    0.000    0.000    0.000    0.000 {len}
    100    0.000    0.000    0.000    0.000 {list.append}
      1    0.000    0.000    0.000    0.000 {math.sqrt}

still 23 minutes


Amongst Our Weaponry

Use builtin Python functions whenever possible


Use Python Builtins

def total(arr):
    s = 0
    for j in range(len(arr)):
        s += arr[j]
    return s


def total(arr):
    return sum(arr)
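A sketch for measuring the difference between the hand-written loop and the builtin, assuming Python 3. The names `total_loop` and `total_builtin` are hypothetical labels for the two versions shown on the slides.

```python
from timeit import Timer

def total_loop(arr):
    s = 0
    for j in range(len(arr)):   # every index and add runs as Python bytecode
        s += arr[j]
    return s

def total_builtin(arr):
    return sum(arr)             # the loop runs inside the C implementation

data = list(range(10000))

# Timer accepts a callable; take the best of 3 trials of 100 runs each.
loop_time = min(Timer(lambda: total_loop(data)).repeat(3, 100))
builtin_time = min(Timer(lambda: total_builtin(data)).repeat(3, 100))
print("loop: %.4f s, builtin: %.4f s" % (loop_time, builtin_time))
```

The size of the gap depends on the interpreter version and the data, so it is worth measuring on the real workload rather than assuming a fixed ratio.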


Use Python Builtins

2110 function calls in 0.096 seconds (1.47x)

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.000    0.000    0.093    0.093 stddev1.py:39(run)
      1    0.000    0.000    0.083    0.083 stddev1.py:20(stddev)
    100    0.065    0.001    0.070    0.001 stddev1.py:12(varsum)
    402    0.000    0.000    0.015    0.000 stddev1.py:4(total)
    402    0.015    0.000    0.015    0.000 {sum}
    201    0.012    0.000    0.012    0.000 {range}
    400    0.000    0.000    0.000    0.000 stddev1.py:8(length)
    500    0.000    0.000    0.000    0.000 {len}
    100    0.000    0.000    0.000    0.000 {list.append}
      1    0.000    0.000    0.000    0.000 {math.sqrt}

still 16 minutes



Use Python Builtins

def varsum(arr):
    vs = 0
    mean = (total(arr) / length(arr))
    for j in range(len(arr)):
        vs += (arr[j] - mean) ** 2
    return vs


Use Python Builtins

def varsum(arr):
    mean = (total(arr) / length(arr))
    return sum((v - mean) ** 2 for v in arr)


Use Python Builtins

402110 function calls in 0.122 seconds

1.27x slower

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.000    0.000    0.120    0.120 stddev.py:36(run)
      1    0.000    0.000    0.115    0.115 stddev.py:17(stddev)
    502    0.044    0.000    0.114    0.000 {sum}
    100    0.000    0.000    0.106    0.001 stddev.py:12(varsum)
 400100    0.070    0.000    0.070    0.000 stddev.py:14(genexpr)
    402    0.000    0.000    0.011    0.000 stddev.py:4(total)


Amongst Our Weaponry

Reduce function calls


Reduce Function Calls

>>> Timer("sum(a)", "a = range(10)").repeat(3)
[0.15801000595092773, 0.1406857967376709, 0.14577603340148926]

>>> Timer("total(a)", "a = range(10); total = lambda x: sum(x)").repeat(3)
[0.2066800594329834, 0.1998300552368164, 0.21536493301391602]

The extra function call costs about 0.000 000 059 seconds per call.
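The per-call overhead can be estimated by subtracting the best direct timing from the best wrapped timing and dividing by the number of runs. This sketch assumes Python 3; the run count of 200000 is an arbitrary choice to keep it quick.

```python
from timeit import Timer

number = 200000
setup = "a = list(range(10)); total = lambda x: sum(x)"

# Best of 3 trials for the direct call and for the call through a wrapper.
direct = min(Timer("sum(a)", setup).repeat(3, number))
wrapped = min(Timer("total(a)", setup).repeat(3, number))

# The difference, spread over all runs, approximates one extra call's cost.
overhead_per_call = (wrapped - direct) / number
print("about %.3g seconds per extra call" % overhead_per_call)
```

The result is tiny per call and noisy on a busy machine, but it is multiplied by every element when the call sits inside an inner loop.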


Reduce Function Calls

def variances_squared(arr):
    mean = (total(arr) / length(arr))
    for v in arr:
        yield (v - mean) ** 2


Reduce Function Calls

def varsum(arr):
    mean = (total(arr) / length(arr))
    return sum((v - mean) ** 2 for v in arr)

def varsum(arr):
    mean = (total(arr) / length(arr))
    return sum([(v - mean) ** 2 for v in arr])


Reduce Function Calls

2010 function calls in 0.082 seconds (1.17x)

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.000    0.000    0.080    0.080 stddev.py:36(run)
      1    0.000    0.000    0.071    0.071 stddev.py:17(stddev)
    100    0.050    0.001    0.056    0.001 stddev.py:12(varsum)
    502    0.020    0.000    0.020    0.000 {sum}
    402    0.000    0.000    0.016    0.000 stddev.py:4(total)
    101    0.009    0.000    0.009    0.000 {range}
    400    0.000    0.000    0.000    0.000 stddev.py:8(length)
    400    0.000    0.000    0.000    0.000 {len}
    100    0.000    0.000    0.000    0.000 {list.append}
      1    0.000    0.000    0.000    0.000 {math.sqrt}

still 13+ minutes


Amongst Our Weaponry

Vector operations with NumPy


Vector Operations

part = numpy.array(
    xrange(...), dtype=float)

def total(arr):
    return arr.sum()

def varsum(arr):
    return ((arr - arr.mean()) ** 2).sum()
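A small sketch, assuming Python 3 with NumPy installed, that checks the vectorized `varsum` against the pure-Python version on the same data. The names `varsum_loop` and `varsum_vector` are hypothetical labels for the two approaches.

```python
import numpy

def varsum_loop(arr):
    mean = sum(arr) / len(arr)
    return sum([(v - mean) ** 2 for v in arr])

def varsum_vector(arr):
    # Subtraction, squaring and summation each run as one C-level pass.
    return ((arr - arr.mean()) ** 2).sum()

values = list(range(1000))
part = numpy.array(values, dtype=float)

loop_result = varsum_loop(values)
vector_result = varsum_vector(part)
print(loop_result, vector_result)
```

The vectorized form allocates temporary arrays for the intermediate results, so it trades memory for speed; that trade-off matters once partitions become large.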


Vector Operations

3408 function calls in 0.057 seconds (1.43x)

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.000    0.000    0.057    0.057 stddev1.py:37(run)
    200    0.051    0.000    0.051    0.000 {numpy...array}
      1    0.001    0.001    0.006    0.006 stddev1.py:18(stddev)
    500    0.003    0.000    0.003    0.000 {numpy.ufunc.reduce}
    100    0.001    0.000    0.003    0.000 stddev1.py:14(varsum)
    400    0.000    0.000    0.003    0.000 {numpy.ndarray.sum}
    300    0.000    0.000    0.002    0.000 stddev1.py:6(total)
    100    0.000    0.000    0.001    0.000 {numpy.ndarray.mean}

still 9.5 minutes



Vector Operations

3408 function calls in 0.006 seconds (13.6x)

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.001    0.001    0.006    0.006 stddev1.py:18(stddev)
    500    0.003    0.000    0.003    0.000 {numpy.ufunc.reduce}
    100    0.001    0.000    0.003    0.000 stddev1.py:14(varsum)
    400    0.000    0.000    0.003    0.000 {numpy.ndarray.sum}
    300    0.000    0.000    0.002    0.000 stddev1.py:6(total)
    100    0.000    0.000    0.001    0.000 {numpy.ndarray.mean}

should be exactly 1 minute


Vector Operations

Let’s try 4 billion!

Bump up that N...


Vector Operations

MemoryError

Oh, yeah...


Amongst Our Weaponry

Parallelization with multiprocessing


Parallelization

from multiprocessing import Pool

def run():
    results = Pool().map(
        run_one, range(segments))
    result = stddev(results)
    return result


Parallelization

def run_one(i):
    p = numpy.memmap(
        'stddev.%d' % i, dtype=float,
        mode='r', shape=(part_len,))

    T, L = p.sum(), float(len(p))
    m = T / L
    V = ((p - m) ** 2).sum()
    return T, L, V


Parallelization

def stddev(TLVs, ddof=0):
    final = 0.0

    totals = [T for T, L, V in TLVs]
    lengths = [L for T, L, V in TLVs]
    glength = sum(lengths)
    g = sum(totals) / glength

    for T, L, V in TLVs:
        m = T / L
        adj = ((2 * T * (m - g)) +
               ((g ** 2 - m ** 2) * L))
        final += V + adj

    return math.sqrt(final / (glength - ddof))
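The pieces above can be assembled into a self-contained sketch, assuming Python 3. To avoid depending on data files, `run_one` here generates its partition in memory with plain lists instead of reading a `numpy.memmap`; that substitution, the constants `SEGMENTS` and `PART_LEN`, and the `processes` parameter are assumptions made for illustration.

```python
import math
from multiprocessing import Pool

SEGMENTS = 10
PART_LEN = 1000

def run_one(i):
    """Summarize partition i as (total, length, local variance sum)."""
    # Stand-in for loading partition i from disk (assumption for this sketch).
    p = [float(x) for x in range(PART_LEN * i, PART_LEN * (i + 1))]
    T, L = sum(p), float(len(p))
    m = T / L
    V = sum((x - m) ** 2 for x in p)
    return T, L, V

def stddev(TLVs, ddof=0):
    glength = sum(L for T, L, V in TLVs)
    g = sum(T for T, L, V in TLVs) / glength
    final = 0.0
    for T, L, V in TLVs:
        m = T / L
        final += V + 2 * T * (m - g) + (g ** 2 - m ** 2) * L
    return math.sqrt(final / (glength - ddof))

def run(processes=None):
    # Only the small (T, L, V) tuples are pickled back to the parent process.
    with Pool(processes) as pool:
        return stddev(pool.map(run_one, range(SEGMENTS)), ddof=1)
```

Call `run()` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn rather than fork. Returning three floats per partition keeps the serialization cost between processes negligible compared with shipping the arrays themselves.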


Parallelization

3734 function calls in 0.024 seconds

6x slower

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.000    0.000    0.024    0.024 stddev.py:47(run)
      4    0.000    0.000    0.011    0.003 threading.py:234(wait)
     22    0.011    0.000    0.011    0.000 {thread.lock.acquire}
      1    0.000    0.000    0.011    0.011 pool.py:222(map)
      1    0.000    0.000    0.008    0.008 pool.py:113(__init__)
      4    0.001    0.000    0.005    0.001 process.py:116(start)
      1    0.003    0.003    0.005    0.005 stddev.py:11(stddev)
      4    0.000    0.000    0.004    0.001 forking.py:115(init)
      4    0.003    0.001    0.003    0.001 {posix.fork}
    ...


Parallelization

Could that waiting be insignificant when we scale up to 4 billion?

Let’s try it!


Parallelization

3766 function calls in 67.811 seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.000    0.000   67.811   67.811 stddev.py:47(run)
      4    0.000    0.000   67.747   16.930 threading.py:234(wait)
     22   67.747    3.079   67.747    3.079 {thread.lock.acquire}
      1    0.000    0.000   67.747   67.747 pool.py:222(map)
      1    0.000    0.000    0.062    0.060 pool.py:113(__init__)
      4    0.000    0.000    0.058    0.014 process.py:116(start)
      4    0.057    0.014    0.057    0.014 {posix.fork}
      1    0.003    0.003    0.005    0.005 stddev.py:11(stddev)
      2    0.002    0.001    0.002    0.001 {sum}

SO CLOSE! 1.13 minutes


Parallelization

def run_one(i):
    if i == 50:
        cProfile.runctx(..., "prf.50")

>>> import pstats
>>> s = pstats.Stats("prf.50")
>>> s.sort_stats("cumulative")
<pstats.Stats instance at 0x2bddcb0>
>>> _.print_stats()


Parallelization

57 function calls in 2.804 seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.431    0.431    2.791    2.791 stddev.py:43(run_one)
      2    0.000    0.000    2.360    1.180 numpy.ndarray.sum
      2    2.360    1.180    2.360    1.180 numpy.ufunc.reduce
      1    0.000    0.000    0.000    0.000 memmap.py:195(__new__)


Parallelization

def run_one(i):
    p = numpy.memmap(
        'stddev.%d' % i, dtype=float,
        mode='r', shape=(part_len,))

    T, L = p.sum(), float(len(p))
    m = T / L
    V = ((p - m) ** 2).sum()
    return T, L, V

200 seconds / 4 cores = 50 seconds


Parallelization? Serialization!

67.8 seconds for 4 billion rows, but 50 of those are loading data! 17.8 seconds to do the actual math.


Serialization

import bloscpack as bp
bargs = bp.args.DEFAULT_BLOSC_ARGS
bargs['clevel'] = 6

bp.pack_ndarray_file(
    part, fname, blosc_args=bargs)

part = bp.unpack_ndarray_file(fname)


Serialization

Let’s try it!


I Crush Your Head!


I Crush Your Head!

1153 function calls in 26.166 seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(func)
      1    0.000    0.000   26.166   26.166 stddev_bp.py:56(run)
      4    0.000    0.000   26.134     6.53 threading.py:234(wait)
     22   26.134    1.188   26.134    1.188 thread.lock.acquire
      1    0.000    0.000   26.133   26.133 pool.py:222(map)
      1    0.000    0.000   26.133   26.133 pool.py:521(get)
      1    0.000    0.000   26.133   26.133 pool.py:513(wait)
      1    0.003    0.003    0.030    0.030 __init__.py:227(Pool)
      1    0.000    0.000    0.021    0.021 pool.py:113(__init__)


I Crush Your Head!

With some time-tested general programming techniques:

Extract loop invariants

Use language builtins

Reduce function calls


I Crush Your Head!

And some Python libraries for architectural improvements:

Use NumPy for vector ops

Use multiprocessing for parallelization

Use bloscpack for compression


I Crush Your Head!

We sped up our calculation so that it runs in:

0.003% of the time

or 27317 times faster

4.4 orders of magnitude


Crushing the Head of the Snake

Any questions?

@[email protected]