python, go and the cost of concurrency in the cloud
Goal of this talk
• Introduce Go for Python (& Ruby) programmers
• Explain key differences between Go and the other “no semicolons” languages
• Show an example application illustrating why those key differences matter for your app’s bottom line
About me
• Daily Python hacker
• Came from C/C++, UNIX systems hacking background
• PhD on P2P/crypto research, more Python & C++
• Two months experience in Go
• co-founder, Tracelytics (now AppNeta)
Things I like about Python
• Fun to program — “Zen of Python”
• Builtin maps, sets, arrays, tuples
• Good library support
• Simple duck typing (as opposed to strict OO)
• A little code goes a long way
Things I don’t like about Python
• Performance: not too slow, but not too fast either
• Dependencies can be a pain (virtualenv, pip, etc)
• The dreaded Global Interpreter Lock (GIL)
• Lack of typed function signatures can make reading code difficult
Go
• Announced 2009
• Creators: Ken Thompson (B, Plan 9 from Bell Labs), Rob Pike (Plan 9), Robert Griesemer (V8 engine)
  • “all three of us had to be talked into every feature in the language, so there was no extraneous garbage put into the language for any reason”
• Statically typed, garbage-collected
• Fast compilation, static linking
Go is simple
• Basic builtin data types
  • boolean, int (int, int32, int64, byte, rune, …), float, complex, string
• Complex builtin types
  • pointer - a typed reference to a value
  • array - fixed length sequence of typed elements
  • slice - window into part of an array
  • map - typed key/value
  • channel - typed, optionally directional and/or buffered
  • struct - sequence of named and typed elements
• All of Go’s keywords:
  break default func interface select
  case defer go map struct
  chan else goto package switch
  const fallthrough if range type
  continue for import return var
Similarities between Go and Python
• Easy to read, no semicolons
• Built-in maps, arrays, strings
• Both support calling into C code when necessary
• Interfaces based on duck typing
• No virtual inheritance
• Go is statically typed, but type inference and interfaces give it a “dynamic feel”
Differences between Go and Python
• Go is compiled to native machine code
  • Fast compiler, single static binary
• Go is fast; memory usage depends on size of structs: no per-object dictionaries, as in Python
• Go has concurrency features built into the language: goroutines, channels, runtime scheduler
• Go has curly braces
Go examples
• Lots of resources online to help you learn Go!
• Following slides from “Go for Pythonistas” by Francesc Campoy Flores, Google and “Go and the Zen of Python” by Andrew Gerrand, Google
• Also interesting viewing: “Go for Python Programmers” by Brian Dorsey, Google
• See also: The Go Programming Language Blog, blog.golang.org
Go methods & objects
"Simple is better than complex."
Methods are just functions (no special location)
There's no this or self - the receiver is like any other function argument
type Vector struct {
    X, Y float64
}

func (v Vector) Abs() float64 {
    return math.Sqrt(v.X*v.X + v.Y*v.Y)
}
"Simple is better than complex."
Methods can be declared on any named type (no classes)
type Scalar float64

func (s Scalar) Abs() float64 {
    if s < 0 {
        return float64(-s)
    }
    return float64(s)
}
"Simple is better than complex."
Interfaces are just methods (no data)
Interfaces are implicit (no implements declaration)
type Abser interface {
    Abs() float64
}
(Both Vector and Scalar implement Abser, even though they don't know that Abser exists.)
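The three snippets above can be put together into one runnable file. This is a minimal sketch: SumAbs is a hypothetical helper, not part of the original slides, added only to show that any value with an Abs() method can be passed where an Abser is expected.

```go
package main

import (
	"fmt"
	"math"
)

type Vector struct {
	X, Y float64
}

func (v Vector) Abs() float64 {
	return math.Sqrt(v.X*v.X + v.Y*v.Y)
}

type Scalar float64

func (s Scalar) Abs() float64 {
	if s < 0 {
		return float64(-s)
	}
	return float64(s)
}

// Abser is satisfied implicitly by any type with an Abs() float64 method.
type Abser interface {
	Abs() float64
}

// SumAbs is a hypothetical helper (not from the slides): it accepts any
// mix of types that implement Abser.
func SumAbs(items []Abser) float64 {
	total := 0.0
	for _, it := range items {
		total += it.Abs()
	}
	return total
}

func main() {
	items := []Abser{Vector{3, 4}, Scalar(-2)}
	fmt.Println(SumAbs(items))
}
```

Neither Vector nor Scalar mentions Abser anywhere; the compiler checks the method set when the values are placed in the []Abser slice.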
"Simple is better than complex."
Identifier case sets visibility.
If a name begins with a capital, it is visible outside its package:
package foo
type Foo struct { // exported type
    bar int // unexported field
}

func (f Foo) Bar() {} // exported method

func (f Foo) quux() {} // unexported method
Only code inside the package can see unexported ("private") names.
Go control flow
• Just a few keywords:
  • if
  • for
  • switch
  • select (like switch for channels)
• But without:
  • Ternary operator (Python: X if COND else Y)
  • List comprehensions, crazy Python-style one-liners
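To make the "without" list concrete, here is a small sketch of what a Python conditional expression and a list comprehension look like when written with plain if and for. The function names sign and squaresOfEvens are made up for this illustration.

```go
package main

import "fmt"

// sign replaces the Python expression
//   "negative" if n < 0 else "non-negative"
// with an ordinary if statement.
func sign(n int) string {
	if n < 0 {
		return "negative"
	}
	return "non-negative"
}

// squaresOfEvens replaces the Python comprehension
//   [x*x for x in xs if x % 2 == 0]
// with an explicit loop and append.
func squaresOfEvens(xs []int) []int {
	var out []int
	for _, x := range xs {
		if x%2 == 0 {
			out = append(out, x*x)
		}
	}
	return out
}

func main() {
	fmt.Println(sign(-3), sign(5))
	fmt.Println(squaresOfEvens([]int{1, 2, 3, 4}))
}
```

The Go versions are longer, which is the trade-off the slide describes: fewer constructs to learn and read, at the cost of a few more lines.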
Go syntax and simplicity
"Readability counts."
Go was designed for teams of hundreds/thousands of programmers. Readability is of paramount importance.
The gofmt tool enforces "one true style." (No more stupid arguments.)
Type inference saves a lot of typing, but not at the cost of readability. Types are still required where they help readability (function declarations, for example).
Many other language design decisions were made in the name of readability (case-based name visibility, for example).
Example: Go URL-fetcher
package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    urls := []string{"http://google.com/", "http://bing.com/"}
    start := time.Now()
    done := make(chan string)
    for _, u := range urls {
        // Fetch each URL in its own goroutine.
        go func(u string) {
            resp, err := http.Get(u)
            if err != nil {
                done <- u + " " + err.Error()
                return
            }
            resp.Body.Close() // release the connection
            done <- u + " " + resp.Status
        }(u)
    }
    // Collect one result per URL, in completion order.
    for range urls {
        fmt.Println(<-done, time.Since(start))
    }
}
fib.py
Have you ever heard of Fibonacci?
def fib(n):
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return b

def fib_rec(n):
    if n <= 1:
        return 1
    else:
        return fib_rec(n-1) + fib_rec(n-2)

for x in range(10):
    print(fib(x), fib_rec(x))
fib.go
Something familiar?
package main

import "fmt"

func fib(n int) int {
    a, b := 0, 1
    for i := 0; i < n; i++ {
        a, b = b, a+b
    }
    return b
}

func fibRec(n int) int {
    if n <= 1 {
        return 1
    }
    return fibRec(n-1) + fibRec(n-2)
}

func main() {
    for i := 0; i < 10; i++ {
        fmt.Println(fib(i), fibRec(i))
    }
}
Fibonacci without generators? What?
Python generators are awesome.
def fib(n):
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
        yield a

Mechanically complex.
f = fib(10)
try:
    while True:
        print(next(f))
except StopIteration:
    print('done')

But very easy to use.
for x in fib(10):
    print(x)
print('done')
Go concurrency
Based on goroutines and channels.
Goroutines: very light processing actors (the gophers).
Channels: typed, synchronized, thread-safe pipes (the arrows).
"Generator" goroutines
Uses a channel send instead of yield.
func fib(c chan int, n int) {
    a, b := 0, 1
    for i := 0; i < n; i++ {
        a, b = b, a+b
        c <- a
    }
    close(c)
}

func main() {
    c := make(chan int)
    go fib(c, 10)

    for x := range c {
        fmt.Println(x)
    }
}
"Generator" goroutines
A more generator-like style:
func fib(n int) chan int {
    c := make(chan int)
    go func() {
        a, b := 0, 1
        for i := 0; i < n; i++ {
            a, b = b, a+b
            c <- a
        }
        close(c)
    }()
    return c
}

func main() {
    for x := range fib(10) {
        fmt.Println(x)
    }
}
Language comparison
                          Python  Ruby  JS/Node.js  C/C++  Java  Go
semicolons                N       N     Y           Y      Y     N
curly braces              N       N*    Y           Y      Y     Y
static types              N       N     N           Y      Y     Y
easy-to-use concurrency   N       N     Y           N      N     Y
multi-core concurrency    N       N     N           Y      Y     Y
compiled                  N       N     N           Y      Y     Y
OO: classes, inheritance  Y       Y     Y           Y      Y     N*
So who cares?
• You do: concurrency is important in the modern computing environment
• Programming for “the cloud” or for “SOA” or “microservices” is fundamentally different than writing a LAMP/MEAN/Rails app
• Assumptions on latency, throughput, scale all change
• The language you pick can cost you time & money
Cloud vs. self-managed
CLOUD-MANAGED
• Managed
• “Infinite” scale
• HTTP-based RPC
• Usage-based pricing
• Hard to overprovision
SELF-MANAGED
• Self-hosted
• As scalable as you can make it (e.g. Redis vs. Cassandra)
• Connection-oriented services
• Instance-based pricing
• Some overprovisioning necessary
Cloud environments demand concurrency
• Self-hosted systems and databases generally use pools of long-living connections
• RabbitMQ vs SQS
• HTTP-based APIs can have high latency
• DynamoDB 5-10ms latency
• Kinesis PutRecords, S3, SQS 10-100ms latency
What about my async code for Python, Ruby, Node?
• Async I/O makes network, disk reads & writes asynchronous
• Used by Python’s gevent, Tornado, Twisted
• Ruby EventMachine, Celluloid
• Node.js, libuv, libev, libevent
• Allows interpreter to switch to another execution context/greenlet/thread while I/O is pending
• Go: blocking I/O is OK when you have multiple goroutines
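The last bullet can be sketched as follows. slowCall is a hypothetical stand-in for a blocking network request (it only sleeps), so the example runs without any network access; the point is that each goroutine is written as straight-line blocking code and the runtime scheduler overlaps the waits.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// slowCall is a hypothetical stand-in for a blocking API request.
func slowCall(id int) string {
	time.Sleep(50 * time.Millisecond)
	return fmt.Sprintf("response %d", id)
}

// fetchAll issues n blocking calls, one per goroutine, and waits for all
// of them to finish.
func fetchAll(n int) []string {
	results := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = slowCall(i) // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	start := time.Now()
	results := fetchAll(20)
	fmt.Println(len(results), "calls finished in", time.Since(start))
}
```

Run sequentially, twenty 50ms calls would take about a second; here the elapsed time should be close to that of a single call, without callbacks or an event loop in the application code.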
Cloud APIs require compute-heavy RPCs
• HTTP-based APIs with authenticated JSON/XML
• Encryption: TLS/SSL key exchange, negotiation
• Authentication: AWS, Google request signature schemes
• Serialization: Convert data to JSON, base64, etc
• Not as simple as binary data over raw sockets
• Not pure disk/network I/O — not as easy to use async I/O
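As a rough illustration of the CPU work hiding inside one such request, the sketch below serializes an item to JSON (with a base64-encoded payload) and signs the body with HMAC-SHA256. The item type, the secret, and the signing scheme are assumptions made for this example; this is not the actual AWS or Google request signature algorithm, and it leaves out TLS entirely.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// item is a hypothetical payload; encoding/json base64-encodes []byte fields.
type item struct {
	ID      string `json:"id"`
	Payload []byte `json:"payload"`
}

// signedBody serializes the item and computes an HMAC-SHA256 signature of
// the body. Illustrative only: real cloud signature schemes sign more
// fields (headers, dates, scopes) and derive the key differently.
func signedBody(secret []byte, it item) (body []byte, signature string, err error) {
	body, err = json.Marshal(it)
	if err != nil {
		return nil, "", err
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	signature = base64.StdEncoding.EncodeToString(mac.Sum(nil))
	return body, signature, nil
}

func main() {
	body, sig, err := signedBody([]byte("demo-secret"), item{ID: "a1", Payload: []byte("hello")})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	fmt.Println(sig)
}
```

Every step here burns CPU rather than waiting on a socket, which is why an async I/O loop alone does not help with it: the work has to run somewhere, and on a single-threaded interpreter it runs on the one thread that is also servicing every other request.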
Increasing prevalence of multi-core architectures
• Dual-core, quad-core, 8-core, 16-core, 32-core …
• How will you use all those CPUs?
• Strong opinion: Docker, containerization is a crutch for horizontally scaling single-threaded services
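One way to answer the "how will you use all those CPUs?" question inside a single Go process is to split CPU-bound work across one goroutine per core. The sketch below is an assumed toy workload (summing squares), chosen only because the result is easy to check.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares computes 0*0 + 1*1 + ... + (n-1)*(n-1), striping the range
// across one goroutine per available CPU.
func sumSquares(n int) uint64 {
	workers := runtime.NumCPU()
	partial := make([]uint64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// Worker w handles w, w+workers, w+2*workers, ...
			for i := w; i < n; i += workers {
				partial[w] += uint64(i) * uint64(i)
			}
		}(w)
	}
	wg.Wait()
	var total uint64
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	fmt.Println(runtime.NumCPU(), "CPUs:", sumSquares(1000000))
}
```

Since Go 1.5 the runtime schedules goroutines across all cores by default; on earlier releases GOMAXPROCS had to be raised explicitly to get the same effect.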
Motivating example
[Diagram: ten Analysis workers, each writing directly to S3 + DynamoDB]
• ~700 items analyzed each second
• ~700 S3 PUTs/sec, ~70KB each
• ~700 DynamoDB item writes/sec
Cloud storage, queue, and log costs
Service   Write cost                   Max object size      Storage $/GB-month       Read cost
S3        $5/million PUTs              5GB                  $0.03 ($0.01 Glacier)    $0.40/million GETs
GCS       $10/million PUTs             5TB                  $0.026 ($0.01 Nearline)  $1/million GETs
SQS       $0.50/million API requests   192KB                -                        $0.50/million API requests
Kinesis   $0.028/million PUT records   50KB                 $11 per shard-month      (each reads 2MB/sec)
DynamoDB  $0.47/Hz-month, 1KB items    400KB (all Ks & Vs)  $0.25 (structured,       $0.047/Hz-month, 4KB items
          (half off with reservation)                       indexed)                 (eventually consistent)
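The prices above turn into monthly bills with simple arithmetic. The sketch below assumes an average month of 365/12 days (2,628,000 seconds), an assumption chosen because it lands close to the figures used later in the talk; the function names are made up for this example.

```go
package main

import "fmt"

// secondsPerMonth assumes an average month of 365/12 days (an assumption,
// not stated in the slides).
const secondsPerMonth = 365.0 / 12.0 * 24 * 60 * 60 // 2,628,000

// perRequestMonthly returns the monthly cost of making rateHz calls per
// second to an API priced per million requests.
func perRequestMonthly(rateHz, pricePerMillion float64) float64 {
	return rateHz * secondsPerMonth / 1e6 * pricePerMillion
}

// provisionedMonthly returns the monthly cost of throughput priced per
// Hz-month, as in the DynamoDB row of the table.
func provisionedMonthly(rateHz, pricePerHzMonth float64) float64 {
	return rateHz * pricePerHzMonth
}

func main() {
	fmt.Printf("S3 PUT at 1000 Hz:   $%.0f\n", perRequestMonthly(1000, 5.00))
	fmt.Printf("SQS call at 1000 Hz: $%.0f\n", perRequestMonthly(1000, 0.50))
	fmt.Printf("DynamoDB at 1000 Hz: $%.0f\n", provisionedMonthly(1000, 0.47))
}
```

With these assumptions, 1000 S3 PUTs per second come to about $13,140 a month and 1000 SQS calls per second to about $1,314. The talk's cost tables show $13,100 and $1310, presumably rounded, and $474 for DynamoDB, which suggests a slightly more precise per-Hz price than the $0.47 in the table.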
Motivating example
• ~700 items analyzed each second
• ~700 S3 PUTs/sec, ~70KB each
• ~700 DynamoDB item writes/sec
[Diagram: ten Analysis workers, each sending its results to SQS]
Batch S3 writes for fewer S3 PUTs
[Diagram: ten Analysis workers send results to SQS, which feeds a Batch Writer]
• Read data objects from SQS
• Batch into larger files and store in S3
Batch S3 Writer
[Diagram: ten Analysis workers → SQS → Batch Writer → S3 and DynamoDB]
• Read data objects from SQS
• Batch into larger files and store in S3
• Write S3 URL and offsets to DynamoDB
SQS + S3 + DynamoDB, no batching
Monthly Cost  Usage Rate  Batch Size  Service
$13,100       1000 Hz     1           S3 PUT
$474          1000 Hz     1           DynamoDB item writes
$13,574       1000 Hz                 TOTAL
[Diagram: ten Analysis workers, each writing directly to S3 + DynamoDB]
SQS + S3 + DynamoDB, S3_BATCH_SZ=10
Monthly Cost  Usage Rate  Batch Size  Service
$1310         1000 Hz     1           SQS SendMessage
$131          100 Hz      10          SQS ReceiveMessage
$131          100 Hz      10          SQS DeleteMessage
$1310         100 Hz      10          S3 PUT
$474          100 Hz      10          DynamoDB item writes
$3356         1000 Hz                 TOTAL (24.7%)
[Diagram: Analysis workers → SQS → Batch Writer → S3 and DynamoDB]
SQS + S3 + DynamoDB, S3_BATCH_SZ=100
Monthly Cost  Usage Rate  Batch Size  Service
$1310         1000 Hz     1           SQS SendMessage
$131          100 Hz      10          SQS ReceiveMessage
$131          100 Hz      10          SQS DeleteMessage
$131          10 Hz       100         S3 PUT
$474          10 Hz       100         DynamoDB item writes
$2177         1000 Hz                 TOTAL (16%)
[Diagram: Analysis workers → SQS → Batch Writer → S3 and DynamoDB]
SQS + GCS + DynamoDB, GCS_BATCH_SZ=100
Monthly Cost  Usage Rate  Batch Size  Service
$1310         1000 Hz     1           SQS SendMessage
$131          100 Hz      10          SQS ReceiveMessage
$131          100 Hz      10          SQS DeleteMessage
$263          10 Hz       100         GCS PUT
$474          10 Hz       100         DynamoDB item writes
$2300         1000 Hz                 TOTAL (16%)
[Diagram: Analysis workers → SQS → Batch Writer → Google Cloud Storage and DynamoDB]
SQS + S3 + DynamoDB, S3_BATCH_SZ=1000
Monthly Cost  Usage Rate  Batch Size  Service
$1310         1000 Hz     1           SQS SendMessage
$131          100 Hz      10          SQS ReceiveMessage
$131          100 Hz      10          SQS DeleteMessage
$13           1 Hz        1000        S3 PUT
$474          1 Hz        1000        DynamoDB item writes
$2059         1000 Hz                 TOTAL (15%)
[Diagram: Analysis workers → SQS → Batch Writer → S3 and DynamoDB]
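The SQS + S3 + DynamoDB totals for different batch sizes follow one formula, sketched here. The assumptions are labeled in the code: a 365/12-day month, SQS receives and deletes done 10 messages per call, and a flat $474/month for DynamoDB as shown in the tables. Results differ from the slide totals by a few dollars because the slides round each row.

```go
package main

import "fmt"

const (
	secondsPerMonth = 365.0 / 12.0 * 24 * 60 * 60 // assumption: average month
	sqsPerMillion   = 0.50                        // $ per million SQS API requests
	s3PutPerMillion = 5.00                        // $ per million S3 PUTs
	dynamoMonthly   = 474.0                       // flat figure taken from the tables
	sqsBatch        = 10                          // messages per SQS receive/delete call
)

// perRequestMonthly is the monthly cost of rateHz calls per second at a
// given price per million requests.
func perRequestMonthly(rateHz, pricePerMillion float64) float64 {
	return rateHz * secondsPerMonth / 1e6 * pricePerMillion
}

// pipelineMonthly estimates the monthly bill for the batching pipeline at
// itemRateHz items per second with s3Batch items per S3 object.
func pipelineMonthly(itemRateHz float64, s3Batch int) float64 {
	send := perRequestMonthly(itemRateHz, sqsPerMillion)
	receive := perRequestMonthly(itemRateHz/sqsBatch, sqsPerMillion)
	del := receive // deletes are batched the same way as receives
	put := perRequestMonthly(itemRateHz/float64(s3Batch), s3PutPerMillion)
	return send + receive + del + put + dynamoMonthly
}

func main() {
	for _, b := range []int{10, 100, 1000} {
		fmt.Printf("S3_BATCH_SZ=%-4d $%.0f/month\n", b, pipelineMonthly(1000, b))
	}
}
```

The shape of the curve is the useful part: once the S3 batch size passes about 100, the SQS SendMessage calls dominate the bill and larger batches buy very little.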
RabbitMQ + S3 + DynamoDB, self-managed queue instances
Monthly Cost  Usage Rate  Batch Size  Service
$4091         1000 Hz                 2x r3.8xlarge (244GB RAM each): store 70KB items for ~1hr without exceeding 50% RAM
$13           1 Hz        1000        S3 PUT
$474          1000 Hz     1000        DynamoDB item writes
>$4578        1000 Hz                 TOTAL (34%)
Why 2x? Double the instances to allow for spikes and robustness to failure.
[Diagram: Analysis workers → queue → Batch Writer → S3 and DynamoDB]
Basic algorithm
[Flowchart]
• SQS: Get ≤10 messages
• Batch size reached or flush timer fired?
  • No: get more messages
  • Yes: continue
• Build batch file & offset map
• S3: PUT batch file
• DynamoDB: Write offset map
• SQS: Delete messages
Implementation difficulties
[Flowchart: the basic algorithm, annotated with per-call latencies]
• SQS, Get ≤10 messages: average latency ~20ms (50/sec/thread)
• S3, PUT batch file: latency 20-200ms (size-dependent)
• DynamoDB, Write offset map: latency <10ms
Single Python process: ~50 messages/sec
[Flowchart: the basic algorithm run as one sequential loop; SQS average latency ~20ms]
Multiple Python processes: 4 procs = 200 messages/sec
[Diagram: four independent processes, each running the full loop: Get ≤10 messages from SQS, check batch size or flush timer, build batch file & offset map, PUT batch file, write offset map, delete messages]
Process-based scaling leads to suboptimal cost performance
• Impossible to scale number of SQS pollers and S3 writers independently
• One batch buffer per process: smaller batches than optimal, hard to “max out” S3 batch size before timeout
• Hard to “max out” 10 messages each SQS read
• Hard to detect when system is falling behind, problematic if write latency > read latency
[Diagram: four independent Python processes, each running the full poll, batch, write, and delete loop]
Go implementation
[Diagram: several SQS pollers (Get ≤10 messages) feed an SQS message channel; a batcher checks whether the batch size is reached or the flush timer fired and sends batches on a batch channel; several writers build the batch file & offset map, PUT the batch file to S3, and write the offset map to DynamoDB]
Concurrency costs money
• The concurrency model your language provides is very important when your code combines lots of high-latency API calls / RPCs
• Ruby, Python, Node.js all require lots of concurrent processes to achieve good concurrency
• Result: over-provisioning, over-polling, IPC when you don’t need to
• Result: suboptimal cost when using usage-priced APIs
Does this apply to me?
• Increasingly, yes
• More cores, more cloud, all the time
• SOA, “microservices”
• Do you have code that calls multiple independent services serially?
• Why?
Couldn’t I just use {C, C++, Java, Scala, Clojure, Erlang, Haskell} to achieve multi-core concurrency?
• Yes, but —
• it may still be such a pain to spawn new threads in your language that you don’t do it enough (e.g. Java, C/C++) vs. just typing “go func()”
• C/C++ and Java have pretty heavyweight thread sizes, typically can only support 1K-10K threads
• Go (and Erlang?) have very lightweight threads and can support millions of goroutines
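A quick way to get a feel for goroutine weight is to park a large number of them at once. The sketch below uses 10,000 to stay light on memory; the count is an arbitrary choice for this example, and raising it is mostly a question of available RAM since each goroutine starts with a small stack.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spawn starts n goroutines, waits until all of them are alive and parked
// on a channel, then releases them. It returns the goroutine count seen at
// the peak and the number of goroutines that ran to completion.
func spawn(n int) (peak int, finished int64) {
	release := make(chan struct{})
	var started, done sync.WaitGroup
	started.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			started.Done()
			<-release // park until every goroutine has started
			atomic.AddInt64(&finished, 1)
		}()
	}
	started.Wait()
	peak = runtime.NumGoroutine() // includes the n parked goroutines
	close(release)
	done.Wait()
	return peak, finished
}

func main() {
	peak, finished := spawn(10000)
	fmt.Println("peak goroutines:", peak, "finished:", finished)
}
```

Doing the same with one OS thread per task would typically hit stack-memory or thread-count limits far sooner.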
Thank you!
• Hope this was useful and interesting!
• We’re hiring! Backend “big data” engineering roles in Providence, Boston, Vancouver
• http://www.appneta.com/about/careers/
• http://providence.craigslist.org/sof/5001907524.html