Python, Go, and the Cost of Concurrency in the Cloud

Upload: appneta

Post on 15-Apr-2017

TRANSCRIPT

Page 1: Python, Go, and the Cost of Concurrency in the Cloud | AWS re:Invent

Python, Go, and the Cost of Concurrency in the Cloud

Page 2:

Goal of this talk

• A quick discussion of moving from Python to Go

• An explanation of the key differences between Go and other “no semicolons” languages

• An example application illustrating why those key differences matter for your app’s bottom line

Page 3:

About me

Chris Erway, AppNeta Chief Architect

• Daily Python hacker

• Came from C/C++, UNIX systems hacking background

• PhD on P2P/crypto research, more C++ & Python

• Six months’ experience in Go

• Co-founder, Tracelytics; Chief Architect, AppNeta

Page 4:

Things I like about Python

• Fun to program: “Zen of Python”

• Built-in maps, sets, arrays, tuples

• Good library support

• Simple duck typing (as opposed to strict OO)

• A little code goes a long way

Page 5:

Things I don’t like about Python

• Performance: not too slow, but not too fast either

• Dependencies can be a pain (virtualenv, pip, etc.)

• The dreaded Global Interpreter Lock (GIL)

• Lack of typed function signatures can make reading code difficult

Page 6:

Go

• Announced in 2009

• Creators: Ken Thompson (B, Plan 9 from Bell Labs), Rob Pike (Plan 9), Robert Griesemer (V8 engine)

• “all three of us had to be talked into every feature in the language, so there was no extraneous garbage put into the language for any reason”

• Statically typed, garbage-collected

• Fast compilation, static linking

Page 7:

Similarities between Go and Python

• Easy to read, no semicolons

• Built-in maps, arrays, strings

• Both support calling into C code when necessary

• Interfaces based on duck typing

• No virtual inheritance

• Statically typed, but automatic type inference and interfaces give it a “dynamic feel”
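To make the "duck typing" and "dynamic feel" points concrete, here is a minimal Go sketch. The `Speaker`, `Duck`, and `Robot` names are hypothetical and exist only for illustration; the point is that neither type declares that it implements the interface, and `:=` lets the compiler infer types.

```go
package main

import "fmt"

// Speaker is satisfied by any type that has a Speak() string method.
// Types never declare that they implement it (structural typing).
type Speaker interface {
	Speak() string
}

// Duck and Robot are hypothetical types used only for this illustration.
type Duck struct{}

func (Duck) Speak() string { return "Quack" }

type Robot struct{ Model string }

func (r Robot) Speak() string { return "Beep from " + r.Model }

// Describe accepts anything that satisfies Speaker.
func Describe(s Speaker) string {
	return s.Speak()
}

func main() {
	// := infers the type of speakers as []Speaker from the literal.
	speakers := []Speaker{Duck{}, Robot{Model: "R2"}}
	for _, s := range speakers {
		fmt.Println(Describe(s))
	}
}
```

Unlike Python, a type that lacks the `Speak` method is rejected at compile time rather than failing at run time.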

Page 8:

Differences between Go and Python

• Go is compiled to native machine code

  • Fast compiler, single static binary

• Go is fast; memory usage depends on size of structs

  • No per-object dictionaries, as in Python

• Go has concurrency features built into the language: goroutines, channels, runtime scheduler

• Go has curly braces
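As a small illustration of "memory usage depends on size of structs", the sketch below asks the compiler for the size of a struct. `Point` is a hypothetical type chosen for the example; two `float64` fields occupy 16 bytes, and a slice of such structs stores them contiguously with no per-object dictionary.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Point is a hypothetical record type used only for illustration.
type Point struct {
	X, Y float64
}

// PointSize reports the in-memory size of one Point in bytes.
func PointSize() uintptr {
	return unsafe.Sizeof(Point{})
}

func main() {
	fmt.Println("bytes per Point:", PointSize())

	// The backing array of a []Point holds the structs back to back.
	pts := make([]Point, 1000)
	fmt.Println("bytes for 1000 Points:", uintptr(len(pts))*PointSize())
}
```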

Page 9:

Language comparison

Page 10:

For more on Go ...

• Lots of resources online for learning Go!

• The following 4 slides are from “Go for Pythonistas” by Francesc Campoy Flores, Google

• See also: The Go Programming Language Blog, blog.golang.org

Page 11:

Go concurrency

Based on goroutines and channels

• Goroutines: very light processing actors (the gophers)

• Channels: typed, synchronized, thread-safe pipes (the arrows)
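A minimal sketch of goroutines and channels working together, assuming a hypothetical `SumConcurrently` helper: each goroutine sums one slice and sends its partial result over a typed channel, and the caller collects the results without any explicit lock.

```go
package main

import "fmt"

// SumConcurrently sums each part in its own goroutine and combines the
// partial sums received over a channel. The name is hypothetical.
func SumConcurrently(parts [][]int) int {
	results := make(chan int) // typed, synchronized pipe

	for _, p := range parts {
		go func(nums []int) {
			total := 0
			for _, n := range nums {
				total += n
			}
			results <- total // blocks until the receiver is ready
		}(p)
	}

	sum := 0
	for range parts {
		sum += <-results // one receive per goroutine started
	}
	return sum
}

func main() {
	fmt.Println(SumConcurrently([][]int{{1, 2, 3}, {4, 5}, {6}}))
}
```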

Page 12:

Fibonacci with Python generators

Page 13:

“Generator” goroutines

A function that returns a channel:
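The slide's original code is not preserved in this transcript. The following is a sketch of the pattern the title describes, using Fibonacci numbers to parallel the previous slide; the function name `Fib` and its `n` parameter are assumptions made for the example.

```go
package main

import "fmt"

// Fib returns a channel that yields the first n Fibonacci numbers.
// A goroutine produces the values; closing the channel ends the stream.
func Fib(n int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		a, b := 0, 1
		for i := 0; i < n; i++ {
			ch <- a
			a, b = b, a+b
		}
	}()
	return ch
}

func main() {
	// range over a channel reads until the channel is closed,
	// much like iterating over a Python generator.
	for v := range Fib(10) {
		fmt.Println(v)
	}
}
```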

Page 14:

Concurrency across languages

(From Brad Fitzpatrick’s Gocon Tokyo 2014 talk)

Page 15:

So who cares?

• You do: concurrency is very important in the modern computing environment

• Programming for “the cloud” or for “SOA” or “microservices” is fundamentally different from writing a LAMP/MEAN/Rails app

• Assumptions on latency, throughput, scale all change

• The language you pick can cost you time & money

Page 16:

Modern systems demand concurrency

• Pre-cloud systems and databases generally use pools of long-lived, low-latency connections

• Cloud & SOA/microservices often rely on HTTPS-based APIs

  • “Infinite” scale, but with more latency

  • For example: Amazon SQS vs RabbitMQ

• HTTP-based APIs have inherently higher latencies

  • Amazon DynamoDB: 5-10ms latency

  • Amazon Kinesis PutRecords, S3, SQS: 10-100ms latency

  • Usage-based pricing

  • Higher throughput = more concurrent HTTP connections
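The last point follows from Little's law: requests in flight = request rate × latency. A rough worked example, assuming a hypothetical `RequiredConcurrency` helper and the latency ranges quoted above at a rate of 1000 requests per second:

```go
package main

import "fmt"

// RequiredConcurrency returns how many requests must be in flight at once
// to sustain ratePerSec requests per second when each request takes
// latencyMs milliseconds (Little's law), rounded up.
func RequiredConcurrency(ratePerSec, latencyMs int) int {
	return (ratePerSec*latencyMs + 999) / 1000
}

func main() {
	const rate = 1000 // requests per second
	for _, ms := range []int{5, 10, 100} {
		fmt.Printf("%3d ms latency -> %d concurrent requests\n",
			ms, RequiredConcurrency(rate, ms))
	}
}
```

At 100ms per call, sustaining 1000 calls per second needs about 100 requests in flight at all times, which is why the concurrency model matters.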

Page 17:

What about my async code for Python, Ruby, Node?

• Async I/O makes network, disk reads & writes asynchronous

  • Used by Python’s gevent, Tornado, Twisted

  • Ruby EventMachine, Celluloid

• However, Python, Ruby, and Node.js all still use a single-threaded interpreter

• Interpreter can switch to another execution context/greenlet/thread while I/O is pending

• Go: thousands of goroutines are mapped to all available cores
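A minimal sketch of the last bullet, assuming a hypothetical `RunMany` helper: ten thousand goroutines are started and the Go runtime schedules them across the available cores. The doubling step is a stand-in for real work.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// RunMany starts n goroutines, lets each compute a result, and returns
// the sum of all results. Each goroutine writes to its own slice element,
// so no lock is needed.
func RunMany(n int) int {
	results := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = i * 2 // stand-in for real work
		}(i)
	}
	wg.Wait()

	total := 0
	for _, r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println("cores available:", runtime.NumCPU())
	fmt.Println("goroutines may run in parallel on:", runtime.GOMAXPROCS(0))
	fmt.Println("sum:", RunMany(10000))
}
```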

Page 18:

Cloud APIs require compute-heavy RPCs

• HTTP-based APIs with authenticated JSON/XML

• Encryption: TLS/SSL key exchange, negotiation

• Authentication: AWS, Google request signature schemes

• Serialization: Convert data to JSON, base64, etc

• Not as simple as binary data over raw sockets

• Not pure disk/network I/O — not as easy to use async I/O
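To show where the CPU time goes, here is a sketch of the serialization and signing steps for one request. It uses a generic HMAC-SHA256 over a JSON body and is not an implementation of the AWS or Google signature schemes; the `Item` type, the `SignedBody` helper, and the secret are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Item is a hypothetical payload. encoding/json base64-encodes []byte fields.
type Item struct {
	ID   string `json:"id"`
	Data []byte `json:"data"`
}

// SignedBody serializes item to JSON and returns the body together with a
// base64-encoded HMAC-SHA256 signature of that body.
func SignedBody(item Item, secret []byte) (body []byte, sig string, err error) {
	body, err = json.Marshal(item) // serialization: CPU work
	if err != nil {
		return nil, "", err
	}
	mac := hmac.New(sha256.New, secret) // authentication: more CPU work
	mac.Write(body)
	sig = base64.StdEncoding.EncodeToString(mac.Sum(nil))
	return body, sig, nil
}

func main() {
	body, sig, err := SignedBody(Item{ID: "a", Data: []byte("hi")}, []byte("demo-secret"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	fmt.Println(sig)
}
```

Every request repeats this work (plus TLS), so a runtime that can spread it over several cores has an advantage over one that only overlaps the I/O wait.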

Page 19:

Increasing prevalence of multi-core architectures

• Quad-core, 8-core, 16-core, 20-core, 32-core, 40-core …

• How will you use all those CPUs?

• Strong opinion: Docker and containerization are sometimes used as a crutch for horizontally scaling services written in single-threaded languages

Page 20:

Motivating example

• ~1000 items analyzed each second

• ~1000 Amazon S3 PUTs/sec, ~70KB each

• ~1000 Amazon DynamoDB item writes/sec

Page 21:

Cloud storage, queue, and log costs

Page 22:

Motivating example

• ~1000 items analyzed each second

• ~1000 Amazon S3 PUTs/sec, ~70KB each

• ~1000 Amazon DynamoDB item writes/sec

Page 23:

Why not queue the writes for later?

• ~1000 items analyzed each second

• ~1000 Amazon S3 PUTs/sec, ~70KB each

• ~1000 Amazon DynamoDB item writes/sec

Page 24:

Batch writes for fewer PUTs

• Read data objects from Amazon SQS

• Batch into larger files and store in Amazon S3
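One possible shape for the batching step, sketched with channels. This is not the talk's implementation: `Batch` and `BatchSizes` are hypothetical names, strings stand in for queue messages, and the flush rule (a full batch, or a timeout after the first buffered item) is an assumption based on the batch-size and timeout constraints mentioned later in the deck.

```go
package main

import (
	"fmt"
	"time"
)

// Batch groups items from in into slices of up to size items. A partial
// batch is flushed when maxWait has passed since its first item arrived,
// or when in is closed.
func Batch(in <-chan string, size int, maxWait time.Duration) <-chan []string {
	out := make(chan []string)
	go func() {
		defer close(out)
		var buf []string
		var timer <-chan time.Time // nil (never ready) while buf is empty
		flush := func() {
			if len(buf) > 0 {
				out <- buf
				buf = nil
			}
			timer = nil
		}
		for {
			select {
			case item, ok := <-in:
				if !ok {
					flush()
					return
				}
				if len(buf) == 0 {
					timer = time.After(maxWait)
				}
				buf = append(buf, item)
				if len(buf) >= size {
					flush()
				}
			case <-timer:
				flush()
			}
		}
	}()
	return out
}

// BatchSizes feeds n items through Batch and returns each batch's length.
func BatchSizes(n, size int) []int {
	in := make(chan string)
	go func() {
		defer close(in)
		for i := 0; i < n; i++ {
			in <- fmt.Sprintf("item-%d", i)
		}
	}()
	var sizes []int
	for b := range Batch(in, size, time.Second) {
		sizes = append(sizes, len(b)) // a real writer would PUT the batch here
	}
	return sizes
}

func main() {
	fmt.Println(BatchSizes(25, 10))
}
```

Because a single goroutine owns the buffer, any number of readers can feed `in` without sharing a lock.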

Page 25:

Baseline: Amazon SQS + S3 + DynamoDB

No batching

Page 26:

Amazon SQS + S3 + DynamoDB

BATCH_SZ=10

Page 27:

Amazon SQS + S3 + DynamoDB

BATCH_SZ=100

Page 28:

Amazon SQS + S3 + DynamoDB

BATCH_SZ=1000
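The cost charts for these slides are not preserved in the transcript. As a rough guide to what each batch size means, the sketch below computes the S3 request rate and object size from the figures in the motivating example (~1000 items/sec, ~70KB each). The helper names are hypothetical and no prices are assumed.

```go
package main

import "fmt"

// PutsPerSecond returns the number of PUT requests per second needed when
// itemsPerSec items are grouped into batches of batchSize, rounded up.
func PutsPerSecond(itemsPerSec, batchSize int) int {
	return (itemsPerSec + batchSize - 1) / batchSize
}

// ObjectKB returns the approximate size in KB of one batched object.
func ObjectKB(itemKB, batchSize int) int {
	return itemKB * batchSize
}

func main() {
	const itemsPerSec, itemKB = 1000, 70
	for _, batch := range []int{1, 10, 100, 1000} {
		fmt.Printf("BATCH_SZ=%-4d -> %4d PUTs/sec, ~%d KB per object\n",
			batch, PutsPerSecond(itemsPerSec, batch), ObjectKB(itemKB, batch))
	}
}
```

With usage-based pricing, the request count is what batching reduces; the total bytes written stay the same.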

Page 29:

Basic algorithm

Page 30:

Implementation difficulties

Page 31:

Single Python process

~50 messages/sec

Page 32:

Multiple Python processes

4 procs = ~200 messages/sec

Page 33:

Process-based scaling leads to suboptimal cost performance

• Impossible to scale number of Amazon SQS pollers and S3 writers independently

• One batch buffer per process: smaller batches than optimal, hard to “max out” S3 batch size before timeout

• Hard to “max out” the 10 messages per SQS read

• Hard to detect when system is falling behind, problematic if write latency > read latency
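For contrast, a single Go process can run different numbers of pollers and writers connected by one channel. The sketch below is an assumption about how that might look, not the talk's code: `RunPipeline` is a hypothetical name, integers stand in for queue messages, and a counter stands in for storage writes.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// RunPipeline starts the given number of poller and writer goroutines,
// connected by a buffered channel, and returns how many messages were
// written. The two counts can be tuned independently.
func RunPipeline(pollers, writers, perPoller int) int {
	msgs := make(chan int, 100)
	var pollWG, writeWG sync.WaitGroup
	var written int64

	for p := 0; p < pollers; p++ {
		pollWG.Add(1)
		go func() {
			defer pollWG.Done()
			for i := 0; i < perPoller; i++ {
				msgs <- i // stand-in for a message read from a queue
			}
		}()
	}

	for w := 0; w < writers; w++ {
		writeWG.Add(1)
		go func() {
			defer writeWG.Done()
			for range msgs {
				atomic.AddInt64(&written, 1) // stand-in for a storage write
			}
		}()
	}

	pollWG.Wait() // all pollers finished
	close(msgs)   // lets the writers drain the channel and exit
	writeWG.Wait()
	return int(written)
}

func main() {
	fmt.Println("written:", RunPipeline(2, 8, 50))
}
```

In a design like this, `len(msgs)` on the buffered channel gives a cheap view of the backlog, which is one way to notice that writers are falling behind.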

Page 34:

Go implementation

Page 35:

Concurrency costs money

• The concurrency model your language provides is very important when your code combines lots of high-latency API calls / RPCs

• Ruby, Python, Node.js all require lots of concurrent processes to achieve good concurrency

• Result: over-provisioning, over-polling, IPC when you don’t need to

• Result: suboptimal cost when using usage-priced APIs

Page 36:

Couldn’t I just use {C, C++, Java, Scala, Clojure, Erlang, Haskell} for concurrency?

• Yes, but:

• It may still be such a pain to spawn & use threads in your language that you don’t do it enough (e.g. in Java, C/C++ vs. just typing “go func()”)

• Lock-based synchronized memory access is more complicated than channels

• C/C++ and Java have pretty heavyweight threads, and can typically only support 1K-10K of them

• Go (and Erlang) have very lightweight threads and can support millions of goroutines (or Erlang processes)
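A short sketch of how little ceremony “go func()” needs, assuming a hypothetical `SpawnAndRelease` helper. All goroutines park on one channel until released, and the tally is owned by a single receiver, so no mutex is involved. The count of 10,000 is an arbitrary choice for the example.

```go
package main

import "fmt"

// SpawnAndRelease starts n goroutines that all wait on a start signal,
// releases them together, and counts how many reported back.
func SpawnAndRelease(n int) int {
	start := make(chan struct{})
	done := make(chan int)

	for i := 0; i < n; i++ {
		go func() {
			<-start   // parked until start is closed
			done <- 1 // report completion over a channel
		}()
	}

	close(start) // closing a channel wakes every waiting receiver

	count := 0
	for i := 0; i < n; i++ {
		count += <-done // only this goroutine touches count
	}
	return count
}

func main() {
	fmt.Println("goroutines completed:", SpawnAndRelease(10000))
}
```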

Page 37:

Does this apply to me?

• Increasingly, yes

• More cores, more cloud, all the time

• SOA, “microservices”

• Do you have code that calls multiple independent services serially?

• Why?
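If the calls are independent, they can be issued concurrently so that the total wait is roughly the slowest call rather than the sum of all of them. A minimal sketch, assuming a hypothetical `FetchAll` helper and simulated services with made-up names and a made-up 100ms latency:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// FetchAll runs every call in its own goroutine and returns the results
// in the same order as the input.
func FetchAll(calls []func() string) []string {
	results := make([]string, len(calls))
	var wg sync.WaitGroup
	for i, call := range calls {
		wg.Add(1)
		go func(i int, call func() string) {
			defer wg.Done()
			results[i] = call() // each goroutine writes its own slot
		}(i, call)
	}
	wg.Wait()
	return results
}

// slowService simulates a remote call with a fixed latency.
func slowService(name string) func() string {
	return func() string {
		time.Sleep(100 * time.Millisecond) // simulated network latency
		return name + ": ok"
	}
}

func main() {
	calls := []func() string{
		slowService("users"),
		slowService("orders"),
		slowService("billing"),
	}
	start := time.Now()
	results := FetchAll(calls)
	fmt.Println(results)
	fmt.Println("elapsed:", time.Since(start).Round(10*time.Millisecond))
}
```

Called serially, the three simulated services would take about three times as long as they do here.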

Page 38:

Thank you!

• Hope this was useful and interesting!

• Win a BB-8 droid at our booth #131!

  • Next drawing is at 3:15PM today

• AppNeta is hiring! Engineering roles in Providence, Boston, Vancouver

• http://www.appneta.com/about/careers/

Page 39:

Backup slides (in case a related Q&A question comes up)

Page 40:

Amazon Kinesis + S3 + DynamoDB

Provisioned throughput per shard + PUT units

Page 41:

RabbitMQ + Amazon S3 + DynamoDB

Self-managed queue instances