building a high throughput rest api with scala

Building a high throughput REST API with Scala + Play + Akka Bhaskar V. Karambelkar https://www.linkedin.com/in/bhaskarvk https://twitter.com/bhaskar_vk Scala DC-MD-NOVA meetup Jan-15-2014

Upload: bhaskar-karambelkar

Post on 10-May-2015


DESCRIPTION

Slides from my talk at the Scala DC Meetup on Jan 15th 2014.

TRANSCRIPT

Page 1: Building a high throughput rest api with scala

Building a high throughput REST API with Scala + Play + Akka

Bhaskar V. Karambelkar https://www.linkedin.com/in/bhaskarvk

https://twitter.com/bhaskar_vk

Scala DC-MD-NOVA meetup Jan-15-2014

Page 2: Building a high throughput rest api with scala

Status quo

• APIs used to be built with various protocols such as JDBC (Stored Procs), JMS, SOAP/HTTP, XML-RPC, file transfer.

• Issues

– No uniformity.

– Not firewall friendly.

– Programming language dependency (JMS).

– Not easy to test / document.

– Not easy to scale, load-balance, fail-over.


Page 3: Building a high throughput rest api with scala

Why Scala + Play + Akka

• Needed an API that could successfully tackle the 4 Vs of Big Data viz. Volume, Velocity, Variety, Veracity.

• Needed the API to be horizontally as well as vertically scalable.

• Needed an “event driven” architecture / programming model.

• Needed easy “HA”, “fail-over”, “concurrency”, “load balancing” constructs.


Page 4: Building a high throughput rest api with scala

Stack

• Scala 2.10.3, Play 2.2.1, Akka 2.2.3.

• Eclipse + Scala IDE (4.0.0 M1).

• MongoDB as a config data store + queue.

• metrics-scala library for metrics.

• WebJars library to manage JavaScript/CSS dependencies.

• sbt for building, Jenkins for CI.


Page 5: Building a high throughput rest api with scala

Architecture 1.0


Page 6: Building a high throughput rest api with scala

Architecture Cont.

• Apache reverse proxy (HA, load balancing, fail-over, TLS termination).

• The API farm receives JSON POSTs, parses the JSON, normalizes it to Scala objects, and uploads them to MongoDB, which acts as a queue.

• The same API farm de-queues from MongoDB and sends the data to the next hop in the pipeline.

• A basic admin console written in AngularJS.

• Eventual destinations: HDFS and Elasticsearch.
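The enqueue / de-queue flow above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Event` fields, the `PipelineSketch` object and the in-memory `LinkedBlockingQueue` (standing in for the MongoDB collection used as a queue) are hypothetical and do not come from the talk.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical normalized event; the talk does not show the real schema.
final case class Event(client: String, payload: String)

object PipelineSketch {
  // In-memory stand-in for the MongoDB collection that acts as a queue.
  val queue = new LinkedBlockingQueue[Event]()

  // Enqueue side: normalize an incoming POST body into a Scala object and queue it.
  def enqueue(client: String, body: String): Unit =
    queue.put(Event(client.trim.toLowerCase, body))

  // De-queue side: take the next event, if any, and hand it to the next hop.
  // Returns false when the queue is empty.
  def dequeue(nextHop: Event => Unit): Boolean =
    Option(queue.poll()) match {
      case Some(event) =>
        nextHop(event)
        true
      case None =>
        false
    }
}
```

In this sketch both sides live in the same process, which mirrors the 1.0 design where the same API farm both enqueued and de-queued.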


Page 7: Building a high throughput rest api with scala

Performance in Production on first run

• Slow JSON parsing, frequent OOMs, or even worse, JVM hangs (kill -9).

• No transactions in MongoDB, so data loss in case of a crash/hang.

• Not scalable beyond a certain load.

• CPUs pegged at 60 to 70% utilization, non-uniform core usage.

• Heap usage high.

• I/O bottlenecks.

• Heavy en-queuing slowed down de-queuing, so queues fill up over time.


Page 8: Building a high throughput rest api with scala

Architecture 2.0


Page 9: Building a high throughput rest api with scala

Architecture 2.0 Cont.

• Dedicated pipelines for clients.

• Separate heavy traffic from light traffic.

• Separate enqueue and de-queue into dedicated API server instances.

• Compression all the way, even in Mongo.

• Incremental JSON parsing.

• Avoid unnecessary JSON->Object->BSON->Object->Stream.

• Changed logic so as to not lose data even in the event of an instance crash/hang.
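As an illustration of the "compression all the way" point, here is a minimal sketch of compressing a JSON payload once and passing the bytes along unchanged. It assumes gzip via the JDK's `java.util.zip`; the talk does not say which codec was used, and `GzipSketch` is a hypothetical name.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipSketch {
  // Compress a text payload to gzip bytes (assumed codec, not confirmed by the talk).
  def compress(text: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val gzip = new GZIPOutputStream(buffer)
    try gzip.write(text.getBytes(UTF_8)) finally gzip.close()
    buffer.toByteArray
  }

  // Decompress gzip bytes back to text, reading in fixed-size chunks.
  def decompress(bytes: Array[Byte]): String = {
    val gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))
    try {
      val out = new ByteArrayOutputStream()
      val chunk = new Array[Byte](4096)
      var read = gzip.read(chunk)
      while (read != -1) {
        out.write(chunk, 0, read)
        read = gzip.read(chunk)
      }
      new String(out.toByteArray, UTF_8)
    } finally gzip.close()
  }
}
```

Storing the compressed bytes as-is in the queue, instead of decoding and re-encoding at every hop, is one way to read the "avoid unnecessary JSON->Object->BSON->Object->Stream" bullet.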


Page 10: Building a high throughput rest api with scala

Results

• Platform Stable

• CPU usage steady @ 30 to 40 %, with uniform distribution across cores.

• Memory consumption under control, no more OOM / hanging.

• Increased Throughput and scalability.

• Very easy to scale out further by adding more data paths.


Page 11: Building a high throughput rest api with scala

Buzzwords/Recommendations

• Scala

– Immutability everywhere; use case classes / immutable collections.

– Monadic patterns everywhere (collections, Try, Option).

• Akka

– Prefer ! (tell) over ? (ask).

– Tune dispatcher parameters; don’t rely on the default dispatcher.

– Give the scheduler its own dispatcher.

– Routers with their own dispatcher for load-balancing actors writing to destinations.

– CircuitBreaker to prevent cascading failures.

– Throttler actor for throttling when required.
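The Scala recommendations (immutable case classes and collections, monadic use of Try and Option) can be illustrated with standard-library code only. The `Reading` type and the "sensor=value" line format below are hypothetical and chosen purely for the example.

```scala
import scala.util.Try

// Hypothetical immutable event type; field names are illustrative only.
final case class Reading(sensor: String, value: Double)

object MonadicSketch {
  // Parse "sensor=value"; any failure is captured in the Try instead of thrown.
  def parse(line: String): Try[Reading] =
    for {
      parts <- Try(line.split("=", 2))
      name  <- Try(parts(0).trim).filter(_.nonEmpty)
      value <- Try(parts(1).trim.toDouble)
    } yield Reading(name, value)

  // Keep only the lines that parsed; immutable List in, immutable List out.
  def parseAll(lines: List[String]): List[Reading] =
    lines.flatMap(line => parse(line).toOption)

  // Option models "no readings" without null or exceptions.
  def maxValue(readings: List[Reading]): Option[Double] =
    readings.map(_.value).reduceOption(_ max _)
}
```

Because `Reading` is a case class, an updated copy is made with `reading.copy(value = 2.0)` rather than by mutating the original, which keeps instances safe to share across threads and actors.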


Page 12: Building a high throughput rest api with scala

Buzzwords/Recommendations

• Play

– Prefer non-blocking/async calls whenever possible.

– Use WebJars for managing JavaScript/CSS dependencies.

– For huge JSON payloads, use an incremental JSON parser + Play’s Iteratee framework.

• JVM

– Use Java 7.

– Profile and tune GC and memory params.
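The "prefer non-blocking/async calls" advice for Play can be sketched with plain `scala.concurrent.Future` composition. The `fetchUser` and `fetchScore` functions are hypothetical stand-ins for remote calls, and the global execution context is used only to keep the example short.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AsyncSketch {
  // Hypothetical lookups standing in for remote or database calls.
  def fetchUser(id: Int): Future[String] = Future(s"user-$id")
  def fetchScore(user: String): Future[Int] = Future(user.length)

  // Compose without blocking: the second call starts when the first completes.
  def userScore(id: Int): Future[(String, Int)] =
    for {
      user  <- fetchUser(id)
      score <- fetchScore(user)
    } yield (user, score)

  // Blocking here is for demonstration only; a request handler should
  // return the Future itself rather than wait on it.
  def demo(): (String, Int) = Await.result(userScore(7), 5.seconds)
}
```

In a Play 2.2 controller, a composed Future like `userScore` would typically be mapped to a result and returned from `Action.async`, so that no request thread is held while the calls are in flight.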


Page 13: Building a high throughput rest api with scala

Some Numbers

• Current Load

– 2.5 billion events / day (> 30 K/sec sustained).

– 2 to 3 TB / day.

– Expected to grow by 5x to 10x.

• Current hardware count

– 2 Data Paths with 4 enqueue and 4 de-queue API servers in each path.
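A quick back-of-the-envelope check of the load figures above, as a sketch. It assumes decimal terabytes (10^12 bytes) and a perfectly even spread of events over the day, neither of which is stated on the slide. Under those assumptions the daily average comes out at roughly 28,900 events/sec, so the "> 30 K/sec sustained" figure presumably reflects rounding or busier periods.

```scala
object LoadArithmetic {
  val eventsPerDay: Double = 2.5e9
  val secondsPerDay: Int = 24 * 60 * 60 // 86,400

  // Average rate if events were spread evenly over the day.
  val avgEventsPerSec: Double = eventsPerDay / secondsPerDay

  // Assumption: 1 TB = 10^12 bytes (the slide does not say decimal or binary).
  val bytesPerTB: Double = 1e12

  // Average event size implied by a given daily volume in TB.
  def avgBytesPerEvent(tbPerDay: Double): Double =
    tbPerDay * bytesPerTB / eventsPerDay

  // Average rate after the expected 5x to 10x growth.
  def projectedEventsPerSec(growthFactor: Double): Double =
    avgEventsPerSec * growthFactor
}
```

With these inputs, 2 to 3 TB per day works out to about 800 to 1,200 bytes per event on average, and 5x to 10x growth would mean roughly 145 K to 290 K events/sec on average.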


Page 14: Building a high throughput rest api with scala

Future …

• Waiting for the Typesafe platform to stabilize a bit (akka-io, spray, akka-cluster).

• More reactive than the current implementation (Play Futures, Iteratees).

• ReactiveMongo (currently we use Casbah).

• Evaluating Scala for use in the analytics pipeline (Spark framework, Cascading).


Page 15: Building a high throughput rest api with scala

Thank You!

• Questions? Comments?
