Building a high throughput REST API with Scala
DESCRIPTION
Slides from my talk at the Scala DC Meetup on Jan 15th 2014.
TRANSCRIPT
Building a high throughput REST API with Scala + Play + Akka
Bhaskar V. Karambelkar https://www.linkedin.com/in/bhaskarvk
https://twitter.com/bhaskar_vk
Scala DC-MD-NOVA meetup Jan-15-2014
Status quo
• APIs used to be built with various protocols such as JDBC (Stored Procs), JMS, SOAP/HTTP, XML-RPC, file transfer.
• Issues
– No uniformity.
– Not firewall friendly.
– Programming language dependency (JMS).
– Not easy to test / document.
– Not easy to scale, load-balance, fail-over.
Why Scala + Play + Akka
• Needed an API that could successfully tackle the 4 Vs of Big Data viz. Volume, Velocity, Variety, Veracity.
• Needed the API to be horizontally as well as vertically scalable.
• Needed an “event driven” architecture/ programming model.
• Needed easy “HA”, “fail-over”, “concurrency”, “load balancing” constructs.
Stack
• Scala 2.10.3, Play 2.2.1, Akka 2.2.3.
• Eclipse + ScalaIDE (4.0.0 M1)
• MongoDB as a config data store + queue.
• metrics-scala library for metrics.
• WebJars library to manage JavaScript/CSS dependencies.
• sbt for building, Jenkins for CI.
1.0 Architecture
Architecture Cont.
• Apache reverse proxy (HA, load balancing, fail-over, TLS termination).
• API farm receives JSON POSTs, parses the JSON, normalizes it to Scala objects, and uploads them to MongoDB, which acts as a queue.
• The same API farm de-queues from Mongo and sends the data to the next hop in the pipeline.
• A basic admin console written in AngularJS.
• Eventual destinations: HDFS & Elasticsearch.
Performance in Production on first run
• Slow JSON parsing, frequent OOMs, or even worse, JVM hangs (kill -9).
• No transactions in MongoDB, so data loss in case of crash/hang.
• Not scalable beyond a certain load.
• CPUs pegged at 60 to 70% utilization, non-uniform core usage.
• Heap usage high.
• I/O bottlenecks.
• Heavy en-queuing slowed down de-queuing, so queues fill up over time.
Architecture 2.0
Architecture 2.0 Cont.
• Dedicated pipelines for clients.
• Separate heavy traffic from light traffic.
• Separate enqueue and de-queue into dedicated API server instances.
• Compression all the way, even in Mongo.
• Incremental JSON parsing.
• Avoid unnecessary JSON->Object->BSON->Object->Stream.
• Changed logic so as to not lose data even in the event of an instance crash/hang.
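The slides do not say how the logic was changed to survive a crash. One common approach when the store has no multi-document transactions is "claim, forward, then acknowledge": a de-queuer marks an entry as claimed, and the entry is deleted only after the next hop confirms receipt. If the instance dies in between, the claim expires and another instance picks the entry up again. The sketch below models that idea with an in-memory map. The names LeaseQueue, Entry, claim and ack are hypothetical, and with MongoDB the claim step would typically be an atomic findAndModify rather than a synchronized block.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a queue collection; not the talk's actual schema.
final case class Entry(id: Long, payload: String, claimedAtMillis: Option[Long])

final class LeaseQueue(leaseMillis: Long) {
  // LinkedHashMap keeps insertion order, so the oldest entry is found first.
  private val entries = mutable.LinkedHashMap.empty[Long, Entry]
  private var nextId = 0L

  /** Store a payload; it stays in the queue until explicitly acknowledged. */
  def enqueue(payload: String): Long = synchronized {
    nextId += 1
    entries(nextId) = Entry(nextId, payload, None)
    nextId
  }

  /** Claim the oldest entry that is unclaimed or whose lease has expired. */
  def claim(nowMillis: Long): Option[Entry] = synchronized {
    val candidate = entries.values.find { e =>
      e.claimedAtMillis.forall(t => nowMillis - t >= leaseMillis)
    }
    candidate.map { e =>
      val claimed = e.copy(claimedAtMillis = Some(nowMillis))
      entries(e.id) = claimed
      claimed
    }
  }

  /** Delete an entry only after the next hop has confirmed receipt. */
  def ack(id: Long): Boolean = synchronized { entries.remove(id).isDefined }

  def size: Int = synchronized { entries.size }
}
```

This gives at-least-once delivery: an entry whose lease expired may be forwarded twice, so the next hop has to tolerate duplicates.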
Results
• Platform Stable
• CPU usage steady @ 30 to 40 %, with uniform distribution across cores.
• Memory consumption under control, no more OOM / hanging.
• Increased Throughput and scalability.
• Very easy to increase scaling, create more data paths.
Buzzwords/Recommendations
• Scala
– Immutability everywhere; use case classes / immutable collections.
– Monadic patterns everywhere (collections, Try, Option).
• Akka
– Prefer ! (tell) over ? (ask).
– Tune dispatcher parameters; don’t rely on the default dispatcher.
– Give the Scheduler its own dispatcher.
– Routers with their own dispatcher for load-balancing actors writing to destinations.
– CircuitBreaker to prevent cascading failures.
– Throttler actor for throttling when required.
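As a small illustration of the Scala recommendations above (immutable case classes, Try and Option composed with collection operations), here is a minimal sketch using only the standard library. The Event shape, the "source,size" line format and the EventParsing helper are made up for the example and do not come from the talk.

```scala
import scala.util.{Failure, Try}

// Hypothetical event shape, for illustration only.
final case class Event(source: String, sizeBytes: Int)

object EventParsing {
  /** Parse "source,size" into an Event; any failure is captured in the Try. */
  def parse(line: String): Try[Event] =
    line.split(",", -1).toList match {
      case src :: size :: Nil if src.trim.nonEmpty =>
        Try(size.trim.toInt).map(n => Event(src.trim, n))
      case _ =>
        Failure(new IllegalArgumentException("malformed line: " + line))
    }

  /** Total bytes per source, skipping lines that fail to parse. */
  def bytesBySource(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(l => parse(l).toOption)
      .groupBy(_.source)
      .map { case (src, events) => src -> events.map(_.sizeBytes).sum }
}
```

No value is mutated and no exception escapes: a bad line becomes a Failure, and the caller decides whether to drop it (as bytesBySource does) or report it.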
Buzzwords/Recommendations Cont.
• Play
– Prefer non-blocking/async calls whenever possible.
– Use webjars for managing JavaScript/CSS dependencies.
– For huge JSONs use an incremental JSON parser + Play’s Iteratee framework.
• JVM
– Use Java 7.
– Profile and tune GC and memory params.
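Before tuning GC and memory parameters it helps to see what the running JVM reports. The sketch below reads collector counts and heap usage through the standard java.lang.management API. The JvmSnapshot object and GcStat case class are illustrative names, and this is only one possible starting point, not the profiling setup used for the talk.

```scala
import java.lang.management.ManagementFactory

object JvmSnapshot {
  final case class GcStat(name: String, collections: Long, timeMillis: Long)

  /** One entry per collector the running JVM exposes (names vary by GC algorithm). */
  def gcStats(): List[GcStat] = {
    val beans = ManagementFactory.getGarbageCollectorMXBeans
    (0 until beans.size).toList.map { i =>
      val b = beans.get(i)
      GcStat(b.getName, b.getCollectionCount, b.getCollectionTime)
    }
  }

  /** Heap currently in use, in bytes. */
  def heapUsedBytes(): Long =
    ManagementFactory.getMemoryMXBean.getHeapMemoryUsage.getUsed

  /** Maximum heap the JVM will attempt to use (governed by -Xmx). */
  def heapMaxBytes(): Long = Runtime.getRuntime.maxMemory
}
```

Logging these values periodically, for example through a metrics library, shows whether a change to heap size or collector settings actually reduces collection counts and pause time.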
Some Numbers
• Current Load
– 2.5 Billion events / day ( > 30 K/sec sustained).
– 2 to 3 TB / day.
– Expected to grow by 5x to 10x.
• Current h/w count
– 2 Data Paths with 4 enqueue and 4 de-queue API servers in each path.
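The figures above can be cross-checked with some back-of-the-envelope arithmetic. The sketch assumes 1 TB = 10^12 bytes and, purely for illustration, an even split of traffic across all enqueue servers, which the dedicated heavy/light data paths may not actually have.

```scala
object LoadMath {
  val eventsPerDay: Double = 2.5e9
  val secondsPerDay: Double = 24.0 * 60 * 60 // 86,400

  /** Average events per second over a full day. */
  val avgEventsPerSec: Double = eventsPerDay / secondsPerDay

  /** Average event size in bytes for a daily volume in TB (assumes 1 TB = 10^12 bytes). */
  def avgEventBytes(tbPerDay: Double): Double = tbPerDay * 1e12 / eventsPerDay

  // 2 data paths x 4 enqueue servers each.
  val enqueueServers: Int = 2 * 4

  /** Per-server average, assuming an even split (an assumption, not a measured value). */
  val avgEventsPerSecPerEnqueueServer: Double = avgEventsPerSec / enqueueServers
}
```

Under these assumptions the 24-hour average is roughly 29 K events/sec, close to the quoted 30 K/sec, or about 3.6 K events/sec per enqueue server, with an average event size of 800 to 1,200 bytes.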
Future …
• Waiting for Typesafe platform to stabilize a bit (akka-io, spray, akka-cluster)
• More reactive than current implementation (Play Futures, Iteratees)
• Reactive Mongo (currently we use Casbah).
• Evaluating Scala for use in the analytics pipeline (spark f/w, cascading).
Thank You!
• Questions? Comments?