java one2013

Distributed & highly available server applications in Java and Scala Max Alexejev, Aleksei Kornev JavaOne Moscow 2013 24 April 2013

Upload: aleksei-kornev

Post on 22-Apr-2015


Page 1: Java one2013

Distributed & highly available server applications in Java and Scala

Max Alexejev, Aleksei Kornev, JavaOne Moscow 2013

24 April 2013

Page 2: Java one2013

What is talkbits?

Page 3: Java one2013

Architecture

by Max Alexejev

Page 4: Java one2013

Lightweight SOA

Key principles

● S1, S2 - edge services
● Each service is 0..1 servers and 0..N clients built together
● No special "broker" services
● All services are stateless
● All instances are equal

What about state?

State is kept in specialized distributed systems and fronted by specific services.

Example follows...

Page 5: Java one2013

Case study: Talkbits backend

Recursive call

Page 6: Java one2013

Requirements for a distributed RPC system

Must have and nice to have

● Elastic and reliable discovery - should handle nodes brought up and shut down transparently and not be a SPOF itself
● Support for N-N topology of client and server instances
● Disconnect detection and transparent reconnects
● Fault tolerance - for example, by retries to the remaining instances when the called instance goes down
● Client backoff built in - i.e., clients should not overload servers when load spikes, as far as possible
● Configurable load distribution - i.e., which server instance to call for this specific request
● Configurable networking layer (keepalives & heartbeats, timeouts, connection pools, etc.)
● Distributed tracing facilities
● Portability among different platforms
● Distributed stack traces for exceptions
● Transactions
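The fault-tolerance and client-backoff requirements can be illustrated with a small JDK-only sketch. This is not Finagle code. The `FailoverCaller` name, the plain address strings, and the doubling delay are assumptions chosen for illustration. A call is tried against each remaining instance in turn, and the client waits a little longer before each new attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: "retry on the remaining instances" plus client backoff.
// Instances are plain address strings and the RPC is an ordinary Function.
final class FailoverCaller {
    /**
     * Tries each instance in order and returns the first successful result.
     * Between attempts it sleeps, doubling the delay each time, so a cluster
     * that is already struggling is not flooded with immediate retries.
     */
    static <R> R callWithFailover(List<String> instances,
                                  Function<String, R> call,
                                  long initialBackoffMillis) {
        List<RuntimeException> failures = new ArrayList<>();
        long backoff = initialBackoffMillis;
        for (int i = 0; i < instances.size(); i++) {
            try {
                return call.apply(instances.get(i));
            } catch (RuntimeException e) {
                failures.add(e);
                if (i < instances.size() - 1) {   // no need to wait after the last instance
                    sleep(backoff);
                    backoff *= 2;
                }
            }
        }
        RuntimeException allFailed =
                new RuntimeException("all " + instances.size() + " instances failed");
        failures.forEach(allFailed::addSuppressed);
        throw allFailed;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted during backoff", ie);
        }
    }
}
```

A real RPC layer would also distinguish retryable errors, such as connection refused, from application errors, which should not be retried.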

Page 7: Java one2013

Key principles to be lightweight and get rid of architectural waste

● Java SE
● No containers. Even servlet containers are lightweight and built in
● Standalone applications: unified configuration, deployment, metrics, logging, single development framework - more on this later
● All launched instances are equal and process requests - no "special" nodes or "active-standby" patterns
● Minimal dependencies and JAR size
● Minimal memory footprint
● One service - one purpose
● Highly tuned for this one purpose (app, JVM, OS, HW)
● Isolated fault domains - i.e., a single datasource or external service is fronted by one service only

No bloatware in technology stack!

"Lean" services

Page 8: Java one2013

Finagle library (twitter.github.io/finagle) acts as a distributed RPC framework.

Services are written in Java and Scala and use the Thrift communication protocol.

Talkbits implementation choices

Apache Zookeeper (zookeeper.apache.org)

Provides reliable service discovery mechanics. Finagle has a nice built-in integration with Zookeeper.

Page 9: Java one2013

Finagle server: networking

Finagle is built on top of Netty, an asynchronous, non-blocking network application framework.

Finagle codec

trait Codec[Req, Rep]

class ThriftClientFramedCodec(...) extends Codec[ThriftClientRequest, Array[Byte]] {
  pipeline.addLast("thriftFrameCodec", new ThriftFrameCodec)
  pipeline.addLast("byteEncoder", new ThriftClientChannelBufferEncoder)
  pipeline.addLast("byteDecoder", new ThriftChannelBufferDecoder)
  ...
}

Finagle comes with ready-made codecs for Thrift, HTTP, Memcache, Kestrel, HTTP streaming.
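The `thriftFrameCodec` stage splits the TCP byte stream into individual Thrift messages. Thrift's framed transport prefixes every message with its length as a 4-byte big-endian integer. The sketch below shows that framing idea in plain JDK code. The `Frames` class is hypothetical and is not the Finagle or Netty implementation. A production decoder would also reject frames above a maximum size.

```java
import java.nio.ByteBuffer;

// Sketch of length-prefixed framing as used by Thrift's framed transport:
// a 4-byte big-endian length, then exactly that many payload bytes.
final class Frames {
    static byte[] encode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length); // big-endian by default
        buf.putInt(payload.length);
        buf.put(payload);
        return buf.array();
    }

    /**
     * Returns the next payload if a complete frame is buffered, or null if more
     * bytes must arrive first (in which case nothing is consumed).
     */
    static byte[] tryDecode(ByteBuffer in) {
        if (in.remaining() < 4) return null;
        int length = in.getInt(in.position());          // peek at the header
        if (length < 0) throw new IllegalArgumentException("negative frame length: " + length);
        if (in.remaining() - 4 < length) return null;   // partial frame, wait for more bytes
        in.position(in.position() + 4);                 // consume the header
        byte[] payload = new byte[length];
        in.get(payload);                                // consume the payload
        return payload;
    }
}
```

The decoder only peeks until a whole frame is available. This is why a partial read over TCP never yields a truncated Thrift message.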

Page 10: Java one2013

Finagle services and filters

// Service is simply a function from request to a future of response.
trait Service[Req, Rep] extends (Req => Future[Rep])

// Filter[A, B, C, D] converts a Service[C, D] to a Service[A, B].
abstract class Filter[-ReqIn, +RepOut, +ReqOut, -RepIn]
  extends ((ReqIn, Service[ReqOut, RepIn]) => Future[RepOut])

abstract class SimpleFilter[Req, Rep] extends Filter[Req, Rep, Req, Rep]

// Service transformation example
val serviceWithTimeout: Service[Req, Rep] =
  new RetryFilter[Req, Rep](..) andThen new TimeoutFilter[Req, Rep](..) andThen service

Finagle comes with rate limiting, retries, statistics, tracing, uncaught exceptions handling, timeouts and more.
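The service/filter idea carries over directly to plain Java. The sketch below is an analogue built on `java.util.concurrent.CompletableFuture` (Java 9+ for `orTimeout`). It is not Finagle's Java API. The `Service`, `SimpleFilter`, `RetryFilter` and `TimeoutFilter` types here are simplified stand-ins with made-up signatures.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// A service is a function from a request to a future response.
interface Service<Req, Rep> extends Function<Req, CompletableFuture<Rep>> {}

// A filter sees the request and the next service, and may change either side.
interface SimpleFilter<Req, Rep> {
    CompletableFuture<Rep> apply(Req request, Service<Req, Rep> next);

    /** filter.andThen(service): a new service with this filter in front. */
    default Service<Req, Rep> andThen(Service<Req, Rep> service) {
        return request -> apply(request, service);
    }
}

// Re-invokes the next service up to `retries` more times when it fails.
final class RetryFilter<Req, Rep> implements SimpleFilter<Req, Rep> {
    private final int retries;
    RetryFilter(int retries) { this.retries = retries; }

    @Override
    public CompletableFuture<Rep> apply(Req request, Service<Req, Rep> next) {
        return attempt(request, next, retries);
    }

    private CompletableFuture<Rep> attempt(Req request, Service<Req, Rep> next, int left) {
        return next.apply(request)
                .<CompletableFuture<Rep>>handle((rep, err) -> {
                    if (err == null) return CompletableFuture.completedFuture(rep);
                    if (left == 0) {                      // out of retries: propagate the failure
                        CompletableFuture<Rep> failed = new CompletableFuture<>();
                        failed.completeExceptionally(err);
                        return failed;
                    }
                    return attempt(request, next, left - 1);
                })
                .thenCompose(Function.identity());        // flatten the nested future
    }
}

// Fails the response if the next service does not answer in time (Java 9+).
final class TimeoutFilter<Req, Rep> implements SimpleFilter<Req, Rep> {
    private final long timeoutMillis;
    TimeoutFilter(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    @Override
    public CompletableFuture<Rep> apply(Req request, Service<Req, Rep> next) {
        return next.apply(request).orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

The slide's ordering is mirrored by `new RetryFilter<>(2).andThen(new TimeoutFilter<>(1000).andThen(service))`. Because the retry filter sits outside the timeout filter, each attempt gets its own timeout.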

Page 11: Java one2013

Functional composition

Given Future[A]

Sequential composition

def map[B](f: A => B): Future[B]

def flatMap[B](f: A => Future[B]): Future[B]

def rescue[B >: A](rescueException: PartialFunction[Throwable, Future[B]]): Future[B]

Concurrent composition

def collect[A](fs: Seq[Future[A]]): Future[Seq[A]]

def select[A](fs: Seq[Future[A]]): Future[(Try[A], Seq[Future[A]])]

And more

times(), whileDo() etc.
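Readers more familiar with the JDK will find rough counterparts of these combinators on `CompletableFuture`: `map` is `thenApply`, `flatMap` is `thenCompose`, and `collect` can be built from `allOf`. The `Combinators` helper names below are hypothetical. `CompletableFuture.anyOf` only approximates `select`, because it does not return the futures that are still pending.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Approximate JDK counterparts of the Future combinators above:
//   map -> thenApply, flatMap -> thenCompose, rescue -> handle + thenCompose,
//   collect -> allOf + join, select -> anyOf (first result only).
final class Combinators {
    /** collect: all results in input order, or a failure if any input fails. */
    static <A> CompletableFuture<List<A>> collect(List<CompletableFuture<A>> fs) {
        return CompletableFuture.allOf(fs.toArray(new CompletableFuture<?>[0]))
                // join() cannot block here: allOf completes only after every input has
                .thenApply(ignored -> fs.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    /** rescue: replace a failure with the result of another asynchronous computation. */
    static <A> CompletableFuture<A> rescue(CompletableFuture<A> f,
                                           Function<Throwable, CompletableFuture<A>> handler) {
        return f.<CompletableFuture<A>>handle((value, err) ->
                        err == null ? CompletableFuture.completedFuture(value) : handler.apply(err))
                .thenCompose(Function.identity());
    }
}
```

Unlike Scala's `rescue`, which takes a partial function, this handler sees every failure. It must inspect the `Throwable` itself if it only wants to recover from some errors.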

Page 12: Java one2013

Functional composition on RPC calls

Sequential composition

val nearestChannel: Future[Channel] =
  metadataClient.getUserById(uuid) flatMap { user =>
    geolocationClient.getNearestChannelId(user.getLocation())
  } flatMap { channelId =>
    metadataClient.getChannelById(channelId)
  }

Concurrent composition

val userF: Future[User] = metadataClient.getUserById(uuid)
val bitsCountF: Future[Integer] = metadataClient.getUserBitsCount(uuid)
val avatarsF: Future[List[Avatar]] = metadataClient.getUserAvatars(uuid)

// Future.join keeps each result's own type (collect would need a common element type);
// Await.result blocks the calling thread, so never do this on a worker thread
val (user, bitsCount, avatars) = Await.result(Future.join(userF, bitsCountF, avatarsF))

*All this stuff works in Java just like in Scala, but does not look as cool.
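The two compositions above can be written in Java roughly as follows. The sketch uses `CompletableFuture` rather than Finagle's `Future`. The client interfaces and method names are hypothetical stand-ins for the generated Thrift clients. The concurrent variant combines results with `thenCombine` instead of blocking.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-ins for the generated Thrift clients; only the shape of
// the composition matters. CompletableFuture replaces Finagle's Future here.
final class ChannelLookup {
    interface MetadataClient {
        CompletableFuture<String> getUserLocation(String userId);
        CompletableFuture<String> getChannelName(String channelId);
        CompletableFuture<Integer> getUserBitsCount(String userId);
    }

    interface GeolocationClient {
        CompletableFuture<String> getNearestChannelId(String location);
    }

    /** Sequential: each call needs the previous result, so chain with thenCompose (flatMap). */
    static CompletableFuture<String> nearestChannelName(MetadataClient meta,
                                                        GeolocationClient geo,
                                                        String userId) {
        return meta.getUserLocation(userId)
                .thenCompose(geo::getNearestChannelId)
                .thenCompose(meta::getChannelName);
    }

    /** Concurrent: both calls are in flight at once; results are combined without blocking. */
    static CompletableFuture<String> bitsSummary(MetadataClient meta, String userId) {
        CompletableFuture<String> locationF = meta.getUserLocation(userId);
        CompletableFuture<Integer> bitsF = meta.getUserBitsCount(userId);
        return locationF.thenCombine(bitsF,
                (location, bits) -> userId + " in " + location + " has " + bits + " bits");
    }
}
```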

Page 13: Java one2013

Finagle server: threading model

To achieve high performance (throughput), never block worker threads.

For blocking IO or long computations, delegate to a FuturePool.

val diskIoFuturePool = FuturePool(Executors.newFixedThreadPool(4))

// read the whole file inside the pool, not lazily on a worker thread later
diskIoFuturePool { scala.io.Source.fromFile(..).mkString }

Boss thread accepts new client connections and binds NIO Channel to a specific worker thread.

Worker threads perform all client IO.
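A JDK analogue of the `FuturePool` pattern is shown below. Blocking file reads run on a dedicated fixed-size pool, so the I/O worker threads are never parked on the disk. The `DiskIo` class is illustrative, the pool size of 4 is copied from the slide, and Finagle's `FuturePool` itself is not used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Blocking disk reads are confined to a small dedicated pool; callers get a
// future immediately and never block their own (worker) thread.
final class DiskIo {
    private final ExecutorService diskPool = Executors.newFixedThreadPool(4);

    CompletableFuture<String> readFile(Path path) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                // the whole file is read here, on a pool thread
                return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);   // surfaces as a failed future
            }
        }, diskPool);
    }

    void shutdown() { diskPool.shutdown(); }
}
```

Sizing the pool separately from the worker threads limits how many disk reads run at once, while network I/O keeps flowing on the workers.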

Page 14: Java one2013

More gifts and bonuses from Finagle

In addition to all said before, Finagle has

● Load distribution in N-N topologies - HeapBalancer ("least active connections") by default

● Client backoff strategies - comes with a TruncatedBinaryBackoff implementation

● Failure detection
● Failover/Retry
● Connection pooling
● Distributed tracing (Zipkin project, based on the Google Dapper paper)
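Two of these client-side policies can be sketched from their names. The code below is a simplified illustration, not Finagle's `HeapBalancer` or `TruncatedBinaryBackoff`. One method picks the node with the fewest in-flight requests. The other doubles a delay up to a fixed number of doublings, after which the delay stays constant (the "truncated" part). The `Node` class and its counter are assumptions.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified illustrations of two client-side policies (not Finagle's classes).
final class ClientPolicies {
    static final class Node {
        final String address;
        final AtomicInteger active = new AtomicInteger(); // requests currently in flight
        Node(String address) { this.address = address; }
    }

    /** "Least active connections": choose the node with the fewest in-flight requests. */
    static Node leastActive(List<Node> nodes) {
        Node best = null;
        for (Node n : nodes) {
            if (best == null || n.active.get() < best.active.get()) best = n; // ties keep the earlier node
        }
        if (best == null) throw new IllegalStateException("no nodes available");
        return best;
    }

    /**
     * Truncated binary backoff: start, 2*start, 4*start, ... but the delay stops
     * doubling after maxDoublings attempts and stays constant from then on.
     */
    static long backoffMillis(int attempt, long startMillis, int maxDoublings) {
        if (attempt < 0 || maxDoublings < 0 || maxDoublings > 30) {
            throw new IllegalArgumentException("attempt >= 0 and 0 <= maxDoublings <= 30 required");
        }
        return startMillis << Math.min(attempt, maxDoublings);
    }
}
```

A linear scan is fine for a handful of nodes. Keeping nodes in a heap ordered by load avoids scanning the whole list on every request in larger clusters.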

Page 15: Java one2013

Finagle, Thrift & Java: lessons learned

Pros

● Gives a lot out of the box
● Production-proven and stable
● Active development community
● Lots of extension points in the library

Cons

● Good for Scala, usable with Java
● Works well with Thrift and HTTP (plus trivial protocols), but lacks support for Protobuf and other stuff
● Poor exception handling experience with Java (no Scala match-es) and ugly code
● finagle-thrift is a pain (old libthrift version lock-in, Cassandra dependency clashes, cannot return nulls, and more). All problems are avoidable, though.

● Cluster scatters and never gathers when the whole ZooKeeper ensemble is down.

Page 16: Java one2013

Finagle: competitors & alternatives

Trending

● Akka 2.0 (Scala, OpenSource) by Typesafe
● ZeroRPC (Python & Node.js, OpenSource) by DotCloud
● RxJava (Java, OpenSource) by Netflix

Old

● JGroups (Java, OpenSource)
● JBoss Remoting (Java, OpenSource) by JBoss
● Spread Toolkit (C/C++, Commercial & OpenSource)

Page 17: Java one2013

Configuration, deployment, monitoring and logging

by Aleksei Kornev

Page 18: Java one2013

Get stuff done...

Page 19: Java one2013

Typical application

Page 20: Java one2013

Architecture of talkbits service

One way to configure service, logs, metrics.

One way to package and deploy service.

One way to launch service.

Bundled in one-jar.

Page 21: Java one2013

One delivery unit. Contains:

Java service

In a single executable fat-jar.

Installation script

[Re]installs service on the machine, registers it in /etc/init.d

Init.d script

Contains instructions to start, stop, restart JVM and get quick status.

Delivery

Page 22: Java one2013

Logging

Configuration
● SLF4J as an API, all other logging libraries redirected to it
● Logback as the logging implementation
● Each service logs to /var/log/talkbits/... (application logs, GC logs)
● Daily rotation policy applied
● Logs are also sent to loggly.com for aggregation, grouping etc.

Aggregation
● loggly.com
● sshfs for analyzing logs with Linux tools such as grep, tail, less, etc.

Aggregation alternatives
Splunk.com, Flume, Scribe, etc.

Page 23: Java one2013

Metrics

Application metrics and health checks are implemented with the CodaHale Metrics library (metrics.codahale.com), which reports metrics via JMX. The Jolokia JVM agent (www.jolokia.org/agent/jvm.html) exposes JMX beans via REST (JSON over HTTP), using the JVM's internal HTTP server. The monitoring agent uses the Jolokia REST interface to fetch metrics and send them to the monitoring system. All metrics are divided into common metrics (HW, JVM, etc.) and service-specific metrics.
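The sketch below shows what the agent actually reads. It fetches two common JVM metrics from the platform MBean server, the same JMX attributes that Jolokia would serve as JSON over HTTP. It uses only `java.lang.management` and `javax.management`. Service-specific metrics registered through the CodaHale library would appear as additional MBeans and are not shown. The `JvmMetrics` class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Reads "common" JVM metrics from the platform MBean server, i.e. the same
// JMX attributes a Jolokia agent would serve as JSON over HTTP.
final class JvmMetrics {
    static Object read(String objectName, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            throw new IllegalStateException("cannot read " + objectName + " / " + attribute, e);
        }
    }

    static long heapUsedBytes() {
        // HeapMemoryUsage is a composite with init/used/committed/max entries
        CompositeData heap = (CompositeData) read("java.lang:type=Memory", "HeapMemoryUsage");
        return (Long) heap.get("used");
    }

    static int threadCount() {
        return (Integer) read("java.lang:type=Threading", "ThreadCount");
    }
}
```

With the Jolokia JVM agent attached, the same attribute is typically available through a read request of the form `/jolokia/read/java.lang:type=Memory/HeapMemoryUsage`.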

Page 24: Java one2013

Deployment

Fabric (http://fabfile.org) is used for environment provisioning and service deployment.

Process
● Fabric script provisions a new environment (or uses an existing one) according to the cluster scheme
● Amazon instances are automatically tagged with a list of services (i.e., instance roles)
● Fabric script reads instance roles and deploys (redeploys) the appropriate components.

Page 25: Java one2013

Monitoring

As our monitoring platform we chose Datadoghq.com. Datadog is a SaaS that is easy to integrate into your infrastructure. The Datadog agent is open source and implemented in Python. Many predefined checksets (plugins, or integrations) for popular products are available out of the box, including JVM, Cassandra, ZooKeeper and ElasticSearch. Datadog provides a REST API.

Alternatives
● Nagios, Zabbix - need a bearded admin on the team. We wanted to go SaaS and outsource infrastructure as far as possible.
● Amazon CloudWatch, LogicMonitor, ManageEngine, etc.

Process
Each service has its own monitoring agent instance on a machine. If a node has the 'monitoring-agent' role in the roles tag of its EC2 instance, a monitoring agent will be installed for each service on this node.
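The role-driven deployment and monitoring rules can be sketched as a small planning function. Several details are assumptions: the comma-separated format of the `roles` tag, the example role names, and the `DeployPlan` class. The actual Fabric scripts are written in Python and are not shown in the talk. Java is used here only for consistency with the other examples.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the EC2 "roles" tag is assumed to look like
// "metadata,geolocation,monitoring-agent".
final class DeployPlan {
    static final String MONITORING_ROLE = "monitoring-agent";

    static List<String> plan(String rolesTag) {
        Set<String> roles = new LinkedHashSet<>();           // keeps tag order, drops duplicates
        for (String r : rolesTag.split(",")) {
            if (!r.trim().isEmpty()) roles.add(r.trim());
        }
        boolean monitored = roles.remove(MONITORING_ROLE);   // a flag, not a service
        List<String> steps = new ArrayList<>();
        for (String service : roles) {
            steps.add("deploy " + service);
            if (monitored) steps.add("install monitoring agent for " + service); // one agent per service
        }
        return steps;
    }
}
```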

Page 26: Java one2013

Talkbits cluster structure

Page 27: Java one2013

QA

Max Alexejev
HTTP://RU.LINKEDIN.COM/PUB/MAX-ALEXEJEV/51/820/AB9
http://www.slideshare.net/MaxAlexejev/
[email protected]

Aleksei Kornev
[email protected]