How Elixir Helped Us Scale Our Video User Profile Service for the Olympics


Uploaded by emerson-macedo on 16-Feb-2017


Using Elixir to scale Video User Profile Service

Emerson Macedo @emerleite

CONTEXT

User Profile API was developed in 2012. It is a well-written Ruby on Rails application responsible for tracking the actions of logged-in users.

CONTEXT

Every 10 seconds, our video player POSTs the watched percentage to our endpoint so that we can show a Keep Watching percentage to our logged-in users.
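A 10-second heartbeat adds up quickly: the request rate is roughly the number of concurrent logged-in viewers divided by the interval. A minimal Python sketch of that back-of-the-envelope estimate, assuming heartbeats are spread evenly over time (no synchronized bursts); the 100,000-viewer audience is a hypothetical input, not a figure from the talk.

```python
HEARTBEAT_INTERVAL_S = 10  # the player POSTs the watched percentage every 10 seconds


def heartbeat_requests_per_second(concurrent_viewers: int,
                                  interval_s: float = HEARTBEAT_INTERVAL_S) -> float:
    """Approximate steady-state POST rate generated by player heartbeats.

    Assumes each concurrent logged-in viewer sends exactly one POST per interval
    and that heartbeats are evenly spread over time.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return concurrent_viewers / interval_s


def heartbeat_requests_per_minute(concurrent_viewers: int,
                                  interval_s: float = HEARTBEAT_INTERVAL_S) -> float:
    """Same estimate expressed per minute."""
    return heartbeat_requests_per_second(concurrent_viewers, interval_s) * 60


if __name__ == "__main__":
    viewers = 100_000  # hypothetical audience size, for illustration only
    print(heartbeat_requests_per_second(viewers))  # 10000.0
    print(heartbeat_requests_per_minute(viewers))  # 600000.0
```

Halving the interval would double the load, which is why a single heartbeat endpoint can dominate an API's traffic.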


CONTEXT

Globo is running Binge Watching experiments combined with Online First releases of new TV shows.

CONTEXT

Prepare all video applications for the Olympics, alongside Online First and Binge Watching.

THE PROBLEM

The throughput grew roughly sixfold between 2013 and 2016 (~600% of the 2013 level):

2013: 10k
2016: 60k

It was hard to predict the application's throughput for the second half of 2016 (2016 / 2: unknown).

INFRASTRUCTURE

2 bare-metal servers, each with 24 CPUs and 64GB of RAM

THE PROBLEM

The average response time was good, but the higher percentiles were hurting the application:

Avg: 70ms
95th percentile: 1s
99th percentile: 2.5s
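Averages hide the slow tail, which is why percentiles matter here. A minimal Python sketch using the nearest-rank definition (one of several common percentile definitions); the latency samples are made up to show the effect and are not data from the talk.

```python
import math
from statistics import mean


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Average plus the 95th and 99th percentiles, all in milliseconds."""
    return {
        "avg": mean(samples_ms),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }


if __name__ == "__main__":
    # Hypothetical: 94 fast requests, 4 at 1s and 2 at 2.5s.
    samples = [5.0] * 94 + [1000.0] * 4 + [2500.0] * 2
    print(latency_summary(samples))  # {'avg': 94.7, 'p95': 1000.0, 'p99': 2500.0}
```

The mean stays under 100ms even though 6 of every 100 requests take a second or more, the same shape as the numbers on the slide.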

FIRST CHANGE

We increased the computational resources from 2 to 4 bare-metal servers, each with 24 CPUs and 64GB of RAM.

FIRST CHANGE

This change improved the metrics by ~32%:

Avg: 47ms (was 70ms)
95th percentile: 0.7s (was 1s)
99th percentile: 1.7s (was 2.5s)
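The ~32% figure reads as a relative reduction of each metric. A small Python sketch recomputing it from the slide numbers; the per-metric breakdown is our own arithmetic, not something stated in the talk.

```python
def improvement_pct(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Figures from the slides, in milliseconds.
baseline = {"avg": 70, "p95": 1000, "p99": 2500}
first_change = {"avg": 47, "p95": 700, "p99": 1700}

for metric in baseline:
    pct = improvement_pct(baseline[metric], first_change[metric])
    print(f"{metric}: {pct:.1f}% better")
# avg: 32.9% better
# p95: 30.0% better
# p99: 32.0% better
```

The same function applies to the later ~12% (containers) and ~95% (Elixir) steps.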

SECOND CHANGE

We decided to try a deployment on Tsuru containers with autoscaling, each container with 1-4 vCPUs and 2GB of RAM, for a total of 93 containers.
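The talk doesn't describe Tsuru's autoscaling policy. As a generic illustration of the idea, the Python sketch below uses the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed / target)), ignoring its tolerance band and clamped to hypothetical min/max bounds. Treat it as an assumption, not as how Tsuru or this deployment actually scaled.

```python
def desired_replicas(current: int, observed_pct: int, target_pct: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Proportional autoscaling rule, clamped to [min_replicas, max_replicas].

    desired = ceil(current * observed_pct / target_pct), computed with integer
    arithmetic so exact ratios are not distorted by floating point.
    Hypothetical policy modeled on the Kubernetes HPA formula, not Tsuru's.
    """
    if current < 1 or target_pct <= 0:
        raise ValueError("current must be >= 1 and target_pct must be positive")
    raw = (current * observed_pct + target_pct - 1) // target_pct  # integer ceiling
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Hypothetical: 60 containers at 90% CPU with a 60% target scale out to 90.
    print(desired_replicas(60, 90, 60, min_replicas=10, max_replicas=100))  # 90
```

Clamping keeps a traffic spike from scaling past the capacity budget and keeps a quiet period from scaling below a safe floor.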

SECOND CHANGE

Migrating to containers improved the metrics by ~12%:

Avg: 41ms (was 47ms)
95th percentile: 0.5s (was 0.7s)
99th percentile: 1.5s (was 1.7s)


ARCHITECTURAL OVERVIEW

User Profile API is a classic Ruby on Rails application that also uses Resque for background jobs.

[Architecture diagram: three steps marked BLOCK]
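Resque keeps its jobs in Redis: enqueuing a job pushes a JSON payload holding the job class and its arguments onto a per-queue list. The Python sketch below only builds the Redis commands, assuming Resque's default `resque:` key namespace; the `UpdateWatchedPercentage` job and `watched` queue are hypothetical, since the talk doesn't say which work went to background jobs.

```python
import json


def resque_enqueue_commands(queue: str, job_class: str,
                            args: list) -> list[tuple[str, str, str]]:
    """Build the Redis commands that enqueue a job in Resque's storage layout.

    Assumes the default "resque:" namespace. Sending the commands to Redis
    is left out so the sketch runs standalone.
    """
    # Each job is JSON with the worker class name and its arguments.
    payload = json.dumps({"class": job_class, "args": args})
    return [
        ("SADD", "resque:queues", queue),             # register the queue name
        ("RPUSH", f"resque:queue:{queue}", payload),  # append the job to the queue
    ]


if __name__ == "__main__":
    # Hypothetical job: persist a player heartbeat asynchronously.
    for command in resque_enqueue_commands(
            "watched", "UpdateWatchedPercentage", ["user-1", "video-42", 40]):
        print(command)
```

Workers pop payloads from the same list, so the web request only pays for a Redis round trip rather than the job itself.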

CONTAINER DISTRIBUTION

Our 93 containers were distributed among the Rails app, the Resque workers and the Resque scheduler:

Rails: 63 containers
Resque (workers and scheduler): 30 containers

THIRD CHANGE

We saw that just ONE endpoint was responsible for 80% of the application's throughput: the one our video player POSTs the watched percentage to every 10 seconds.

Pareto: 20% of the effort will solve 80% of the problem.
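Finding the "vital few" endpoints is a simple greedy pass: sort endpoints by request count and take them until the chosen share of traffic is covered. A minimal Python sketch; the endpoint names and counts are hypothetical, not measurements from the talk.

```python
def pareto_endpoints(request_counts: dict[str, int], share: float = 0.8) -> list[str]:
    """Return the fewest endpoints whose combined traffic reaches `share` of the total.

    Taking endpoints in descending order of traffic guarantees the smallest set.
    """
    total = sum(request_counts.values())
    if total == 0:
        return []
    selected: list[str] = []
    covered = 0
    for endpoint, count in sorted(request_counts.items(),
                                  key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        selected.append(endpoint)
        covered += count
    return selected


if __name__ == "__main__":
    counts = {  # hypothetical request counts per endpoint
        "POST /watched": 8200,
        "GET /watched": 1000,
        "GET /profile": 500,
        "POST /favorites": 300,
    }
    print(pareto_endpoints(counts))  # ['POST /watched']
```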

THIRD CHANGE

We rewrote the Ruby POST endpoint from scratch in Elixir.

WHY ELIXIR

Elixir is at the cutting edge for the Ruby community.

WHY ELIXIR

Elixir is the language Java programmers were looking for when they chose Ruby.

WHY ELIXIR

Elixir compiles to Erlang bytecode and runs on the BEAM VM, which has 30 years of development behind it.

THIRD CHANGE

[Diagram labels: SAME APPLICATION, ERL PROCESS]

THIRD CHANGE

The result was a starting point for moving the app to a CQRS (Command Query Responsibility Segregation) architecture, with separate COMMAND and QUERY sides.
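The talk doesn't detail the CQRS design. As a minimal illustration of the pattern, the Python sketch below separates a command side that accepts watched-percentage updates from a query side that serves Keep Watching reads. Every class and field name is hypothetical, and the synchronous projection stands in for whatever (often asynchronous) propagation a production system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordWatchedPercentage:
    """Command: a player heartbeat reporting how much of a video was watched."""
    user_id: str
    video_id: str
    percentage: int


class KeepWatchingReadModel:
    """Query side: a denormalized view optimized for Keep Watching reads."""

    def __init__(self) -> None:
        self._progress: dict[str, dict[str, int]] = {}

    def apply(self, cmd: RecordWatchedPercentage) -> None:
        # Keep only the latest percentage per (user, video).
        self._progress.setdefault(cmd.user_id, {})[cmd.video_id] = cmd.percentage

    def keep_watching(self, user_id: str) -> dict[str, int]:
        # Return a copy so callers cannot mutate the view.
        return dict(self._progress.get(user_id, {}))


class WatchProgressWriteModel:
    """Command side: validates and applies commands; never answers queries."""

    def __init__(self, read_model: KeepWatchingReadModel) -> None:
        self._read_model = read_model

    def handle(self, cmd: RecordWatchedPercentage) -> None:
        if not 0 <= cmd.percentage <= 100:
            raise ValueError("percentage must be between 0 and 100")
        # A real system would persist the command and update the read model
        # asynchronously; here the projection happens synchronously.
        self._read_model.apply(cmd)


if __name__ == "__main__":
    reads = KeepWatchingReadModel()
    writes = WatchProgressWriteModel(reads)
    writes.handle(RecordWatchedPercentage("u1", "v42", 35))
    writes.handle(RecordWatchedPercentage("u1", "v42", 40))
    print(reads.keep_watching("u1"))  # {'v42': 40}
```

Splitting the sides lets the write-heavy heartbeat path and the read path be scaled and stored independently.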

THIRD CHANGE

Migrating to Elixir improved the metrics by ~95%:

Avg: 4ms (was 41ms)
95th percentile: 15ms (was 0.5s)
99th percentile: 30ms (was 1.5s)

THIRD CHANGE

Migrating to Elixir reduced the container count from 93 (Rails: 63, Resque: 30) to 33 (Elixir: 3, Ruby: 30), leaving ~35% of the original containers, a reduction of ~65%.
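The ~35% on the slide matches the share of containers left after the migration, not the size of the cut. A quick Python check using the container counts from the slides:

```python
def reduction(before: int, after: int) -> tuple[float, float]:
    """Return (remaining share, reduction), both as percentages of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    remaining = after / before * 100
    return remaining, 100 - remaining


before = 63 + 30  # Rails + Resque containers
after = 3 + 30    # Elixir + Ruby containers
remaining, reduced = reduction(before, after)
print(f"{before} -> {after}: {remaining:.1f}% remaining, {reduced:.1f}% fewer")
# 93 -> 33: 35.5% remaining, 64.5% fewer
```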

TOOLS

We chose the Phoenix framework to build our Elixir API, and we use many other community libraries:

Ecto, HTTPoison, Exrm, Cachex, GenRetry, FakeServer, Corsica

PROBLEMS

The MongoDB driver did not support replica sets, so we had to implement that support ourselves:

http://github.com/emerleite/mongox

http://github.com/emerleite/mongox_ecto
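Replica set support mostly means the driver must work out which member is the primary (all writes go there) and which are secondaries. A minimal Python sketch of that routing step, assuming we already have each member's `isMaster` reply, whose `setName`, `ismaster` and `secondary` fields report its role. The host names are hypothetical, and real drivers follow MongoDB's Server Discovery and Monitoring specification, which also handles elections, timeouts and stale primaries; none of that is modeled here.

```python
from typing import Optional


def classify_members(replies: dict[str, dict],
                     set_name: str) -> tuple[Optional[str], list[str]]:
    """Split members of `set_name` into (primary, secondaries) from isMaster replies."""
    primary: Optional[str] = None
    secondaries: list[str] = []
    for address, reply in sorted(replies.items()):
        if reply.get("setName") != set_name:
            continue  # ignore hosts that belong to another replica set
        if reply.get("ismaster"):
            primary = address
        elif reply.get("secondary"):
            secondaries.append(address)
    return primary, secondaries


def pick_member(replies: dict[str, dict], set_name: str, for_write: bool) -> str:
    """Writes go to the primary; reads prefer a secondary, falling back to the primary."""
    primary, secondaries = classify_members(replies, set_name)
    if not for_write and secondaries:
        return secondaries[0]
    if primary is None:
        raise RuntimeError("no primary available")
    return primary


if __name__ == "__main__":
    replies = {  # hypothetical isMaster replies keyed by host
        "db1:27017": {"setName": "rs0", "ismaster": True},
        "db2:27017": {"setName": "rs0", "ismaster": False, "secondary": True},
    }
    print(pick_member(replies, "rs0", for_write=True))   # db1:27017
    print(pick_member(replies, "rs0", for_write=False))  # db2:27017
```

Sending reads to secondaries mirrors MongoDB's `secondaryPreferred` read preference; the driver default is `primary`.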

PROBLEMS

Elixir had no New Relic support, so we had to create an ad-hoc implementation using Exometer:

https://github.com/Feuerlabs/exometer_core

FINAL RESULT

After the Olympics and with Binge Watching, the throughput increased by another ~50%:

2013: 10k
2016: 60k
2016 / 2: 90k

FINAL RESULT

Compared with the original numbers, migrating to Elixir improved the metrics by ~95%:

Avg: 4ms (was 70ms)
95th percentile: 15ms (was 1s)
99th percentile: 30ms (was 2.5s)

Questions?

Emerson Macedo @emerleite

https://blog.emerleite.com