Using Elixir to Scale the Video User Profile Service
Emerson Macedo @emerleite
CONTEXT
The User Profile API was developed in 2012. It’s a well-written Ruby on Rails application responsible for tracking logged-in users' actions.
CONTEXT
Our video player POSTs the watched percentage to our endpoint every 10 seconds, so we can provide the Keep Watching percentage to our logged-in users.
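The 10-second reporting interval is what drives the load: every logged-in user who is watching sends six POSTs per minute. Below is a minimal Python sketch of that arithmetic. It assumes every viewer reports on schedule with no batching or jitter, and the viewer count is a hypothetical example, not a figure from the talk.

```python
REPORT_INTERVAL_SECONDS = 10  # the player POSTs the watched percentage every 10 seconds


def expected_requests_per_second(concurrent_viewers: int,
                                 interval_seconds: float = REPORT_INTERVAL_SECONDS) -> float:
    """Steady-state POST rate if every viewer reports exactly once per interval."""
    return concurrent_viewers / interval_seconds


def expected_requests_per_minute(concurrent_viewers: int,
                                 interval_seconds: float = REPORT_INTERVAL_SECONDS) -> float:
    """Same rate expressed per minute."""
    return expected_requests_per_second(concurrent_viewers, interval_seconds) * 60


# Hypothetical example: 50,000 logged-in users watching at the same time.
print(expected_requests_per_second(50_000))  # 5000.0
print(expected_requests_per_minute(50_000))  # 300000.0
```

Under this model, load grows linearly with the number of concurrent viewers and inversely with the reporting interval.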
CONTEXT
Globo is running Binge Watching experiments combined with Online First releases of new TV shows.
CONTEXT
Prepare all Video Applications for the
Olympics
THE PROBLEM
Throughput grew from 10k in 2013 to 60k in 2016: about 6x, or ~600% of the 2013 level (a ~500% increase).
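The growth figure can be checked from the data points on the slides (10k in 2013, 60k in 2016, and 90k later in 2016). The sketch below separates three quantities that are easy to mix up: the ratio, the new value as a percent of the baseline, and the percent increase. The function name is illustrative.

```python
def growth(old: float, new: float) -> dict:
    """Express throughput growth between two measurements in three common ways."""
    ratio = new / old
    return {
        "ratio": ratio,                         # how many times larger
        "percent_of_baseline": ratio * 100,     # new value as a % of the old one
        "percent_increase": (ratio - 1) * 100,  # how much was added, in %
    }


print(growth(10_000, 60_000))
# {'ratio': 6.0, 'percent_of_baseline': 600.0, 'percent_increase': 500.0}
print(growth(60_000, 90_000))
# {'ratio': 1.5, 'percent_of_baseline': 150.0, 'percent_increase': 50.0}
```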
THE PROBLEM
It was hard to predict the new application throughput for the second half of 2016.
INFRASTRUCTURE
2 bare-metal servers, each with 24 CPUs and 64GB of RAM
THE PROBLEM
The average response time was good, but the tail percentiles were hurting the application:
Avg: 70ms
95th percentile: 1s
99th percentile: 2.5s
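Why a good average can hide a bad tail: in the sketch below, 98% of requests are fast and 2% are slow. The mean stays modest while the 99th percentile exposes the slow requests. The latency values are synthetic, and the nearest-rank method is just one common way to compute a percentile.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Synthetic latencies in milliseconds: 980 fast requests and 20 slow ones.
latencies_ms = [40] * 980 + [2500] * 20

print(statistics.mean(latencies_ms))  # 89.2
print(percentile(latencies_ms, 95))   # 40
print(percentile(latencies_ms, 99))   # 2500
```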
FIRST CHANGE
We increased the computational resources from 2 to 4 bare-metal servers, each with 24 CPUs and 64GB of RAM.
FIRST CHANGE
This change improved the metrics by ~32%:
Avg: 47ms (was 70ms)
95th percentile: 0.7s (was 1s)
99th percentile: 1.7s (was 2.5s)
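The ~32% figure can be reproduced per metric from the before/after values on these slides. This is a minimal sketch. It assumes that "improvement" means the relative reduction in response time, which is a guess about how the figure was computed.

```python
def improvement(before: float, after: float) -> float:
    """Relative reduction in response time, as a percentage of the old value."""
    return (before - after) / before * 100


# Response times in milliseconds, before and after going from 2 to 4 bare-metal servers.
first_change = {
    "avg": (70, 47),
    "p95": (1000, 700),
    "p99": (2500, 1700),
}

for metric, (before, after) in first_change.items():
    print(f"{metric}: {improvement(before, after):.1f}% better")
# avg: 32.9% better
# p95: 30.0% better
# p99: 32.0% better
```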
SECOND CHANGE
We decided to try a deployment on Tsuru containers with autoscaling, each with 1-4 vCPUs and 2GB of RAM, for a total of 93 containers.
SECOND CHANGE
Migrating to containers improved the metrics by ~12%:
Avg: 41ms (was 47ms)
95th percentile: 0.5s (was 0.7s)
99th percentile: 1.5s (was 1.7s)
ARCHITECTURAL OVERVIEW
The User Profile API is a classic Ruby on Rails application that also uses Resque for background jobs.
CONTAINER DISTRIBUTION
Our 93 containers were distributed across the Rails app, the Resque workers, and the Resque scheduler:
Rails: 63 containers
Resque: 30 containers
THIRD CHANGE
We saw that just ONE endpoint, the one receiving the player's watched-percentage POSTs, was responsible for 80% of the application's throughput.
THIRD CHANGE
Pareto: 20% of the effort will solve 80% of the problem.
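Finding that single endpoint is a Pareto-style analysis of request counts. The sketch below ranks endpoints by their share of traffic and returns the smallest set that reaches a given coverage (80% by default). The endpoint paths and counts are hypothetical, not the service's real routes or numbers.

```python
from collections import Counter


def pareto_endpoints(requests, coverage=0.8):
    """Return the fewest endpoints whose combined share of requests reaches `coverage`."""
    counts = Counter(requests)
    total = sum(counts.values())
    selected, covered = [], 0
    for endpoint, count in counts.most_common():  # busiest endpoints first
        selected.append((endpoint, count / total))
        covered += count
        if covered / total >= coverage:
            break
    return selected


# Hypothetical access-log sample: one entry per request.
log = (["POST /watched_percentage"] * 800
       + ["GET /keep_watching"] * 120
       + ["GET /history"] * 50
       + ["POST /favorites"] * 30)

print(pareto_endpoints(log))  # [('POST /watched_percentage', 0.8)]
```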
THIRD CHANGE
We rewrote the Ruby POST endpoint from scratch in Elixir.
WHY ELIXIR
Elixir has become the cutting edge for the Ruby community.
WHY ELIXIR
Elixir is the language Java programmers were looking for when they chose Ruby.
WHY ELIXIR
Elixir compiles to Erlang bytecode and runs on the BEAM VM, which has 30 years of development behind it.
THIRD CHANGE
(Diagram labels: SAME APPLICATION, ERL PROCESS)
THIRD CHANGE
The result was a starting point for moving the app to a CQRS architecture, separating the COMMAND side (writes, such as the POST endpoint) from the QUERY side (reads).
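In CQRS, writes (commands) and reads (queries) go through separate paths, so each side can be changed and scaled on its own. That fits this case, where only the POST endpoint was rewritten. Below is a minimal in-memory sketch of the split. The class names, the shared dict, and the "keep watching" read model are hypothetical illustrations, not the service's actual design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordWatchedPercentage:
    """Command: a player reports how much of a video a user has watched."""
    user_id: str
    video_id: str
    percentage: int


class CommandHandler:
    """Write side: validates commands and updates the store."""

    def __init__(self, store: dict):
        self.store = store

    def handle(self, cmd: RecordWatchedPercentage) -> None:
        if not 0 <= cmd.percentage <= 100:
            raise ValueError("percentage must be between 0 and 100")
        self.store[(cmd.user_id, cmd.video_id)] = cmd.percentage


class QueryHandler:
    """Read side: answers questions without modifying any state."""

    def __init__(self, store: dict):
        self.store = store

    def keep_watching(self, user_id: str) -> dict:
        # Videos this user has started but not finished.
        return {video: pct for (user, video), pct in self.store.items()
                if user == user_id and pct < 100}


# Both sides share one dict here for brevity; in a deployment they could be
# separate services reading from and writing to a shared database.
store = {}
commands, queries = CommandHandler(store), QueryHandler(store)
commands.handle(RecordWatchedPercentage("u1", "v1", 40))
commands.handle(RecordWatchedPercentage("u1", "v2", 100))
print(queries.keep_watching("u1"))  # {'v1': 40}
```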
THIRD CHANGE
Migrating to Elixir improved the metrics by ~95%:
Avg: 4ms (was 41ms)
95th percentile: 15ms (was 0.5s)
99th percentile: 30ms (was 1.5s)
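Placing the before/after values from the slides side by side shows where the ~95% comes from. The sketch below computes the relative reduction for each metric. It compares against two baselines: the container deployment just before the rewrite, and the original numbers from before the first change. It does not average anything; it simply prints each ratio.

```python
# Response times in milliseconds, taken from the slides.
original = {"avg": 70, "p95": 1000, "p99": 2500}   # before any change
containers = {"avg": 41, "p95": 500, "p99": 1500}  # after the second change
elixir = {"avg": 4, "p95": 15, "p99": 30}          # after the Elixir rewrite


def improvement(before: float, after: float) -> float:
    """Relative reduction in response time, as a percentage of the old value."""
    return (before - after) / before * 100


for metric in elixir:
    print(f"{metric}: {improvement(containers[metric], elixir[metric]):.1f}% vs containers, "
          f"{improvement(original[metric], elixir[metric]):.1f}% vs original")
# avg: 90.2% vs containers, 94.3% vs original
# p95: 97.0% vs containers, 98.5% vs original
# p99: 98.0% vs containers, 98.8% vs original
```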
THIRD CHANGE
Migrating to Elixir reduced the number of containers from 93 (63 Rails + 30 Resque) to 33 (3 Elixir + 30 Ruby), about 35% of the original count (a ~65% reduction).
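The container counts on these slides (93 before: 63 Rails + 30 Resque; 33 after: 3 Elixir + 30 Ruby) can be expressed either as the fraction that remains or as the reduction. Here is a small sketch of that arithmetic:

```python
before = {"rails": 63, "resque": 30}
after = {"elixir": 3, "ruby": 30}

total_before = sum(before.values())  # 93
total_after = sum(after.values())    # 33

remaining = total_after / total_before * 100  # share of the original count still needed
reduction = 100 - remaining                   # share of containers removed

print(f"{total_before} -> {total_after} containers")
print(f"{remaining:.1f}% of the original count, a {reduction:.1f}% reduction")
# 93 -> 33 containers
# 35.5% of the original count, a 64.5% reduction
```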
TOOLS
We chose the Phoenix framework to create our Elixir API, and we’re using many other community libraries:
Ecto, HTTPoison, Exrm, Cachex, GenRetry, FakeServer, Corsica
PROBLEMS
The MongoDB driver did not have replica set support, so we had to implement it:
http://github.com/emerleite/mongox
http://github.com/emerleite/mongox_ecto
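Much of replica set support in a driver comes down to knowing which member is the primary (for writes) and which members are secondaries (for reads that can tolerate some lag). Below is a simplified Python sketch of that selection step. It assumes each member has already answered MongoDB's `isMaster` command (called `hello` in newer MongoDB versions) with its `ismaster` and `secondary` flags. The host names are hypothetical, and the logic is not taken from mongox.

```python
import random


def select_server(members, for_write=True):
    """Pick a replica set member from {host: isMaster-style reply} mappings.

    Writes must go to the primary. Reads prefer a secondary and fall back to
    the primary, similar in spirit to a "secondaryPreferred" read preference.
    """
    primaries = [h for h, reply in members.items() if reply.get("ismaster")]
    secondaries = [h for h, reply in members.items() if reply.get("secondary")]

    if for_write:
        if not primaries:
            raise RuntimeError("no primary available (election in progress?)")
        return primaries[0]
    if secondaries:
        return random.choice(secondaries)  # naive load spreading across secondaries
    if primaries:
        return primaries[0]
    raise RuntimeError("no readable member available")


# Hypothetical replies gathered from a three-member replica set.
replies = {
    "mongo1:27017": {"ismaster": True, "secondary": False},
    "mongo2:27017": {"ismaster": False, "secondary": True},
    "mongo3:27017": {"ismaster": False, "secondary": True},
}

print(select_server(replies))                   # mongo1:27017
print(select_server(replies, for_write=False))  # one of the two secondaries
```

A real driver also has to re-run discovery periodically and after errors, because the primary can change during an election.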
PROBLEMS
Elixir did not have New Relic support, so we had to create an ad-hoc implementation using Exometer:
https://github.com/Feuerlabs/exometer_core
FINAL RESULT
After the Olympics and with Binge Watching, throughput increased another ~50%, from 60k to 90k in the second half of 2016.
FINAL RESULT
Compared with the original numbers, migrating to Elixir improved the metrics by ~95%:
Avg: 4ms (was 70ms)
95th percentile: 15ms (was 1s)
99th percentile: 30ms (was 2.5s)
Questions?
Emerson Macedo @emerleite
https://blog.emerleite.com