surge 2010 - from disaster to stability - scaling my.opera.com

Post on 06-Dec-2014


from disaster to stability: the scaling challenges of my.opera.com

Surge 2010 – Version 3

[Chart, repeated on each of the following slides: Servers and kUsers per year, 2000 to 2010. Data labels as extracted: 1, 10, 50, 257, 205, 430, 887, 1,640, 2,500, 5,500]

[Slides by year: 1999, 2001, 2004, 2007, 2009]

the current beta

the situation: 2007

crashes every day

too many connections!!!

Team?

NFS volume of doom

monitoring

many improvements since then

➔ Efficient filesystem cache

➔ "Dogpile effect" AKA stampeding AKA ...

➔ Persistent db + memcached connections

➔ Soft counters

➔ Profiling, profiling, …
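
The "dogpile effect" item above names the situation where a popular cache entry expires and many requests recompute it at once. A minimal sketch of one common countermeasure, written in Python rather than the site's Perl: only one caller refreshes an expired key while the others are served the stale value. The class name `StampedeSafeCache` and the in-process dictionary are assumptions for illustration; the slides only say that memcached was involved, not how the fix was implemented.

```python
import threading
import time


class StampedeSafeCache:
    """Tiny in-process cache where only one caller recomputes an expired key.

    Hypothetical stand-in for a memcached-backed cache; not the talk's code.
    """

    def __init__(self):
        self._data = {}    # key -> (value, expires_at)
        self._locks = {}   # key -> lock guarding recomputation of that key
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key, compute, ttl=60.0):
        entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                       # fresh hit
        lock = self._lock_for(key)
        if lock.acquire(blocking=False):
            try:
                # Re-check: another caller may have refreshed the key meanwhile.
                entry = self._data.get(key)
                if entry is not None and entry[1] > time.monotonic():
                    return entry[0]
                value = compute()                 # only the lock holder pays
                self._data[key] = (value, time.monotonic() + ttl)
                return value
            finally:
                lock.release()
        if entry is not None:
            return entry[0]                       # refresh in progress: serve stale
        with lock:                                # cold miss: wait for the refresher
            pass
        entry = self._data.get(key)
        return entry[0] if entry is not None else compute()
```

With eight concurrent requests for the same cold key, `compute` runs once and every request gets the same value.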

code profiling

[DML] time=1237308152, user=, url=/tinh_yeu_cua_anh_b88/blog/index.dml/tag/...,name=XWA::User, variable=active, type=module, elapsed=0.068473, host=my.opera.com
[DML] time=1237308152, user=, url=/community/,name=XWA::User, variable=, type=module, elapsed=0.015935, host=my.opera.com
[DML] ...

top time-intensive modules

XWA::User::Sidebar          2024.919s (27.2%,  0.28 s/call)
XWA::User                   1778.445s (23.9%,  0.09 s/call)
XWA::User::Journal          1121.224s (15.1%,  0.24 s/call)
XWA::User::Album             321.522s ( 4.3%,  0.17 s/call)
XWA::User::Journal::Search   223.477s ( 3.0%, 20.32 s/call)
XWA::User::Comments          188.011s ( 2.5%,  0.05 s/call)
XWA::Skins                   180.486s ( 2.4%,  0.49 s/call)
XWA::User::JournalArchive    159.525s ( 2.1%,  4.43 s/call)
XWA::User::Posts             146.644s ( 2.0%,  0.45 s/call)
XWA::User::Picture           141.324s ( 1.9%,  0.10 s/call)
XWA::Albums                   93.740s ( 1.3%,  2.04 s/call)
XWA::Journals                 92.390s ( 1.2%,  2.37 s/call)
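
A report like the one above can be derived from the `[DML]` log lines by summing `elapsed` per `name`. The following Python sketch shows one way to do that; the function names are hypothetical, the percentage is assumed to be each module's share of the summed module time, and the parser assumes field values contain no commas.

```python
import re
from collections import defaultdict

# key=value pairs separated by commas; values run up to the next comma
FIELD = re.compile(r"(\w+)=([^,]*)")


def parse_dml_line(line):
    """Return the fields of one '[DML] ...' line as a dict, or None."""
    if not line.startswith("[DML]"):
        return None
    return dict(FIELD.findall(line))


def top_modules(lines):
    """Aggregate elapsed time per module name, slowest total first.

    Each row is (name, total_seconds, percent_of_total, seconds_per_call).
    """
    total = defaultdict(float)
    calls = defaultdict(int)
    for line in lines:
        rec = parse_dml_line(line)
        if not rec or rec.get("type") != "module":
            continue
        total[rec["name"]] += float(rec["elapsed"])
        calls[rec["name"]] += 1
    grand = sum(total.values()) or 1.0   # avoid dividing by zero on empty input
    rows = [
        (name, secs, 100.0 * secs / grand, secs / calls[name])
        for name, secs in total.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Sorting by total time rather than by time per call matches the slide: `XWA::User::Journal::Search` is by far the slowest per call, yet it ranks fifth because it is called rarely.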

many improvements since then

➔ YSlow?

➔ The Expires header is your friend!

➔ Hot MyISAM tables converted to InnoDB

➔ MySQL Master/Master setup

➔ Jet Profiler
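
As an illustration of the Expires item in the list above, a far-future expiry lets browsers and proxies reuse static resources without asking the server again. This Python sketch builds such headers; the function name and the one-year default are assumptions, not values from the talk.

```python
import time
from email.utils import formatdate


def far_future_headers(max_age_days=365, now=None):
    """Return Expires and Cache-Control headers for a long-lived resource."""
    now = time.time() if now is None else now
    max_age = max_age_days * 24 * 3600
    return {
        # HTTP dates are always expressed in GMT
        "Expires": formatdate(now + max_age, usegmt=True),
        "Cache-Control": f"public, max-age={max_age}",
    }
```

This only works safely when a changed file gets a new URL, since clients will not re-request the old one until it expires.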

jet profiler

scalability³

1. avatars

Avatars - 2007

75%

/<user-name>/avatar.pl

/<user-name>/avatar.pl?xscale=8192 (!)

# Every avatar request hits the master database and streams the image blob from it
my $sql = DBConnect('master');
my %user = $sql->get(
    "SELECT a.blob, a.filename FROM avatars a, users u WHERE u.user=? AND u.id=a.user",
    $user
);
$req->print( $user{'blob'} );

Avatars wtf!?

Avatars - reloaded

➔ Export to balanced fs (5 formats)

➔ Zero SQL queries

➔ Storage subsystem

➔ static.myopera.com was born

resources (user uploads, binary blobs, ...)

Pools or single servers

URLs

http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_o.png
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_t.jpg
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_m.jpg
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_l.jpg
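
URLs of this shape can be computed from the user id alone, which is what makes "zero SQL queries" possible. The Python sketch below shows the idea only. The host, the pool name and the `_o`/`_t`/`_m`/`_l` suffixes come from the slide; how the real directory components are derived is not shown there, so the MD5-based sharding here is purely hypothetical.

```python
import hashlib

STATIC_BASE = "http://static.myopera.com"   # host as shown on the slide
# size suffix -> file extension, as they appear in the slide URLs
SIZES = {"o": "png", "t": "jpg", "m": "jpg", "l": "jpg"}


def avatar_url(user_id, size="t", pool="pool1"):
    """Build an avatar URL without touching the database.

    HYPOTHETICAL layout: both shard directories and the hash directory are
    taken from an MD5 of the user id. The real scheme is not in the slides.
    """
    if size not in SIZES:
        raise ValueError(f"unknown avatar size: {size!r}")
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return (
        f"{STATIC_BASE}/{pool}/avatars/"
        f"{digest[:2]}/{digest[2:5]}/{digest}/"
        f"{user_id}_{size}.{SIZES[size]}"
    )
```

Because the function is deterministic, page templates can emit the link directly, and the web server only ever serves a static file.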

+ x

➔ Load

➔ Flexibility

➔ Static scales!

➔ HTTP::DAV

➔ Precomp URLs

2. varnish

Varnish

Most popular RSS feeds

My Opera frontpage

Opera Mini approval

Datacenter emergencies

Varnish: Most popular RSS feeds

➔ /desktopteam/blog/

➔ Friends, Groups API

➔ No cookies (remove req.http.cookie)
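
The "no cookies" item matters because a request carrying a cookie is normally treated as user-specific and bypasses the cache. The slide names the Varnish directive (`remove req.http.cookie`); the Python sketch below only models the same decision outside Varnish. The function name and the prefix tuple are hypothetical; the `/desktopteam/blog/` path is taken from the slide.

```python
# Paths whose responses are identical for every visitor (hypothetical list)
COOKIELESS_PREFIXES = ("/desktopteam/blog/",)


def strip_cookies_for_cache(path, headers):
    """Return a copy of headers, without Cookie when the path is shareable."""
    if not path.startswith(COOKIELESS_PREFIXES):
        return dict(headers)
    return {k: v for k, v in headers.items() if k.lower() != "cookie"}
```

Once the cookie is gone, all visitors map to the same cache object, so one backend fetch can serve every subscriber of a popular feed.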

Varnish: My Opera frontpage

➔ Danger, Will Robinson!

➔ Mangle cookies

➔ Accept-Language headers
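
Caching a localized front page means the cache must vary on language, and raw `Accept-Language` headers come in too many variants to cache well. One way to handle this, sketched here in Python under assumptions, is to collapse the header to a single supported language code before it is used as part of the cache key. The supported-language tuple and the function name are hypothetical and not taken from the talk.

```python
SUPPORTED = ("en", "no", "ru")   # hypothetical set of site languages


def normalize_accept_language(header, supported=SUPPORTED, default="en"):
    """Collapse an Accept-Language header to one supported language code."""
    choices = []
    for position, part in enumerate((header or "").split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        q = 1.0                                   # missing q means full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0                           # unparsable weight: ignore entry
        primary = tag.strip().lower().split("-")[0]
        choices.append((-q, position, primary))
    # Highest q first; ties keep the order in which the client listed them
    for neg_q, _, primary in sorted(choices):
        if neg_q < 0 and primary in supported:
            return primary
    return default
```

With three supported languages the cache holds at most three variants of the page, regardless of how many distinct header strings browsers send.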

Varnish: Opera Mini 5.0 approval

➔ Global coverage

➔ Traffic surge (5x peak, 2x over 24h)

IT NEEDS TO BE OUT TOMORROW!!!

THERE WILL BE A PRESS RELEASE!

Varnish: Opera Mini 5.0 approval

➔ Global coverage

➔ Traffic surge (5x peak, 2x over 24h)

➔ No problems!

Opera Mini “countup” traffic

Submitted to Apple Store: March 23rd

Approved: April 12th

Varnish: Datacenter emergencies

[Diagram: files.myopera.com, User Files Storage SAN, DC1]

[Diagram: files.myopera.com, User Files Storage SAN, DC1, DC2, LVS + Varnish servers]

~ 1 Gbit/s! Varnish

+ x

➔ Load

➔ Flexibility

➔ Instant scaling

➔ Chainsaw!

➔ Purging

3. geodns

geodns

+ x

➔ Prototype 1 week

➔ Geo-scaling

➔ Redundant

➔ Accuracy

➔ No DC feedback

➔ Monitoring
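
The slides do not describe how the geodns prototype works internally. As a general illustration of the technique, a geo-aware DNS server answers the same name with a different address depending on where the client appears to be. In this Python sketch every name, mapping and address is hypothetical (the IPs are from the reserved documentation ranges).

```python
# Hypothetical datacenters and their public addresses (documentation IP ranges)
DATACENTERS = {"dc-eu": "192.0.2.10", "dc-us": "198.51.100.10"}

# Hypothetical country -> datacenter table; a real one would cover all countries
COUNTRY_TO_DC = {"NO": "dc-eu", "DE": "dc-eu", "US": "dc-us", "CA": "dc-us"}

DEFAULT_DC = "dc-eu"


def resolve(country_code):
    """Return the address to answer with for a client in the given country."""
    dc = COUNTRY_TO_DC.get((country_code or "").upper(), DEFAULT_DC)
    # The table is static: nothing here knows whether the datacenter is healthy.
    return DATACENTERS[dc]
```

The static table also hints at two of the list items: the answer is only as good as the country lookup ("Accuracy"), and nothing flows back from the datacenters to the resolver ("No DC feedback").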

Next steps

➔ Search (Solr?)

➔ Batch activity feed

➔ Real connection pooling

➔ … and on ...

Remember!

➔ Team spirit is important

➔ Another level of indirection...

➔ Keep it simple

➔ Keep a log

the heroes

http://my.opera.com/devblog/about/
http://my.opera.com/devblog/

any questions?

thanks!

handout download: http://tinyurl.com/surge2010-cosimo
