Digg.com Software Architecture
DESCRIPTION
Digg is one of the largest sites on the internet, serving some 26 million unique visitors a month. Those unique visitors are only half the story: hundreds of millions of requests from dozens of sources hit Digg's stack every month. Join Joe Stump, Lead Architect at Digg, as he pulls back the curtain for a peek at the systems and software architecture that keep Digg humming along. Watch a video at http://www.bestechvideos.com/2009/03/16/digg-an-infrastructure-in-transition
TRANSCRIPT
An Infrastructure in Transition
Joe Stump, Lead Architect, Digg
Introductions
✓ 35,000,000 uniques
✓ 3,500,000 users
✓ 15,000 requests / sec
✓ Hundreds of servers
“Web 2.0 sucks (for scaling).”
Joe Stump
What’s Scaling?
What’s Scaling?
Specialization
What’s Scaling?
Severe Hair Loss
What’s Performance?
What’s Performance?
Who cares?
4 Stages of Scaling
As it stands ...
Applications
Netscalers
MogileFS
Rec. Engine
ZOMG ROFLAFK WTFLULZ
Lucene
Building Blocks
• MogileFS
  - 9 nodes
  - 2.8TB of files
• Gearman
  - Each application server
  - 400,000 jobs / day
• Memcached
  - 25 nodes
  - 2GB / node
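Memcached servers do not coordinate with each other, so with 25 nodes at 2GB each (about 50GB of cache in aggregate) the client has to decide which node owns each key. The talk does not say which distribution scheme Digg's clients used; the sketch below assumes consistent hashing, a common choice because removing or adding a node only remaps the keys that belonged to that node. The node names are hypothetical.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Map cache keys to Memcached nodes (assumed scheme, not from the talk)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to even out the spread
        self._ring = []           # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # md5 is used only for its uniform spread, not for security
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        # First ring point at or after the key's hash, wrapping to the start.
        h = self._hash(key)
        idx = bisect.bisect_left(self._ring, (h, ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


# Hypothetical names standing in for the 25 Memcached servers.
NODES = [f"memcached-{i:02d}:11211" for i in range(25)]
ring = ConsistentHashRing(NODES)
print(ring.node_for("story:12345"))
```

With plain `hash(key) % 25` instead, losing one node would reshuffle almost every key and empty most of the cache at once; the ring limits the damage to roughly 1/25 of the keys.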
Moving forward ...
MogileFS
Rec. Engine
Lucene
Netscalers
Applications
Services
IDDB
Netscalers
Applications
Services
IDDB
Messaging
✓ Elastic horizontal partitions
✓ Heterogeneous partition types
✓ Multi-homed
✓ IDs live in multiple places
✓ Partitioned result sets
IDDB
IDDB_ID_Int: id bigint, date_created timestamp, status tinyint, version bigint
IDDB_ID_Char: charid char, name char, value char, intid bigint, date_created timestamp
IDDB_ID_Int_Shards: intid bigint, shardid int, status tinyint
IDDB_Shards: id bigint, type char, host char, port mediumint, user char, pass char, status tinyint
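The four IDDB tables suggest how an ID is resolved: IDDB_ID_Char maps a string ID to an integer ID, IDDB_ID_Int_Shards lists every shard holding data for that integer ID (which is how an ID can "live in multiple places"), and IDDB_Shards holds each shard's type and connection details. The slides do not show the lookup queries, so the following is a sketch under assumptions: SQLite stands in for MySQL, column names follow the slide as extracted, a status of 1 meaning "active" is hypothetical, and the shard rows are made up.

```python
import sqlite3

# Table and column names follow the slide; SQLite types stand in for MySQL's.
SCHEMA = """
CREATE TABLE IDDB_ID_Int (id INTEGER PRIMARY KEY, date_created TIMESTAMP,
                          status INTEGER, version INTEGER);
CREATE TABLE IDDB_ID_Char (charid TEXT, name TEXT, value TEXT,
                           intid INTEGER, date_created TIMESTAMP);
CREATE TABLE IDDB_ID_Int_Shards (intid INTEGER, shardid INTEGER, status INTEGER);
CREATE TABLE IDDB_Shards (id INTEGER PRIMARY KEY, type TEXT, host TEXT,
                          port INTEGER, user TEXT, pass TEXT, status INTEGER);
"""

ACTIVE = 1  # hypothetical: the slides do not define the status codes


def shards_for_int_id(conn, intid):
    """Return (type, host, port) for every active shard that holds this ID."""
    return conn.execute(
        """
        SELECT s.type, s.host, s.port
        FROM IDDB_ID_Int_Shards AS m
        JOIN IDDB_Shards AS s ON s.id = m.shardid
        WHERE m.intid = ? AND m.status = ? AND s.status = ?
        ORDER BY s.id
        """,
        (intid, ACTIVE, ACTIVE),
    ).fetchall()


def shards_for_char_id(conn, charid):
    """Resolve a string ID to its integer ID, then to its shards."""
    row = conn.execute(
        "SELECT intid FROM IDDB_ID_Char WHERE charid = ?", (charid,)
    ).fetchone()
    return [] if row is None else shards_for_int_id(conn, row[0])


# Usage with made-up data: ID 42 is mapped to three shards, one of them inactive.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO IDDB_Shards VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "mysql", "db1.example", 3306, "app", "secret", ACTIVE),
        (2, "mysql", "db2.example", 3306, "app", "secret", ACTIVE),
        (3, "mysql", "db3.example", 3306, "app", "secret", 0),
    ],
)
conn.execute("INSERT INTO IDDB_ID_Int VALUES (42, CURRENT_TIMESTAMP, ?, 1)", (ACTIVE,))
conn.execute(
    "INSERT INTO IDDB_ID_Char VALUES ('alice', 'user', 'alice', 42, CURRENT_TIMESTAMP)"
)
conn.executemany(
    "INSERT INTO IDDB_ID_Int_Shards VALUES (42, ?, ?)",
    [(1, ACTIVE), (2, ACTIVE), (3, ACTIVE)],
)
print(shards_for_char_id(conn, "alice"))
```

Because a shard row carries a `type`, the same mapping can point at partitions of different kinds, which fits the "heterogeneous partition types" bullet; the caller then queries each returned shard and merges the partitioned result sets itself.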
✓ Memcached + BDB
✓ 28,000+ writes a second
✓ Persistent key/value storage
✓ Works with Memcached clients
MemcacheDB
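"Works with Memcached clients" means MemcacheDB speaks the memcached text protocol, with Berkeley DB persisting what gets written, so existing client code can talk to it unchanged. The sketch shows that wire format for `set` and `get` without opening a socket: it builds request bytes and parses a `get` reply. The key names are hypothetical, and only the basic form of each command is covered (no `cas`, no `noreply`).

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode 'set <key> <flags> <exptime> <bytes>\\r\\n<data>\\r\\n'."""
    if len(key) > 250 or any(c.isspace() for c in key):
        raise ValueError("memcached keys are at most 250 chars with no whitespace")
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(*keys: str) -> bytes:
    """Encode a multi-key 'get' command."""
    return ("get " + " ".join(keys) + "\r\n").encode("ascii")


def parse_get_response(data: bytes) -> dict:
    """Parse 'VALUE <key> <flags> <bytes>\\r\\n<data>\\r\\n' blocks up to 'END'."""
    result = {}
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        line = data[pos:line_end].decode("ascii")
        pos = line_end + 2
        if line == "END":
            return result
        parts = line.split()
        if parts[0] != "VALUE":
            raise ValueError(f"unexpected line: {line!r}")
        key, nbytes = parts[1], int(parts[3])
        # The data block is length-prefixed, so it may itself contain \r\n.
        result[key] = data[pos:pos + nbytes]
        pos += nbytes + 2  # skip the data block and its trailing \r\n


print(build_set("story:1", b"hello"))
print(parse_get_response(b"VALUE story:1 0 5\r\nhello\r\nEND\r\n"))
```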
War stories ...
✓ 15,000 - 17,000 submissions per day
✓ Crawl for images, video embeds, source, and other metadata
✓ Ran in parallel via Gearman
Digg Images
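Running the crawl "in parallel via Gearman" means each submission becomes an independent job that any free worker can pick up. Gearman itself is not shown here: the sketch swaps in Python's `concurrent.futures` thread pool as a stand-in for Gearman workers, and `crawl_submission` is a hypothetical job that pulls image URLs out of already-fetched HTML with a naive regex instead of hitting the network.

```python
import re
from concurrent.futures import ThreadPoolExecutor, as_completed

# Naive pattern for double-quoted img src attributes; a real crawler would parse HTML.
IMG_SRC = re.compile(r'<img[^>]+src="([^"]+)"', re.IGNORECASE)


def crawl_submission(submission_id: int, html: str) -> tuple[int, list[str]]:
    """Hypothetical job body: find candidate image URLs for one submission."""
    return submission_id, IMG_SRC.findall(html)


def crawl_all(pages: dict[int, str], workers: int = 8) -> dict[int, list[str]]:
    """Fan submissions out to a worker pool and collect results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(crawl_submission, sid, html) for sid, html in pages.items()]
        for fut in as_completed(futures):
            sid, images = fut.result()
            results[sid] = images
    return results


pages = {
    1: '<p>story</p><img src="/a.jpg"><IMG alt="x" SRC="/b.png">',
    2: "<p>no images here</p>",
}
print(crawl_all(pages))
```

Results arrive in completion order, not submission order, which is why they are keyed by submission ID rather than collected in a list.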
✓ 230,000+ Diggs per day
✓ Most active Diggers are also most followed
✓ 3,000 writes per second
✓ Ran in background via Gearman
✓ Eventually consistent
Green Badges
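The Green Badges bullets describe a write fan-out: a Digg has to be recorded for everyone who follows the Digger, and because the most active Diggers are also the most followed, a modest number of Diggs turns into thousands of writes per second. Pushing that work to a background queue keeps the request fast at the cost of eventual consistency. In the sketch below, a `queue.Queue` and a thread stand in for Gearman, and the `followers` and `friend_diggs` stores are hypothetical in-memory dicts.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical in-memory stand-ins for the real stores.
followers = defaultdict(set)     # user -> users who follow them
friend_diggs = defaultdict(set)  # follower -> story IDs their friends dugg
jobs = queue.Queue()             # plays the role of the Gearman job queue


def digg(user, story_id):
    """Request path: enqueue the fan-out and return immediately."""
    jobs.put((user, story_id))


def fan_out_worker():
    """Background worker: one write per follower, applied some time later."""
    while True:
        job = jobs.get()
        try:
            if job is None:  # sentinel: stop the worker
                return
            user, story_id = job
            for follower in followers[user]:
                friend_diggs[follower].add(story_id)
        finally:
            jobs.task_done()


followers["alice"] = {"bob", "carol"}
worker = threading.Thread(target=fan_out_worker, daemon=True)
worker.start()
digg("alice", 101)
jobs.join()  # until the queue drains, readers may see stale badges
print(sorted(friend_diggs["bob"]))
jobs.put(None)
worker.join()
```

The `jobs.join()` is only there so the demo prints a settled result; a real request handler would not wait, which is exactly where the "eventually" in eventual consistency comes from.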
user_ip_views
✓ Switched to explicit caching
✓ Intelligently grouped objects in cache
✓ Sorting, limiting, etc. done in the application layer
✓ 200% to 300% gains in performance
Digg Comments
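One reading of the Digg Comments bullets: cache a story's comments together as one group, then sort and paginate in application code instead of issuing an `ORDER BY ... LIMIT` query for every page view and sort order. The talk does not show the key layout; this sketch assumes one cache entry per story, with a plain dict standing in for Memcached and a hypothetical `load_comments_from_db` loader. Invalidation is explicit: the application drops the group when a comment changes.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    id: int
    story_id: int
    diggs: int
    created: int  # epoch seconds


class CommentCache:
    def __init__(self, loader: Callable[[int], list[Comment]]):
        self._cache: dict[str, list[Comment]] = {}  # stand-in for Memcached
        self._loader = loader
        self.db_loads = 0  # how many times we fell through to the database

    @staticmethod
    def _key(story_id: int) -> str:
        return f"comments:story:{story_id}"  # hypothetical key scheme

    def _group(self, story_id: int) -> list[Comment]:
        """Fetch every comment for a story as one cached group."""
        key = self._key(story_id)
        if key not in self._cache:
            self._cache[key] = self._loader(story_id)
            self.db_loads += 1
        return self._cache[key]

    def top(self, story_id: int, limit: int, offset: int = 0, by: str = "diggs") -> list[Comment]:
        """Sort and paginate in the application layer, not in SQL."""
        ordered = sorted(
            self._group(story_id), key=lambda c: (getattr(c, by), c.id), reverse=True
        )
        return ordered[offset:offset + limit]

    def invalidate(self, story_id: int) -> None:
        """Explicitly drop the group when a comment is added or edited."""
        self._cache.pop(self._key(story_id), None)


def load_comments_from_db(story_id):  # hypothetical loader with made-up rows
    return [Comment(1, story_id, 5, 100), Comment(2, story_id, 12, 200),
            Comment(3, story_id, 1, 300)]


cache = CommentCache(load_comments_from_db)
print([c.id for c in cache.top(42, limit=2)])
print([c.id for c in cache.top(42, limit=2, by="created")])
print(cache.db_loads)
```

Both orderings are served from a single cached group, so the database is hit once per story rather than once per sort order and page.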
✓ Vertical partitioning
✓ Migrate in background processes
✓ Use the bots
✓ Keep track of migration
✓ Retry failed migrations automatically
Data Migration
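The Data Migration checklist combines three ideas that fit in one loop: record a status for every item, copy items outside the request path, and retry whatever failed on a later pass. The sketch assumes a hypothetical `copy_record` function that moves one record to its new vertical partition and raises on failure, plus a cap on retry passes; a dict stands in for the tracking table. It runs synchronously here, whereas the talk ran migrations in background processes. "Use the bots" is not modeled.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


def migrate(record_ids, copy_record, max_attempts=3):
    """Copy each record to its new partition, tracking per-record status and
    retrying failures on later passes (hypothetical copy_record callable)."""
    record_ids = list(record_ids)
    status = {rid: Status.PENDING for rid in record_ids}
    attempts = {rid: 0 for rid in record_ids}
    for _ in range(max_attempts):
        todo = [rid for rid, s in status.items() if s is not Status.DONE]
        if not todo:
            break
        for rid in todo:
            attempts[rid] += 1
            try:
                copy_record(rid)
                status[rid] = Status.DONE
            except Exception:
                status[rid] = Status.FAILED  # picked up again on the next pass
    return status, attempts


seen = set()
moved = []


def flaky_copy(rid):  # hypothetical: even IDs fail on their first attempt
    if rid % 2 == 0 and rid not in seen:
        seen.add(rid)
        raise OSError("transient failure")
    moved.append(rid)


status, attempts = migrate(range(1, 6), flaky_copy)
print(status)
print(attempts)
```

Keeping the status per record, rather than a single "migration finished" flag, is what makes the job safe to stop and resume: a restart only touches records that are not yet `DONE`.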
Things to ponder ...
CAP Theorem
Have I run the numbers?
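A first pass at running the numbers can reuse the figures quoted earlier in the talk. The sketch below is only arithmetic on those slide figures; every per-second or per-day conversion assumes load spread evenly over 24 hours, so real peaks will be higher. The "badge writes per Digg" line is a hypothetical ratio that holds only if 3,000 writes per second were sustained all day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(count_per_day):
    """Average rate, assuming the load is spread evenly over the day."""
    return count_per_day / SECONDS_PER_DAY


def per_day(count_per_second):
    return count_per_second * SECONDS_PER_DAY


# Figures as quoted on the slides.
REQUESTS_PER_SEC = 15_000
GEARMAN_JOBS_PER_DAY = 400_000
SUBMISSIONS_PER_DAY = (15_000, 17_000)
DIGGS_PER_DAY = 230_000
BADGE_WRITES_PER_SEC = 3_000
MEMCACHED_NODES, GB_PER_NODE = 25, 2

report = {
    "requests per day": per_day(REQUESTS_PER_SEC),
    "Gearman jobs per second": round(per_second(GEARMAN_JOBS_PER_DAY), 2),
    "submissions per second (low, high)": tuple(
        round(per_second(n), 2) for n in SUBMISSIONS_PER_DAY
    ),
    "Diggs per second": round(per_second(DIGGS_PER_DAY), 2),
    # Assumption: 3,000 writes/s sustained all day, divided across the day's Diggs.
    "badge writes per Digg": round(per_day(BADGE_WRITES_PER_SEC) / DIGGS_PER_DAY),
    "Memcached capacity (GB)": MEMCACHED_NODES * GB_PER_NODE,
}

for name, value in report.items():
    print(f"{name}: {value}")
```

Set side by side, the figures show where the pressure sits: a few Diggs per second on average, but potentially on the order of a thousand follower writes behind each one.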
Is MySQL the best solution?
Can I do this later?
How can I partition this data?
How should I cache this data?
Questions?!