high scalability by example – how can web-architecture scale like facebook, twitter, youtube, etc

18
Copyright © 2011 Accenture All Rights Reserved. Accenture, its logo, and High Performance Delivered are trademarks of Accenture. High Scalability by Example – How can Web Architecture scale like Facebook, Twitter? JAX 2011, 03.05.2011 Robert Mederer, Markus Bukowski 2000 - 2005: Technology Architect and Software Engineer in several projects 2006: Technical Architecture Lead, Integration and Execution Architecture for Location-Based Service Provider 2009: Technical Architecture Lead, Frontend and Execution Architecture Government Agency 2009/2010: Technical Architecture and front-office integration build lead, Integration and Execution Architecture Financial Services Agency 2011: Architect and Performance Engineer for Location Based Services Platform Experience Speaker Copyright © 2010 Accenture. All rights reserved. 2 Robert Mederer Lead Technology Architect Anni-Albers-Straße 11 80807 München Mobil: +49-175-57-68012 [email protected] 2006 - 2009: Software Engineer in several projects (during studies) for mainly telecommunication companies 2009 - 2011: Senior Software Engineer in several projects 2009/2010: Coach and Software Engineer for a Health Insurance Fund 2011: Technical Architecture Lead during the development of a Document Management Solution for a Government Agency Markus Bukowski Senior Software Engineer Kaistraße 20 40221 Düsseldorf Mobil: +49-175-57-64217 [email protected]

Upload: robert-mederer

Post on 15-Jan-2015

6.294 views

Category:

Technology


0 download

DESCRIPTION

Skalierbarkeit bedeutet hohes Aufkommen von Traffic, Daten, Userbase, IO, Parallelverarbeitung und Concurrency, aber wie funktioniert dies bei den bekannten Web 2.0 Plattformen. Wie wird skaliert – horizontal oder vertikal, im Client-Layer, Service-Layer oder im Backend-Layer? Welche Rolle spielt Caching, NoSQL, Clustering und MapReduce bei der Skalierbarkeit? Wie wirkt sich die Skalierbarkeit in Sachen Konsistenz vs. Verfügbarkeit vs. Network Toleranz aus? Der Vortrag geht vergleichend auf verschiedene Konzepte von Skalierbarkeit ein und erläutert anhand von Beispielen wie mit pragmatischen Mitteln eine skalierbare Architektur erreicht werden kann.

TRANSCRIPT

Page 1: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

Copyright © 2011 Accenture All Rights Reserved. Accenture, its logo, and High Performance Delivered are trademarks of Accenture.

High Scalability by Example – How can Web Architecture scale like Facebook, Twitter?

JAX 2011, 03.05.2011

Robert Mederer, Markus Bukowski

2000 - 2005: Technology Architect and Software Engineer in several projects

2006: Technical Architecture Lead, Integration and Execution Architecture for Location-Based Service Provider

2009: Technical Architecture Lead, Frontend and Execution Architecture Government Agency

2009/2010: Technical Architecture and front-office integration build lead, Integration and Execution Architecture Financial Services Agency

2011: Architect and Performance Engineer for Location Based Services Platform

Experience

Speaker

Copyright © 2010 Accenture. All rights reserved. 2

Robert Mederer Lead Technology Architect Anni-Albers-Straße 11 80807 München Mobil: +49-175-57-68012 [email protected]

2006 - 2009: Software Engineer in several projects (during studies) for mainly telecommunication companies

2009 - 2011: Senior Software Engineer in several projects

2009/2010: Coach and Software Engineer for a Health Insurance Fund

2011: Technical Architecture Lead during the development of a Document Management Solution for a Government Agency

Markus Bukowski Senior Software Engineer Kaistraße 20 40221 Düsseldorf Mobil: +49-175-57-64217 [email protected]

Page 2: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

•  Global management consulting, technology services and outsourcing company

•  215.000 employees •  Rank 47 among the

“Best Global Brands 2008” •  Top 100 Employer •  28 of the DAX-30-Companies •  96 of the Fortune-Global-100 •  More than three-quarters of the Fortune-

Global-500 •  87 of our Top 100-clients have been with

us for 10 or more years

Company Profile

Accenture High performance achieved

Worldwide Revenues $21.6 billion (in US$ billion, as of August 31, 2010)

Communications & High Tech

Public Service

Products

Financial Services

Resources

4.83 3.88

4.32

5.53

2.98

Copyright © 2010 Accenture. All rights reserved. 3

•  Austria •  Switzerland •  Germany

Employees •  Ca. 6000 •  We are hiring!

Exciting Technology work •  Large scale projects

(100+ people / multiple years) •  Most challenging requirements

– Stock Exchange / Banking / Trading Systems

– AEMS Mobility Platform – Large Scale Web Applications

(> 1M page views / day) – Batch Architectures

Geographic unit

Copyright © 2010 Accenture. All rights reserved. 4

Local Accenture … ???

Frankfurt

Munich

Düsseldorf Berlin

Vienna

Zurich

Page 3: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

•  Introduction

– Scalability & CAP Theorem

– Traditional websites & Social media websites

– High Scalability in Numbers

• Accenture – High Scalability by Example

• Architecture at Internet Scale

• Common concepts of Scalability

• Conclusion

Copyright © 2011 Accenture. All rights reserved. 5

Agenda

(Patterns for Performance and Operability: Building and Testing Enterprise Software, Chris Ford et. al., 2008)

A system’s capacity to uphold the same performance under heavier volumes.

Scalability

6 Copyright © 2011 Accenture. All rights reserved.

Page 4: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

•  Is achieved by increasing the capacity of a single node – CPU, – Memory, – Bandwidth, …

•  Simple Process – Application is generally not affected by

those changes

•  Classical Example are Super Computers like – HP Integrity Superdome – IBM Mainframe

7

Vertical Scalability

Copyright © 2011 Accenture. All rights reserved.

Source: Hewlett-Packard

•  Application is spread on a cluster with several nodes

•  Nodes can be added to scale out •  Produces overhead

– Keep cluster consistent – Node error detection and handling – Communication between nodes

•  May be used to increase reliability and availability

•  Distributed Systems and Programs like – SETI@Home – World Wide Web – Domain Name Service

8

Horizontal Scalability

Copyright © 2011 Accenture. All rights reserved.

Source: Space Sciences Laboratory, U.C. Berkeley

Page 5: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

Consistency

Availability Partition Tolerance

• Consistency – all clients see the same data at the same time

• Availability – all clients can find all data even in presence of failure

• Partition Tolerance – system works even when one node failed

Copyright © 2011 Accenture. All rights reserved. 9

CAP Theorem (Brewer‘s Theorem)

Impossible Source: PODC-keynote, Towards Robust Distributed Systems, Dr. Eric A. Brewer, 2000

Consistency + Availability •  High data integrity •  Single site, cluster database, LDAP, etc. •  2-phase commit, data replication, etc.

Consistency + Partition •  Distributed database, distributed locking, etc. •  Pessimistic locking, etc.

Availability + Partition •  High scalability •  Distributed cache, DNS, etc. •  Optimistic locking, expiration/leases (timeout), etc.

Normally, two of these properties for any shared-data system

Copyright © 2011 Accenture. All rights reserved. 10

CAP Theorem

Source: “Architecting Cloudy Applications”, David Chou

C

A P

C

A P

C

A P

Page 6: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

•  Introduction

– Scalability & CAP Theorem

– Traditional websites & Social media websites

– High Scalability in Numbers

• Accenture – High Scalability by Example

• Architecture at Internet Scale

• Common concepts of Scalability

• Conclusion

Copyright © 2011 Accenture. All rights reserved. 11

Agenda

Traditional Websites

(Scaling the Social Graph: Infrastructure at Facebook, Jason Sobel, QCon 2011)

12 Copyright © 2011 Accenture. All rights reserved.

Page 7: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

13

Social Media Websites

(Scaling the Social Graph: Infrastructure at Facebook, Jason Sobel, QCon 2011)

Copyright © 2011 Accenture. All rights reserved.

•  Introduction

– Scalability & CAP Theorem

– Traditional websites & Social media websites

– High Scalability in Numbers

• Accenture – High Scalability by Example

• Architecture at Internet Scale

• Common concepts of Scalability

• Conclusion

Copyright © 2011 Accenture. All rights reserved. 14

Agenda

Page 8: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

The World Is Obsessed With Facebook (Video)

Copyright © 2011 Accenture. All rights reserved. 15

High Scalability in Numbers

Source: http://www.youtube.com/watch?v=xJXOavGwAW8

facebook twitter

Active users 500.000.000 New accounts per day 500.000

Active users logging in daily

250.000.000 Tweets per day (a year ago)

140.000.000 (50.000.000)

Active mobile users 250.000.000 Max tweets per second 6.939

In 20 minutes …

Links shared 1.000.000

Status updates 1.851.000

Photos uploaded 2.716.000

Comments made 10.208.000

High Scalability in Numbers

16 Copyright © 2011 Accenture. All rights reserved.

Source: http://www.facebook.com/press/info.php?statistics, http://www.allfacebook.com/

Page 9: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

LinkedIn

Members >100.000.000

Connections between Members 1.300.000.000

Visitors per month 128.000.000

% of global internet users who visit LinkedIn daily 4%

High Scalability in Numbers

17 Copyright © 2011 Accenture. All rights reserved.

Source: http://blog.linkedin.com, http://press.linkedin.com/about/, LinkedIn Demographics 20111

•  Introduction

• Accenture – High Scalability by Example

– The Royal Wedding Website

– US Based Professional Sport League .com Website

• Accenture – High Scalability by Example

• Architecture at Internet Scale

• Common concepts of Scalability

• Conclusion

Copyright © 2011 Accenture. All rights reserved. 18

Agenda

Page 10: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

•  Introduction

• Accenture – High Scalability by Example

• Architecture at Internet Scale

– Facebook

– Twitter

– LinkedIn

– Challenges

• Common concepts of Scalability

• Conclusion

Copyright © 2011 Accenture. All rights reserved. 19

Agenda

•  Biggest Social Network

•  People are linked together •  People share and exchange information in real-time

•  Facebook provides a – A personalized News Feed containing news about friends – Picture Service including people tagging / linking – Messaging Service to stay in connect with friends – Platform to play Games with friends

Facebook – Key Facts

20 Copyright © 2011 Accenture. All rights reserved.

Page 11: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

•  Facebook „main“ architecture is based on a LAMP stack

•  The application logic resides in Web Server layer (PHP based)

•  The application layer takes care of the data distribution

•  Some services do not fit (C++, Java) – Search – Ads – People You May Know – Multifeed

Facebook – Architecture (1)

Invalidation logic is implemented in the application!

MySQL

Memcache

21 Copyright © 2011 Accenture. All rights reserved.

Multifeed / Newsfeed

• Distributed System

• Every user gets – 45 best rated updates –  on every reload

• Every DB-Update results in a Scribe Notification –  Leaf Nodes / Server are

notified

22

Facebook – Architecture (2)

Copyright © 2011 Accenture. All rights reserved.

Source: “High Performance at Massive Scale – Lessons learned at Facebook”, Jeff Rothschild, 2009 Inspired By: “Architecting Cloudy Applications”, David Chou, 2010

Page 12: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

•  Introduction

• Accenture – High Scalability by Example

• Architecture at Internet Scale

– Facebook

– Twitter

– LinkedIn

– Challenges

• Common concepts of Scalability

• Conclusion

Copyright © 2011 Accenture. All rights reserved. 23

Agenda

•  Twitter is a real-time information network

•  Twitter‘s key characteristics – Tweets are used for sharing information – Timeline of tweets must be kept – A Social Graph is maintained to deliver

information

Twitter – Key Facts

24 Copyright © 2011 Accenture. All rights reserved.

Page 13: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

Load Balancers

Apache mod_proxy

Rails (Unicorn)

Flock memcached Kestrel

MySQL Cassandra

Daemons

Twitter - Architecture

Source: “Architecting Cloudy Applications”, David Chou

25 Copyright © 2011 Accenture. All rights reserved.

Web Frontend

Apps & Services

Partitioned Data

Distributed Cache

Distributed Storage

Queues

•  Introduction

• Accenture – High Scalability by Example

• Architecture at Internet Scale

– Facebook

– Twitter

– LinkedIn

– Challenges

• Common concepts of Scalability

• Conclusion

Copyright © 2011 Accenture. All rights reserved. 26

Agenda

Page 14: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

•  A social media networking platform to find connections to recommended job candidates, industry experts and business partners

•  99% pure Java

LinkedIn – Applications •  Groups •  Job market •  Mailing •  Profiles (Friends and Companies) •  Mobile Applications (Android, iPhone,

Blackberry, Palm)

LinkedIn – Key Facts

27 Copyright © 2011 Accenture. All rights reserved.

• Keep core data ACID • Keep replicated and cached data BASE • Cache data on a cheap memory (memcached) • Use a hint to route the client to his / her’s ACID data

LinkedIn - Architecture

28 Copyright © 2011 Accenture. All rights reserved.

Source: Project Voldemort, Jay Kreps Source: http://sna-projects.com

Apps & Services

Web Frontend

Storage

Spring MVC Grails DWR

LinkedIn Spring

MySQL Hadoop

Caching ehcache

Voldemort

Voldemort

Service Service Service Service

Memcache

Queuing / Messaging

ActiveMQ

Lucene

Search Index Zoie Bobo

Spring MVC

•  Model-View-Controller Web Framework

•  Used for

•  Initial Load of Page

•  Pull Operations

•  …

Grails

•  Open-source web application

•  Aims to bring the "coding by convention" paradigm to Groovy

•  Why Grails? •  Needed a more productive web

framework

•  Rapid prototyping to demonstrate feasibility and productivity gain

•  Integration with existing java/spring-based solutions

DWR •  Direct Web Remoting (AJAX for

Java Library) •  RPC library to call Java functions

from JavaScript and to call JavaScript functions from Java

•  Used For •  Asynchronous Communication

•  Push Operations

LinkedIn Search

•  Lucene Search Engine •  Apache open source high-

performance, full-featured text search engine library written entirely in Java

•  Zoie •  realtime indexing/search

system

•  Bobo •  faceted search engine for the

Social Graph

•  handles semi-structured and structured data

Sensei

Caching

•  Voldemort •  Voldemort combines in

memory caching with the storage system so that a separate caching tier is not required

•  ehcache •  Ehcache is an open source,

standards-based cache used to boost performance, offload the database and simplify scalability

•  Memcache •  Distributed Cache

Queuing / Messaging

•  Asynchronous processing for

•  user triggered writes

LinkedIn Spring •  Spring Framework (IoC, DI, AOP,

Separation of Concerns,..) •  LinkedIn extensions

•  XML Vocabulary (lin:bean, lin:component, …)

•  Property Editors

•  New kind of application context for life-cycle management with components

•  …

Storage Sensei •  distributed database

•  horizontally scaled •  relaxed Consistency with high

durability guarantees

Storage Hadoop

•  Hadoop is a software framework that supports data-intensive distributed applications (map/reduce / distributed file system)

•  Hadoop is used as a major distributed key-value data store for the social graph

•  Bottleneck:

•  Read data

•  Index building

Storage Valdemort •  Valdemort used as a major

distributed key-value data cache for the social graph

•  Advantage:

•  Fast concurrent read

•  Fast index building

Page 15: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

•  Introduction

• Accenture – High Scalability by Example

• Architecture at Internet Scale

• Common concepts of Scalability

• Conclusion

Copyright © 2011 Accenture. All rights reserved. 29

Agenda

Common concepts of Scalability

30 Copyright © 2011 Accenture. All rights reserved.

Source: “Architecting Cloudy Applications”, David Chou Source: highscalability.com

idempotent operations

shared nothing

parallelization

7 Habits of Good

Distributed Systems

asynchronous

partitioned data fault-tolerance

optimistic concurrency

Page 16: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

• Scale-out (horizontal) – BASE – focus on “commit” – optimistic locking – shared nothing – maximize scalability

31

Common concepts of Scalability Hybrid architectures

• Scale-up (vertical) – ACID – availability first – pessimistic locking – transactional – favor accuracy/consistency

Most distributed systems realize both

Copyright © 2011 Accenture. All rights reserved.

•  Introduction

• Accenture – High Scalability by Example

• Architecture at Internet Scale

• Common concepts of Scalability

• Conclusion

–  Questionnaire

–  Déjà vu - Common concepts

–  Don’t forget the Infrastructure

Copyright © 2011 Accenture. All rights reserved. 32

Agenda

Page 17: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

•  Is there a need to scale my application? –  Vertical scaling is more easy to achieve –  Use horizontal scaling only when required

•  Is there a plan to proof your designed solution? –  Plan to do a lot of realistic Proof-of-Concepts

•  Is there a one size fits all solution? –  NO!

• How important is ACID? – Is BASE enough? – Can a NoSQL solution be used?

Copyright © 2011 Accenture. All rights reserved. 33

Conclusion Questionnaire

• Asynchronous Processing – Keep Facebook, Twitter and LinkedIn in mind –  Asynchronous writes and even UI

• Data Partioining –  You must understand your data –  Is a hybrid solution as used by LinkedIn applicable?

• Design To Failure –  Lessons learned from Amazon‘s cloud solution outage

Copyright © 2011 Accenture. All rights reserved. 34

Conclusion Déjà vu - Common concepts

Page 18: High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc

• Disk IO compared to RAM is slow -  Avoid „slow“ IO access whenever possible -  Caching, Caching, Caching -  Bulk Updates (Batch) over small updates

• Network Latency -  Keep in mind that there is a network -  Where are the users located? -  How are your users connected? mobile application? -  Is there a need for a datacenter distribution?

• Hardware configuration –  Is the CPU Power Saving working right?

Copyright © 2011 Accenture. All rights reserved. 35

Conclusion Don’t forget the Infrastructure

Copyright © 2011 Accenture. All rights reserved. 36

Questions ?