things you should know about scalability!

51
Copyright © 2011 Accenture All Rights Reserved. Accenture, its logo, and High Performance Delivered are trademarks of Accenture. Things you should know about Scalability! WJAX 2011, 08.11.2011, Munich Robert Mederer

Upload: robert-mederer

Post on 17-Jan-2015

1.807 views

Category:

Technology


0 download

DESCRIPTION

Delivering architecture@internet-scale has several challenges to be solved to be ready for extreme scalable architectures. This session is about the art of scale, scalability, and scaling of web architectures. It will give an overview of challenges, good practices and solutions to achieve high scalability for web-based systems.

TRANSCRIPT

Page 1: Things you should know about Scalability!

Copyright © 2011 Accenture All Rights Reserved. Accenture, its logo, and High Performance Delivered are trademarks of Accenture.

Things you should know about Scalability! WJAX 2011, 08.11.2011, Munich

Robert Mederer

Page 2: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 2

Delivering architecture@internet-scale has several challenges to be solved to be ready for extreme scalable architectures. This session is about the art of scale, scalability, and scaling of web architectures. It will give an overview of challenges, good practices and solutions to achieve high scalability for web-based systems.

Things you should know about Scalability!

Abstract

Page 3: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 3

Who am I?

Robert Mederer Lead Architecture & Execution

Anni-Albers-Straße 11 80807 München Mobil: +49-175-57-68012 [email protected]

2000 - 2005: Technology Architect and Software Engineer in several projects

2006: Technical Architecture Lead, Integration and Execution Architecture for Location-Based Service Provider

2009: Technical Architecture Lead, Frontend and Execution Architecture for a Government Agency

2009/2010: Technical Architecture and front-office integration build lead, Integration and Execution Architecture Financial Services Agency

2011: Architect and QA for Location Based Services Platform

Experience

Page 4: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 4

•  Global management consulting, technology services and outsourcing company

•  236.000 employees

•  Rank 47 among the “Best Global Brands 2008”

•  Top 100 Employer

•  28 of the DAX-30-Companies

•  96 of the Fortune-Global-100

•  More than three-quarters of the Fortune-Global-500

•  87 of our Top 100-clients have been with us for 10 or more years

Company Profile

Accenture High performance achieved

Worldwide Revenues $25.5 billion (in US$ billion, as of August 31, 2011)

Communications & High Tech

Public Service

Products

Financial Services

Resources

Page 5: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 5

•  Austria •  Switzerland •  Germany

Employees •  >6000 •  We are hiring!

Exciting Technology work •  Large scale projects

(100+ people / multiple years) •  Most challenging requirements

– Stock Exchange / Banking / Trading Systems

– AEMS Mobility Platform – Large Scale Web Applications

(> 1M page views / day) – Batch Architectures

Geographic unit

Local Accenture … ???

Frankfurt

Munich

Düsseldorf Berlin

Vienna

Zurich

Erlangen/ Nürnberg

Page 6: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 6

•  Introduction

•  Case Study

•  Solution and Good Practice

•  Further Topics

•  Conclusion

Agenda

Page 7: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 7

•  Introduction

•  Case Study

•  Solution and Good Practice

•  Further Topics

•  Conclusion

Agenda

Page 8: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 8

High Scalability / Overload

Introduction

Source: Ezprezzo

Page 9: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 9

Who are You?

– Developers,

– Architects,

– IT Manager

How large are your application (QPS)?

– 10-100?

– 100-1000?

– 1000-10000?

– 10000+?

How large is your total database?

– < 10 GB?

– 10 GB-100 GB?

– 100GB-1TB?

– 1TB-10 TB?

– 10TB+?

Audience?

Introduction | Question

Page 10: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 10

If your system is

slow for a single user

How do I know if I have a performance problem?

Introduction | What is Performance?

Page 11: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 11

If your system is

fast for a individual user but slow under high load

How do I know if I have a scalability problem?

Introduction | What is Scalability?

Page 12: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 12

•  Performance testing is defined as the technical investigation to determine or validate the speed, scalability, and/or stability characteristics of the web based system under test.

•  Performance-related activities, such as testing and tuning, are concerned with achieving response times, throughput, and resource-utilization levels that meet the performance objectives for the application (SLA) under test.

Key Types of Performance Testing:

Non-Functional Testing Performance Testing of Web based systems

Performance Testing

Load Testing Stress Testing Capacity Testing

“Will it be fast enough?“

"Will it support all of my clients?“

"What happens if something goes wrong?"

"What do I need to plan for when I get more customers?“

Introduction | What is Performance?

Source: Thomas Werft, Performance Engineer at Accenture

Definition

Page 13: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 13

Key performance indicators:

Non-Functional Testing Performance Testing of Web based systems

Criteria KPI Description

Response Time (first / last byte in ms)

Average An average is a value found by adding all of the numbers in a set together and then dividing them by the quantity of numbers in the set

Percentile (Target 98%)

A percentile is a measure that tells us what percent of the total frequency scored at or below that measure.

Median A median is simply the middle value in a data set when sequenced from lowest to highest.

Throughput (QPS) Requests per Second; Transaction per Second

Throughput is the number of units of work that can be handled per unit t of time; for instance, requests per second, calls per day, hits per second, reports per year, etc.

Resource Utilization Processor; Memory; Disk I/O; Network I/O

Resource utilization is the cost of the project in terms of system resources. Utilization is the percentage of time that a resource is busy servicing user requests. The remaining percentage of time is considered idle time.

Introduction | What is Performance?

Source: Thomas Werft, Performance Engineer at Accenture

Results are used for Performance Engineering, Performance Tuning

Page 14: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 14

A system’s capacity to uphold the same performance under heavier volumes.

Scalability

Introduction | What is Scalability?

Definition

Source: Patterns for Performance and Operability: Building and Testing Enterprise Software, Chris Ford et. al., 2008

Page 15: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 15

•  CPU,

• Memory,

•  Bandwidth, …

Simple Process •  Application is generally not affected by

those changes

Classical Example are Super Computers like •  HP Integrity Superdome

•  IBM Mainframe

Is achieved by increasing the capacity of a single node

Vertical Scalability

Introduction | What is Scalability?

Source: Hewlett-Packard

Page 16: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 16

•  Nodes can be added to scale out Produces overhead -  Keep cluster consistent

-  Node error detection and handling

-  Communication between nodes

• May be used to increase reliability and availability

•  Distributed Systems and Programs like – SETI@Home

– World Wide Web

– Domain Name Service

•  Application is spread on a cluster with several nodes

Horizontal Scalability

Source: Space Sciences Laboratory, U.C. Berkeley

Introduction | What is Scalability?

Page 17: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 17

Consistency

Availability Partition Tolerance

• Consistency – all clients see the same data at the same time

• Availability – all clients can find all data even in presence of failure

• Partition Tolerance – system works even when one node failed

CAP Theorem (Brewer‘s Theorem)

Impossible

Source: PODC-keynote, Towards Robust Distributed Systems, Dr. Eric A. Brewer, 2000

Introduction | Scalability Trade-Offs | Availability vs. Consistency

Page 18: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 18

Consistency + Availability •  High data integrity

•  Single site, cluster database, LDAP, etc.

•  2-phase commit, data replication, etc.

Consistency + Partition •  Distributed database, distributed locking, etc.

•  Pessimistic locking, etc.

Availability + Partition •  High scalability

•  Distributed cache, DNS, etc.

•  Optimistic locking, expiration/leases (timeout), etc.

Normally, two of these properties for any shared-data system

CAP Theorem

C

A P

C

A P

C

A P

Introduction | Scalability Trade-Offs | Availability vs. Consistency

Source: “Architecting Cloudy Applications”, David Chou

Page 19: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 19

Data and Scalability

Source: Visual Guide to NoSQL Systems, http://blog.nahurst.com/tag/cap

Data Models Key:

Relational (comparison) Key-Value Column-Oriented Document-Oriented

•  RDBMSs (MySQL, Postgres, etc.)

•  Greenplum •  Vertica

Consistent & Available •  Cassandra •  SimpleDB •  CouchDB •  Riak •  Dynamo •  Voldemort •  Tokyo

Cabinet •  KAI

Available & Partition Tolerant Distributed Non-Relational data store solutions must relax guarantees around consistency, partition tolerance and availability, resulting in systems optimized for different combinations of properties.

•  BigTable •  HyperTable •  Hbase •  MongoDB •  Terrastore

•  Scalaris •  BerkeleyDB •  MemcacheDB •  Redis

Consistent & Partition Tolerant

Consistency

Availability Partition Tolerance

Introduction | Scalability Trade-Offs | Availability vs. Consistency

Page 20: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 20

Analysis and Classification

Data and Scalability

Introduction | Scalability Trade-Offs | Availability vs. Consistency

Page 21: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 21

ACID - Do I really need it?

Data and Scalability

Introduction | Scalability Trade-Offs | Availability vs. Consistency

In order to guarantee transactional integrity, the traditional relational database management system (RDBMS) was architected to guarantee four core properties: Atomicity, Consistency, Isolation and Durability (ACID).

Atomicity A database is said to be atomic if when one

part of the transaction fails, the entire transaction fails and database state is left

unchanged.

Consistency if the database remains in a consistent state

after any transaction. Therefore, if a transaction violates the consistency of the

database (e.g. the value is not the right type) then the transaction should be rolled back.

Durability A database is said to be durable if it recovers

all of the committed transactions in the system even after system failure.

Isolation A database is said to be isolated if transactions

can’t have access to data currently being modified by another transaction.

Relational databases were originally designed for transactional data processing – reliably processing and maintaining data integrity – on different HW architectures.

Page 22: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 22

• Basically Available • Soft-state (or scalable)

• Eventually consistent

Modern Internet systems: focused on BASE

BASE

Introduction | Scalability Trade-Offs | Availability vs. Consistency

Example: Amazon outage in April 2010 brought thousand of customers down, including Pfizer, Netflix, Quora, Foursquare, Reddit, … • The Amazon.com 2010 Shareholder Letter Focusses on Technology

• http://www.allthingsdistributed.com/2011/04/the_amazoncom_2010_shareholder.html • http://broadcast.oreilly.com/2011/04/the-aws-outage-the-clouds-shining-moment.html • http://www.nytimes.com/2011/04/23/technology/23cloud.html

• http://www.allthingsdistributed.com/2007/12/eventually_consistent.html Dec. 2007

Page 23: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 23

ACID BASE

•  Strong consistency for transactions highest priority

•  Availability less important

•  Pessimistic

•  Complex mechanisms

•  Availability and scaling highest priorities

• Weak consistency

• Optimistic

•  Simple and fast

ACID vs. BASE

Introduction | Scalability Trade-Offs | Availability vs. Consistency

Page 24: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 24

Network protocols has an inherent throughput bottleneck that becomes more severe with increased packet loss and latency

Network Latency vs. Throughput

Introduction | Scalability Trade-Offs - Latency vs. Throughput

Source: http://www.asperasoft.com/en/technology/shortcomings_of_TCP_2/the_shortcomings_of_TCP_file_transfer_2

Page 25: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 25

• Processing load is distributed

• Closer to the user

• Decreases latency

• Lower cost of hardware

•  Increases service levels

• Greater flexibility in responding to service requests

• Seasonal spikes in demand can be off-loaded to other edge servers

Transferring data or services from a centralized point to the edge of the network

Edge Computing

Introduction | Scalability and Edge Computing

Page 26: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 26

•  Store objects for the application to be reused

•  Cache data from database or generated by application

•  E.g. ehCache, memcached, etc.

Application Cache •  Speed up performance or minimize resources used

•  Proxy caching / Reverse proxy caching

•  E.g. Squid, Varnish, etc

Content Delivery Network (CDN) •  Faster response time and fewer requests on the origin servers

•  Push content closer to end user

•  E.g. Akamai, Savvis, Mirror Image Internet, Netscaler, Amazon CloudFoundry, etc

Object cache

Caching and Types of Caches

Introduction | Caching

Page 27: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 27

Abstract architecture of a Content Delivery Network (CDN)

CDN

Source:Content Delivery Network (CDN) Research Directory, http://ww2.cs.mu.oz.au/~apathan/CDNs.html

Introduction | Caching

Page 28: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 28

Basic interaction flows in a CDN environment

CDN

Introduction | Caching

Source: Basic interaction flows in a CDN environment, http://ww2.cs.mu.oz.au/~apathan/CDNs.html

Page 29: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 29

• Methodology to distribute workload across multiple computers or a computer cluster, network links, central processing units, disk drives, or other resources, to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload

• Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. The load balancing service is usually provided by dedicated software or hardware, such as a multilayer switch or a Domain Name System server.

Definition:

Basics Load Balancing

Introduction Introduction

Page 30: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 30

Load Balancing (Major) Usage

• Distributing the load across multiple servers •  Target is to scale beyond the capacity of one server, and to tolerate a

server failure. Server LB

• Directing users to different data center sites consisting of server farms •  Target is to provide users with fast response time and to tolerate a

complete data center failure (availability, business continuity, disaster recovery, geographic routing)

Global Server LB

• Distribute the load across multiple firewalls •  Target is to scale beyond the capacity of one firewall, and tolerate a

firewall failure. Firewall LB

•  Transparently directs traffic to caches to accelerate the response time for clients

• Or improve the performance of web servers by offloading the static content to caches.

Transparent Cache Switching

Source: Load Balancing Servers, Firewalls, and Caches by Chandra Kopparapu; John Wiley & Sons © 2002

Introduction

Page 31: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 31

•  Pros: Simple to implement.

•  Cons: Can lead to overloading of one server while under-utilization of others.

Round-Robin Allocation •  Pros: Better than random allocation because the requests are equally

divided among the available servers in an orderly fashion.

•  Cons: Round robin algorithm is not enough for load balancing based on processing overhead required and if the server specifications are not identical to each other in the server group.

Weighted Round-Robin Allocation •  Pros: Takes care of the capacity of the servers in the group.

•  Cons: Does not consider the advanced load balancing requirements such as processing times for each individual request.

Random Allocation

Basics Load Balancing Algorithm’s

Introduction

Page 32: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 32

•  Hardware

– Barracuda Networks

– Cisco Systems

– Citrix Systems

– F5 Networks (BigIp)

– Etc.

•  Software

– HAProxy

– Apache HTTP Server with mod_proxy for Tomcat

– …

Basics Server Load Balancing

Introduction

Problem: • No real load balancing due to TTL of DNS • No health check for service availability

Simple Load Balancing over DNS (List of IP‘s with round robin)

Does that work?

Page 33: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 33

•  Functionality

– DNS based routing

– Based on IP GEO database (Geographic routing)

– Assumption: Local DNS for client

•  Provider

– F5 Networks (Global Load Balancing Solutions)

– UltraDNS (Traffic Controller Service)

– Level3 (Traffic Manager, BCDR Solution)

Global Server Load Balancing

Introduction | Load Balancing

Page 34: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 34

•  Increase application availability in event of entire site failure or overload (Business Continuity, Disaster Recovery)

•  Scale application performance by load balancing traffic across multiple sites (Edge Computing (together with CDN))

•  Need for more granularity and control in directing Web traffic

• More flexibility in building and managing Internet infrastructures

–  E.g. Site based downtime management during release upgrade

•  Cons: Not always working! Due to assumption of a local DNS (Public DNS usage, DNS over VPN could fail to get the nearest server location) –  (see: http://www.royans.net/arch/fixing-gslb-global-server-load-balancing/)

•  Fix: Google proposed a DNS enhancement to not use the DNS resolver IP further more the client / end-user IP (see: DNS resolver, http://googlecode.blogspot.com/2010/01/proposal-to-extend-dns-protocol.html )

Characteristics / Usage

Global Server Load Balancing

Introduction | Load Balancing

Page 35: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 35

•  Introduction

•  Case Study

•  Solution and Good Practice

•  Further Topics

•  Conclusion

Agenda

Page 36: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 36

Case Study – Non-Functional Requirements

User groups: •  Web Browser users •  Mobile users

USA: 50 Mil. EU: 30 Mil.

ASIA: 15 Mil.

AU: 2 Mil.

Case Study – Internet Scale Web Services

Availability: 99,99 %

Page 37: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 37

AU: 1 data center: •  Sydney Peak: 3.000 QPS

USA: 2 data center: •  New York •  San Francisco Peak: 20.000 QPS

EU: 2 data center: •  Frankfurt •  London Peak: 10.000 QPS

ASIA: 1 data center: •  Singapore Peak: 5.000 QPS

Case Study – Non-Functional Requirements

Case Study – Internet Scale Web Services

Page 38: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 38

AU / ASIA: Failover Singapore ↔ Sydney: 8.000 QPS

Case Study – Non-Functional Requirements Performance in Case of Failure

USA: Failover New York ↔ San Francisco: 40.000 QPS

EU: Failover Frankfurt ↔ London: 20.000 QPS

Case Study – Internet Scale Web Services

Page 39: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 39

Case Study – Non-Functional Requirements Response Times

RESTful Web Services: • Calculate service: 100 ms (50ms latency) •  Binary service: 60 ms (50 ms latency) •  Search service: 50 ms (50 ms latency)

Case Study – Internet Scale Web Services

Page 40: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 40

Case Study – Non-Functional Requirements Data

100 TByte on each geography -  Binary (video, image, …) -  Index data

Case Study – Internet Scale Web Services

Page 41: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 41

•  Introduction

•  Case Study

•  Solution and Good Practice

•  Further Topics

•  Conclusion

Agenda

Page 42: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 42

Case Study – Solution

Page 43: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 43

Case Study – Solution

Page 44: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 44

•  Introduction

•  Case Study

•  Solution and Good Practice

•  Further Topics

•  Conclusion

Agenda

Page 45: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 45

• Organization

– People, Process and Tools

– Governance (Lifecycle management)

• Where I do I find the truth in a highly scaled and distributed architecture?

– Logging

• Log Analytics (e.g. Scribe (not really), Splunk)

– End-to-end data visualization

Furhter Topics

Page 46: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 46

•  Introduction

•  Case Study

•  Solution and Good Practice

•  Further Topics

•  Conclusion

Agenda

Page 47: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 47

Reverse proxy Caching

•  Fast and Scales well

•  Dealing with invalidation is tricky

•  Direct cache invalidation scales badly

•  Instead, change URLs of modified resources

• Old ones will drop out of cache naturally

CDN – Content Delivery Network

•  Faster response time and fewer requests on the origin servers

•  No 100% control of caching. Based on internal statistics (Akamai).

• Operated by 3rd parties. Already in place. Not for Free

• Once something is cached on CDN, assume that it never changes

•  Sometimes does load balancing as well

Content Caching

Conclusion

Page 48: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 48

Common Concepts of Scalable Architecture

idempotent operations

shared nothing loosely coupled

parallelization

7 Habits of Good

Distributed Systems

asynchronous

partitioned data

fault-tolerance

optimistic concurrency

Source: "Architecting Cloudy Applications", David Chou Source: highscalability.com

Conclusion

Page 49: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 49

•  Is there a need to scale my application?

–  Vertical scaling is more easy to achieve (Cost)

–  Use horizontal scaling only when required (Complexity)

•  Is there a plan to proof your designed solution?

–  Plan to do a lot of realistic Proof-of-Concepts

•  Is there a one size fits all solution?

–  NO!

•  How important is ACID?

– Is BASE enough?

– Can a NoSQL solution be used?

Questionnaire

Conclusion

Page 50: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 50

References

The Art of Scalability: Scalable Web Architecture, Processes and Organizations for the Modern Enterprise; Michael T. Fisher, Martin L. Abbott; Addison-Wesley Professional; 1 edition

Scalability Rules: 50 Principles for Scaling Web Sites; Martin L. Abbott, Michael T. Fisher Addison-Wesley Professional; 1 edition (May 15, 2011)

Scalable Internet Architectures; Theo Schlossnagle; Sams; 1 edition (July 31, 2006)

Building Scalable Web Sites; Henderson; Oreilly

Websites: HighScalability.com, infoQ.com, Qcon.com, …

Page 51: Things you should know about Scalability!

Copyright © 2011 Accenture. All rights reserved. 51

Contribution and Review:

Bukowski, Markus; Conradt, Steffen; Jacobs, Mareike; Krogemann, Markus; Peuker, Jan; Van Isacker, Pieter; Wagenknecht, Dominik; Wagner, Hubert; Werft, Thomas; Zakotnik, Jure

Thank You!