FOWA: Scaling the LAMP Stack Workshop

Post on 13-May-2015




Slides from the workshop "Scaling the LAMP Stack" at the Future of Web Apps on October 5, 2007


1. Scaling the LAMP Stack

  • Future of Web Apps
  • October 5, 2007

2. Introductions

3. Specific Problems, Challenges and Issues

4. About this workshop

  • This is a broad topic
  • Theory and application
  • Real-world focus
  • Interactive (please!)

5. About web apps and scaling

  • Some different ways of looking at the problem

6. Things to think about

  • Multi-server: locking and concurrency
  • Running many: keep in mind what's expensive, sloppy or risky
  • Code quality
  • The law of truly large numbers
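
The "law of truly large numbers" point can be made concrete with a little arithmetic: an event that is very unlikely on any single request becomes close to certain once request counts are large. The sketch below is illustrative only; the one-in-a-million failure rate is a hypothetical figure, not one from the workshop.

```python
def chance_of_at_least_one(p_per_request: float, requests: int) -> float:
    """Probability that an event with per-request probability p
    happens at least once across `requests` independent requests."""
    return 1.0 - (1.0 - p_per_request) ** requests


# Hypothetical example: a race condition that fires once per million requests.
for n in (1_000, 1_000_000, 10_000_000):
    print(n, round(chance_of_at_least_one(1e-6, n), 4))
```

A bug that looks negligible in testing is therefore something to plan for at production volumes.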

7. Elements of Scaling

  • Split up different tasks
  • Use more hardware (intelligently)
  • Partition
  • Replicate
  • Cache
  • Optimize (code and hardware)
  • Identify and fix weaknesses
  • Manage

8. Tools and Components

  • Apache + PHP
  • MySQL
  • File System (local)
  • Networked File System
  • Load Balancers
  • memcached

9. Contemplating Scaling

  • Understand what your app does (and how much)
  • Identify the bottlenecks
  • Solve near-term problems
  • Design well, but don't over-design

10. Web apps do lots of things

  • Different operations have different scaling issues.

11. What does your app do?

  • List the high-level elements of what your application does.
  • Separate out different functions that will have different scaling issues.

12. Common things that web apps do

  • Manage connections/protocols
  • Deliver static content
  • Manage sessions
  • Manage user data
  • Render dynamic pages
  • Access external APIs
  • Process media

13. Update the list of things your app does

  • Add anything you missed
  • Note which items you do in quantity

14. Easy vs. Difficult Scaling

  • What happens when you add hardware?
  • Does it work?
  • Does more hardware = more performance?

15. Things that break when you scale

  • State that isn't properly shared (especially sessions)
  • Updates/refreshes (caching and replication issues)

16. Things that don't improve when you add more servers

  • Unpartitioned databases
  • Anything that locks/blocks
  • Inefficient code, especially big queries
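
One way to reason about why locking or blocking work does not improve with more servers is Amdahl's law, which is brought in here as an illustration and is not named in the slides. If only a fraction of each request can be spread across servers, the remaining serialized fraction caps the achievable speedup. The fractions below are hypothetical.

```python
def speedup(parallel_fraction: float, servers: int) -> float:
    """Amdahl's law: overall speedup when only `parallel_fraction`
    of the work benefits from adding servers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / servers)


# Hypothetical workload: 10% of each request waits on a single shared lock.
for n in (2, 10, 100, 1000):
    print(n, round(speedup(0.9, n), 2))
```

With 10% of the work serialized, the speedup never reaches 10x however many servers are added.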

17. Scaling Each Element

  • (do easy separations first)

18. Managing Connections/Protocols

  • No problem putting on multiple servers
  • Apache is good
    • Reasonably close to a good configuration out of the box
    • Moderately tunable
  • Linux tuning
    • TCP stack (tune to handle unusual networking needs)

19. Key Apache Configuration Issues

  • MaxClients (and ServerLimit, ThreadLimit and ThreadsPerChild)
  • Avoid using PHP (or other) handler unnecessarily
  • Use the worker MPM
  • Maybe MaxRequestsPerChild
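
The directives above constrain one another under the worker MPM: ThreadsPerChild cannot exceed ThreadLimit, and MaxClients cannot exceed ServerLimit multiplied by ThreadsPerChild. A minimal sketch of a sanity check for a planned configuration follows; the function name and the sample values are hypothetical, and the Apache documentation remains the authority on exact behavior.

```python
def check_worker_limits(max_clients, server_limit, thread_limit, threads_per_child):
    """Return a list of problems with a planned worker MPM configuration."""
    problems = []
    if threads_per_child > thread_limit:
        problems.append("ThreadsPerChild exceeds ThreadLimit")
    if max_clients > server_limit * threads_per_child:
        problems.append("MaxClients exceeds ServerLimit * ThreadsPerChild")
    if max_clients % threads_per_child != 0:
        problems.append("MaxClients is not a multiple of ThreadsPerChild")
    return problems


# Hypothetical values: 16 processes of 25 threads each can serve 400 clients.
print(check_worker_limits(max_clients=400, server_limit=16,
                          thread_limit=64, threads_per_child=25))
print(check_worker_limits(max_clients=500, server_limit=16,
                          thread_limit=64, threads_per_child=25))
```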

20. Delivering Static Content

  • Don't process it unnecessarily
    • Either cache or use no Apache handlers
    • Caching can let you treat semi-static content as static
  • Multiple servers complicate updates, but are otherwise easy

21. General Discussion: Multi-server, state and sessions

  • Rethinking state for multi-server environments
    • What is state?
    • Short-term state (sessions)
    • Long-term state (application data)
    • Managing state is usually the hardest part of scaling

22. What happens with state

  • Written (created/destroyed/changed)
  • Read
  • Stored

23. Requirements for managing state

  • Depend on what it is and how it is used
  • Perfect coherence
  • Performance of different operations

24. Ways of scaling state

  • Replication: make more copies
  • Partitioning: split up the work
  • Caching
  • Should make different choices for different state/data elements
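
As an illustration of partitioning, the sketch below routes each user to one of several databases by hashing the user ID. The shard names and the choice of SHA-256 with a modulo are assumptions made for the example, not the approach presented in the workshop.

```python
import hashlib

# Hypothetical database names; a real deployment would hold connection details.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(user_id: str, shards=SHARDS) -> str:
    """Pick a shard deterministically from a hash of the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(shards)
    return shards[index]


print(shard_for("alice"), shard_for("bob"))
```

Simple modulo hashing remaps most keys when the number of shards changes, so it suits data sets whose partition count is fixed or rarely changed.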

25. About Load Balancers

  • What load balancers do
    • Spread load
    • Detect server failures
    • Stickiness/persistence
    • Acceleration (especially SSL)
  • Fancy features (including good stickiness) are expensive

26. Why sticky sessions are not usually good in practice

  • Servers fail
  • Corner cases exist

27. Managing Sessions

28. Where session data can be stored

  • Browser cookies
  • Web server temporary files (not scalable)
  • App server state
  • Database
  • Cache

29. PHP session management

  • Default (files) method is not multi-server friendly, and thus not scalable (unless sticky)
  • Can implement a different back-end easily

30. Designing a session back-end

  • Requirements
  • Data storage options
    • Cookies only (re-auth, let the browser take care of the logout but less secure)
    • Full-featured involves a combination of cookies and database and cache
    • (discussion of session details)
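
A minimal sketch of the "cookies only" option follows, assuming the session data is signed with an HMAC so that any web server sharing the key can verify it without shared storage. The key, field names and lifetime are hypothetical. The payload is signed but not encrypted, so the client can read it.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = b"replace-with-a-long-random-secret"  # hypothetical shared key


def encode_session(data: dict, ttl_seconds: int = 3600, now=None) -> str:
    """Serialize session data with an expiry time and append an HMAC signature."""
    issued = time.time() if now is None else now
    payload = dict(data, exp=int(issued + ttl_seconds))
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode("utf-8"))
    signature = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return body.decode("ascii") + "." + signature


def decode_session(cookie: str, now=None):
    """Return the session dict, or None if the cookie is forged or expired."""
    body, separator, signature = cookie.encode("utf-8").rpartition(b".")
    if not separator:
        return None
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(signature, expected):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if payload["exp"] < current:
        return None
    return payload


cookie = encode_session({"user": "alice"})
print(decode_session(cookie))
```

A cookie-only design cannot revoke a session from the server side before it expires, which is one reason the full-featured option combines cookies with a database and cache.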

31. Managing small user data

  • Databases are more efficient, flexible and sharable than small files
  • Frequently-read data should be cached

32. Managing large user data

  • NFS has flaws but is almost inevitable
  • Locking is usually not important, but can be
  • Performance degradation can be sudden

33. About NFS

  • NFS is usually transparent to your app
  • NFS is easy to implement and gives you multiple-write access
  • NFS locking is not to be trusted
  • The Linux NFS client is slow for writes and can do bad things under stress

34. User data and locking

  • Names based on hashes often mean no locking is needed
  • Databases do locking better than file systems do
  • Locking requires housekeeping
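
The hash-based naming point can be sketched as follows: if a file's name is derived from a hash of its content, two writers storing the same data produce the same name and the same bytes, so neither needs a lock. The directory layout and function name are assumptions for illustration, and the atomic rename relied on here is a local POSIX file system property that may behave differently over NFS.

```python
import hashlib
import os
import tempfile


def store_upload(data: bytes, root: str) -> str:
    """Store data under a name derived from its SHA-256 hash and return the path."""
    digest = hashlib.sha256(data).hexdigest()
    # Fan out into subdirectories so no single directory grows too large.
    directory = os.path.join(root, digest[:2], digest[2:4])
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, digest)
    if not os.path.exists(final_path):
        # Write to a temporary file, then rename it into place in one step.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp_path, final_path)
    return final_path


with tempfile.TemporaryDirectory() as demo_root:
    print(os.path.basename(store_upload(b"example upload", demo_root)))
```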

35. Disk Storage Hardware

  • Disk performance can degrade suddenly
  • If the ratio of access to storage is low, then even slow disk is usually fine
  • Think about seek times and spindles

36. Rendering dynamic pages

  • Depends heavily on application specifics (query, search, process, etc.)
  • Watch out for:
    • Onerous queries (create and watch slow query log)
    • Locking of resources and/or incoherence if state changes
    • Heavy CPU and memory usage
  • Cache both elements and complete pages

37. Processing media

  • CPU intensive
  • May be memory intensive
  • Might be spiky
  • Might need its own server pool
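
Spiky, CPU-heavy work such as media processing is often handled by queuing jobs for a fixed-size pool of workers, so a burst lengthens the queue rather than overloading the web servers. The sketch below uses an in-process queue and threads purely for illustration; a multi-server setup would use an external queue and a separate server pool, and the handler here is a hypothetical stand-in for real media work.

```python
import queue
import threading


def run_worker_pool(jobs, handler, workers=2):
    """Process jobs with a fixed number of workers; return (job, outcome) pairs."""
    pending = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is None:  # sentinel: no more work for this worker
                pending.task_done()
                return
            try:
                outcome = handler(job)
            except Exception as exc:  # record the failure instead of killing the worker
                outcome = exc
            with results_lock:
                results.append((job, outcome))
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for job in jobs:
        pending.put(job)
    for _ in threads:
        pending.put(None)
    pending.join()
    for thread in threads:
        thread.join()
    return results


# Hypothetical handler standing in for a thumbnailing step.
print(sorted(run_worker_pool(["a.jpg", "b.jpg", "c.jpg"], lambda name: "thumb_" + name)))
```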

38. Hardware

  • Start simple
  • Observe performance and respond accordingly
  • Get lots of memory

39. Hardware-driven behaviors

  • Sudden degradation because demand exceeds supply (usually relieved unhappily)
  • Fall behind due to a spike, then recover
  • Not enough resources for normal optimization

40. Specific hardware issues

  • Not enough memory
    • Severe: paging/swapping
    • Mild: poor automatic caching; slowness due to fragmentation
  • Disk seek (very common)
  • CPU (but might really be memory)
  • Disk throughput (rare for web apps)

41. Hardware decisions

  • Resource ratios
  • Combining vs. splitting functions
  • Big vs. little boxes

42. Techniques

  • Caching
  • Partitioning
  • Replication
  • Data management middleware
  • Queuing

43. Caching

  • Turn expensive operations into cheap ones
  • Reduce:
    • Database reads
    • Object and page calculation/rendering operations
  • Cache objects and subobjects
  • Add memory

44. Apache Caching

  • Can be done with zero application modifications
  • Complete pages/HTTP requests only
  • Must use Apache 2.2
  • Cache is not sha