An Efficient Threading Model to Boost Server Performance
Anupam Chanda
Motivation
Complex mainstream servers are multi-threaded
– Apache 2.0
– MySQL
Variety of threading models
– Effects on server performance?
Want higher performance
Thesis Contributions
Examine thread architectures
– User thread per kernel thread
– Blocking I/O vs. non-blocking I/O
N-to-M threads with non-blocking I/O
– Novel thread model
– Architectural benefits over other thread models
Higher performance for Apache and MySQL
Talk Outline
Contrast threading architectures
Benefits of N-to-M threads with non-blocking I/O
Large I/O transfer optimization
Evaluation
– Apache
– MySQL
Related works
Conclusion
User Thread / Kernel Thread
[Figure: user threads mapped onto kernel threads in the 1-to-1, N-to-M, and N-to-1 models]
Blocking I/O / Non-blocking I/O
Blocking I/O
– Issue application I/O “as is”
– I/O blocks => thread blocks
Non-blocking I/O
– Issue application blocking I/O in non-blocking manner
– Use event notification mechanism
– Library schedules I/O for different threads
– Return to application when I/O finishes
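
The non-blocking I/O steps above can be sketched as a toy user-level scheduler. This is a minimal illustration under stated assumptions, not ServLib's implementation: Python generators stand in for user-level threads, the standard `selectors` module stands in for the event notification mechanism, and the names `reader`, `run`, `client`, and `demo` are hypothetical.

```python
import os
import selectors
import threading
from collections import deque


def reader(fd, results):
    """User-level thread (hypothetical): read fd to EOF without ever blocking."""
    chunks = []
    while True:
        try:
            data = os.read(fd, 4096)       # fd is in non-blocking mode
        except BlockingIOError:
            yield fd                       # would block: switch to the scheduler
            continue
        if not data:                       # EOF
            break
        chunks.append(data)
    results.append(b"".join(chunks))


def run(threads):
    """Library scheduler: run runnable threads, park blocked ones on a selector."""
    sel = selectors.DefaultSelector()
    runnable = deque(threads)
    parked = 0
    while runnable or parked:
        while runnable:
            thread = runnable.popleft()
            try:
                fd = next(thread)          # runs until the thread would block
            except StopIteration:
                continue                   # thread finished
            sel.register(fd, selectors.EVENT_READ, thread)
            parked += 1
        if parked:
            for key, _ in sel.select():    # event notification from the kernel
                sel.unregister(key.fd)
                parked -= 1
                runnable.append(key.data)  # its I/O can proceed: runnable again
    sel.close()


def client(w, payload):
    """Helper kernel thread playing a remote client: write, then close."""
    os.write(w, payload)
    os.close(w)


def demo():
    results = []
    pipes = [os.pipe() for _ in range(2)]
    for i, (r, w) in enumerate(pipes):
        os.set_blocking(r, False)
        threading.Timer(0.05 * (i + 1), client, args=(w, b"req%d" % i)).start()
    run([reader(r, results) for r, _ in pipes])
    for r, _ in pipes:
        os.close(r)
    return sorted(results)
```

Each `reader` looks like ordinary blocking code to its caller, while the scheduler ensures that one thread waiting on I/O never stalls the others.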
Threading Models
User threads/kernel threads   Blocking I/O   Non-blocking I/O
1-to-1                        X              -
N-to-1                        -              X
N-to-M                        X              X*
X => feasible, - => not feasible, * => novel
1-to-1 threads/blocking I/O
Context switches increase for I/O intensive workloads
Kernel level context switches
N-to-1 threads/non-blocking I/O
Blocks on page faults or open() calls
Cannot use multiple processors on an SMP
Event notification
– select()/poll() don’t scale well
N-to-M threads/blocking I/O
Employs scheduler activations to handle blocking events
Blocking I/O => context switch overhead
Frequent blocking I/O => reduces to 1-to-1 threads
Non-blocking I/O
[Figure: blocking I/O vs. non-blocking I/O]
N-to-M threads/non-blocking I/O
Compared to 1-to-1 threads/blocking I/O
– Fewer kernel threads
– Library context switches – less expensive
– Non-blocking I/O allows batching of events across user/kernel boundary
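
The batching point can be illustrated with a short sketch: when several descriptors become ready, a single event-notification call reports all of them, so one user/kernel crossing serves several user threads. The example assumes Python's standard `selectors` module as a stand-in for the event mechanism. `ready_batch` and `demo` are hypothetical names, and pipes stand in for client connections.

```python
import os
import selectors


def ready_batch(sel):
    """One select() call, i.e. one kernel crossing, reports every ready descriptor."""
    return sorted(key.data for key, _ in sel.select(timeout=0))


def demo(n=4):
    sel = selectors.DefaultSelector()
    pipes = [os.pipe() for _ in range(n)]
    for thread_id, (r, _) in enumerate(pipes):
        # data = id of the user thread waiting on this descriptor
        sel.register(r, selectors.EVENT_READ, thread_id)
    for _, w in pipes:
        os.write(w, b"x")                  # all n descriptors become readable
    batch = ready_batch(sel)               # a single call returns all n events
    sel.close()
    for r, w in pipes:
        os.close(r)
        os.close(w)
    return batch
```

With blocking I/O, each of these `n` completions would instead wake a separate kernel thread, costing one kernel-level context switch per event.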
N-to-M threads/non-blocking I/O (contd.)
Compared to N-to-M threads/blocking I/O
– Non-blocking I/O allows batching of events across user/kernel boundary
Compared to N-to-1 threads/non-blocking I/O
– A kernel thread per CPU on an SMP
– Does not stall in case of page faults
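
The N-to-M mapping itself can be sketched as N user-level threads multiplexed over M kernel threads that share one run queue. This is a toy model under stated assumptions: generators play the user threads, `threading.Thread` plays the kernel threads, and `user_thread`, `kernel_thread`, and `run_n_to_m` are hypothetical names. CPython's global interpreter lock prevents real parallelism here, so the sketch shows only the scheduling structure, not the SMP speedup.

```python
import os
import queue
import threading


def user_thread(tid, steps, log):
    """User-level thread (hypothetical workload): does `steps` units of work."""
    for step in range(steps):
        log.append((tid, step))            # list.append is thread-safe in CPython
        yield                              # library-level context switch point


def kernel_thread(run_queue):
    """One of M kernel threads: repeatedly pick any runnable user thread."""
    while True:
        task = run_queue.get()
        if task is None:                   # sentinel: shut this kernel thread down
            return
        try:
            next(task)                     # run the user thread up to its next yield
            run_queue.put(task)            # still runnable: back on the shared queue
        except StopIteration:
            pass                           # user thread finished
        finally:
            run_queue.task_done()


def run_n_to_m(n, steps=3, m=None):
    m = m or os.cpu_count() or 1           # default: one kernel thread per CPU
    log, run_queue = [], queue.Queue()
    for tid in range(n):
        run_queue.put(user_thread(tid, steps, log))
    workers = [threading.Thread(target=kernel_thread, args=(run_queue,))
               for _ in range(m)]
    for w in workers:
        w.start()
    run_queue.join()                       # wait until every user thread has finished
    for _ in workers:
        run_queue.put(None)
    for w in workers:
        w.join()
    return log
```

Calling `run_n_to_m(8, m=2)` runs eight user threads on two kernel threads. Any kernel thread may resume any user thread, so the library keeps scheduling user threads even while one kernel thread is stalled, which is what the N-to-1 model cannot do.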
Large I/O in Traditional Libraries
[Figure: large I/O transfer in traditional thread libraries]
ServLib: Large I/O Optimization
[Figure: large I/O transfer with the ServLib optimization]
ServLib Thread Library
N-to-M threads/non-blocking I/O
Exports POSIX threads (pthreads) API
– Transparently linked to multi-threaded servers
Employs FreeBSD’s kevent() event notification mechanism
Performance Evaluation
Compare ServLib with
– N-to-1 threads/non-blocking I/O (libc_r)
– 1-to-1 threads/blocking I/O (linuxthreads)
Two server applications
– Apache web server (version 2.0.43)
  – Synthetic workload
  – Trace based workload
– MySQL database server (version 3.23.55)
  – TPC-W workload
Apache: Synthetic Workload
Synthetic workload
– Concurrent clients requesting the same file
– Vary file size
Hardware
– 2.4 GHz Intel Xeon server
– 2 GB memory
– 2x Gigabit network connection between server and client
Server CPU bottleneck in these tests
Analysis
Collected kernel profile statistics
1-to-1 threads
– 40x more context switches than ServLib
– Effect of I/O optimization in ServLib
N-to-1 threads
– Effect of I/O optimization in ServLib
– poll() is the 4th most costly system call
– kevent() is inexpensive
Apache: I/O Optimization Test
Experiment on large I/O optimization
– Turn off optimization
  – 5% reduction in overall performance
Apache: Trace Based Workload
Trace based workload
– Rice CS trace, NASA trace
Play trace log from client machine
Ignore the first run
Collect results for second run (warm cache)
Working set size less than main memory
Traces: Characteristics
Web trace   Total         Small (0,8K]   Medium (8K,256K]   Large (>256K)
CS          8 Gbytes      5.5%           20.2%              74.3%
NASA        89.5 Gbytes   3.1%           24.6%              72.3%
MySQL Tests
Trace of database queries for TPC-W workload
Database size: ~400 MB
Server CPU bottleneck in these tests
MySQL Tests: Analysis
Collected kernel profile statistics
1-to-1 threads
– 3x more context switches than ServLib
– Kernel level synchronization more expensive
N-to-1 threads
– 20x more poll() than ServLib
– 7x more poll() than 1-to-1 threads
Future Work
Investigate effects of preemption
Experiments
– Tests on an SMP
– N-to-M threads with blocking I/O
– Optimize N-to-1 threads to use kevent()
Related Works: Server Architectures
Flash web server [USENIX 1999]
– Hybrid architecture
Staged Event Driven Architecture [SOSP 2001]
– QoS for internet services
Related Works: Thread Libraries
State Threads
– N-to-1 thread library
– Not pthreads compatible
– For Internet server applications
GNU Pth
– N-to-1 thread library
– Not pthreads compatible
– Threads for event-driven applications
Solaris’s N-to-M threads with blocking I/O
Linux’s 1-to-1 threads with blocking I/O
Conclusions
N-to-M threads with non-blocking I/O
– Novel
– High performance
Boost server performance
– 10-20% for Apache
– 10-15% for MySQL
Thank You!