Split-Level I/O Scheduling
Suli Yang, Tyler Harter, Nishant Agrawal, Samer Al-Kiswany, Salini Selvaraj Kowsalya,
Anand Krishnamurthy, Rini T Kaushik, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
2
…yet another I/O scheduling paper?
CFQ (2003)
BFQ (2010)
Deadline (2002)
mClock (2011)
Token-Bucket (2008)
Libra (2014)
pClock (2007)
Fahrrad (2008)
YFQ (1999)
Facade (2003)
3
Some mistakes we have been making for decades…
(in trying to build better schedulers)
4
• Current frameworks fundamentally limited
  – CFQ, Deadline, Token-Bucket
• Important policies cannot be realized
  – Fairness, Latency Guarantee, Isolation
• Wasted effort trying to build new schedulers without fixing the framework
Problem
5
Can we design a simple and effective framework that lets us build schedulers to correctly realize important I/O policies?
6
Solution: Split-Level Framework
• Control: Allow scheduling at multiple levels
  – Block level
  – System-call level
  – Page-cache level
• Information: Tag requests to identify the origin
• Simplicity: Small set of hooks at key junctions within the storage stack
7
Results
• Three distinct policies implemented
  – Priority, Deadline, Isolation
• Large performance improvements
  – Fairness: 12x
  – Tail latency: 4x
  – Isolation: 6x
• Good foundation for applications
  – Reduce transaction latency for databases
  – Improve isolation for virtual machines
  – Effective rate limiting for HDFS
8
Overview
• How I/O scheduling frameworks work
• Split-Level Scheduling Framework: Design
• Split-Level Scheduler Case Study
• Conclusion
9
Framework vs. Scheduler
• Framework: A running environment (mechanism)
• Scheduler: Implements different policies
• How it works: The framework provides callbacks to the schedulers.
10
Traditional Approach: Block-Level I/O Scheduling
Page Cache
File System
Block-Level Queues
add_req
dispatch_req / req_complete
Block-Level Scheduler
App App App
Device
11
Block-Level I/O Scheduling
Simplified Completely Fair Queuing (CFQ) implementation:
add_req(r) {
    p = r.submit_process
    q = get_queue(p)
    enqueue(q, r)
}

dispatch_req() {
    q = get_high_prio_queue()
    r = dequeue(q)
    dispatch(r)
}

complete_req(r) {
    // clean up
}
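The three callbacks above can be turned into a small runnable model. This is an illustrative Python sketch, not kernel code; the `Request` fields and class name are assumptions made for the example:

```python
# Illustrative model of a block-level scheduling framework: the framework
# invokes add_req / dispatch_req / complete_req, the scheduler supplies policy.
from collections import defaultdict, deque, namedtuple

# hypothetical request: submitting process, its priority (lower = higher), block no.
Request = namedtuple("Request", ["submit_process", "prio", "block"])

class BlockLevelScheduler:
    def __init__(self):
        self.queues = defaultdict(deque)   # one FIFO per submitting process
        self.prio = {}                     # process -> priority

    def add_req(self, r):
        self.prio[r.submit_process] = r.prio
        self.queues[r.submit_process].append(r)

    def dispatch_req(self):
        # CFQ-style: serve the non-empty queue with the highest priority
        ready = [p for p, q in self.queues.items() if q]
        if not ready:
            return None
        p = min(ready, key=lambda proc: self.prio[proc])
        return self.queues[p].popleft()

    def complete_req(self, r):
        pass  # accounting / cleanup would go here
```

Note that `dispatch_req` keys every decision on `r.submit_process`, which is exactly the assumption the cause-mapping discussion later shows to be unreliable.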
12
Overview
• What is an I/O scheduling framework
• Split-Level Scheduling Framework: Design
– The reordering problem
– The cause-mapping problem
– The cost-estimation problem
• Split-Level Scheduler Case Study
• Conclusion
13
Reordering
Scheduling is just reordering I/O requests
14
File System
Data Entanglement
Block-Level Scheduler
• File system tangles data into one bundle
  – Journal transaction
  – Shared metadata block
• Impossible for the schedulers to reorder
App1 App2
15
File System
Write Dependencies
Block-Level Scheduler
• File systems carefully order writes
• Schedulers cannot reorder (unless FS allows)
App
tx1 tx2
16
Fundamental Limitation #1(of block-level scheduling)
• The file system imposes ordering requirements contrary to the scheduling goals
• The scheduler cannot reorder
• Too late once data is in the file system
– Need admission control
17
Split-Level I/O Scheduling: Multi-Layer Hooks
Page Cache
File System
Block-Level Queues
add_req
dispatch_req req_complete
Split-Level Scheduler
App App App
Device
write() / fsync() hooks: avoid data entanglement and ordering constraints by scheduling above the file system
18
Cause Mapping
A scheduler needs to map an I/O request to the originating application
Write Delegation
Page Cache
Block-Level Scheduler
App1 App2
write() write()
Write-back Daemon
Loss of cause information: the write-back daemon submits all requests!
• Write-back, journaling, delayed allocation….
20
Fundamental Limitation #2(of block-level scheduling)
• Cause-mapping information lost within the framework
• Impossible to map an I/O request back to its originating application
(no matter how you implement the scheduler)
Split-Level I/O Scheduling: Tags
Page Cache
Block-Level Scheduler
App1 App2
write() write()
Write-back Daemon
Tags to identify origin
Tags pass across layers
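The tagging idea can be sketched as follows. This is an illustrative model under assumed names (`Page`, `BlockRequest`), not the kernel implementation:

```python
# Illustrative sketch of cross-layer tags: pages dirtied by write() remember
# the originating app, and the write-back daemon copies that tag onto each
# block request it submits.
class Page:
    def __init__(self, data, tag):
        self.data = data
        self.tag = tag                    # originating app, set at write() time

class BlockRequest:
    def __init__(self, page, submitter):
        self.submitter = submitter        # process calling into the block layer
        self.tagged_cause = page.tag      # the app that actually caused the I/O

dirty_pages = []                          # stand-in for the page cache

def app_write(app_id, data):
    # system-call level: tag the dirtied page with its origin
    dirty_pages.append(Page(data, tag=app_id))

def writeback_daemon():
    # the daemon submits every request, but cause information survives
    return [BlockRequest(p, submitter="writeback") for p in dirty_pages]
```

Even though every request arrives with `submitter == "writeback"`, the scheduler can still charge it to the right application via `tagged_cause`.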
22
Cost Estimation
A scheduler needs to estimate the cost of I/O
– Memory-level notification for timely estimate
– Block-level notification for accurate estimate
– Details in paper
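One way the two notifications might combine (an assumed design sketch, not the paper's mechanism, which is detailed in the paper itself): charge a rough default cost at the timely memory-level notification, then correct the account once the accurate block-level cost is known:

```python
# Assumed design sketch: combine a timely memory-level estimate with an
# accurate block-level correction.
class CostEstimator:
    def __init__(self, default_cost):
        self.default_cost = default_cost  # assumed per-write estimate
        self.charged = {}                 # request id -> provisional charge
        self.total = 0.0                  # running cost estimate

    def memory_notify(self, req_id):
        # timely but rough: charge as soon as the write is seen in memory
        self.charged[req_id] = self.default_cost
        self.total += self.default_cost

    def block_notify(self, req_id, actual_cost):
        # accurate but late: swap the provisional charge for the real cost
        self.total += actual_cost - self.charged.pop(req_id)
```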
23
Split-Level I/O Scheduling Framework: Summary
• Three key pieces:
  – Multi-layer hooks to prevent adverse file system interaction
  – Tags to track causes across layers
  – Early memory-level notification of write work
• Easy implementation
  – ~300 LOC in Linux
  – Little added complexity for building schedulers
24
Overview
• How I/O scheduling frameworks work
• Split-Level Scheduling Framework: Design
• Split-Level Scheduler Case Study
• Conclusion
25
Challenge #1: Priority Scheduler
Fairly allocate I/O resources based on the processes’ priorities
26
Block-Level: CFQ
Workload: eight processes with different priorities (0-7), each sequentially writing its own file
add_req(r) {
    p = r.submit_process
    q = get_queue(p)
    enqueue(q, r)
}
27
Block-Level: CFQ
(With CFQ, all buffered writes are attributed to the write-back thread.)
add_req(r) {
    p = r.submit_process
    q = get_queue(p)
    enqueue(q, r)
}
28
Split-Level: AFQ
CFQ deviates from the goal by 82%, AFQ by only 7% (a 12x improvement)
add_req(r) {
    p = r.tagged_cause
    q = get_queue(p)
    enqueue(q, r)
}
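A hedged sketch of how AFQ's tag-based queues could be served in proportion to per-application weights; the weighting scheme here is illustrative, not the paper's exact algorithm:

```python
# AFQ-style dispatch sketch: charge each request to its tagged cause and
# serve backlogged tags in proportion to assumed per-app weights.
from collections import defaultdict, deque, namedtuple

# hypothetical request: the tagged originating app and request size in bytes
Request = namedtuple("Request", ["tagged_cause", "nbytes"])

class AFQ:
    def __init__(self, weights):
        self.weights = weights            # tag -> share weight (assumed input)
        self.queues = defaultdict(deque)  # one queue per tagged cause
        self.served = defaultdict(int)    # bytes dispatched per tag

    def add_req(self, r):
        # key change from CFQ: use r.tagged_cause, not the submitting process
        self.queues[r.tagged_cause].append(r)

    def dispatch_req(self):
        ready = [t for t, q in self.queues.items() if q]
        if not ready:
            return None
        # serve the backlogged tag furthest below its weighted fair share
        t = min(ready, key=lambda tag: self.served[tag] / self.weights[tag])
        r = self.queues[t].popleft()
        self.served[t] += r.nbytes
        return r
```

With weights 2:1 for apps "a" and "b", dispatches interleave roughly two-to-one, regardless of which kernel thread submitted the requests.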
29
Challenge #2: Deadline Scheduler
Provide guaranteed latency of I/O requests
Block-Deadline
• Block-Deadline: cannot serve low-latency requests until the previous transaction completes
File System
Block-Deadline
App
tx1 tx2
Block-Deadline
Workload: flush 4KB of data to disk, with or without background writes
Expected result: the operation finishes within the deadline (100ms)
Split-Deadline
• Split-Deadline: suspends write() and fsync() to prevent high-latency requests from accumulating in one transaction.
File System
Split-Deadline
App
tx1
App
write() fsync()
write() and fsync() are blocked to prevent high-latency data from entering the FS
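The admission-control idea can be sketched as follows; the budget arithmetic and names are illustrative assumptions, not the scheduler's actual accounting:

```python
# Illustrative admission-control sketch: refuse to let write() add more dirty
# data to the current transaction than the disk can flush within the deadline.
class SplitDeadline:
    def __init__(self, deadline_ms, disk_mb_per_ms):
        self.budget_mb = deadline_ms * disk_mb_per_ms  # flushable per deadline
        self.pending_mb = 0.0                          # dirty data in current tx

    def write(self, mb):
        if self.pending_mb + mb > self.budget_mb:
            return False       # caller would block here until the tx drains
        self.pending_mb += mb
        return True

    def fsync(self):
        # flush the transaction; bounded work means a bounded latency
        flushed, self.pending_mb = self.pending_mb, 0.0
        return flushed
```

Because a transaction can never hold more than one deadline's worth of work, an fsync() that arrives with a 100ms deadline never waits behind an oversized flush.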
Split-Level: Split-Deadline
• Split-Deadline maintains the deadline regardless of background writes.
34
The Fsync-Freeze Problem
During checkpointing, the system begins writing out the data that needs to be fsync()'d so aggressively that the service time for I/O requests from other processes goes through the roof.
--- Robert Haas (PostgreSQL)
35
The Fsync-Freeze Problem
4x tail latency reduction.
Split-Deadline solves the fsync-freeze problem!
Workload: SQLite transaction with different checkpoint interval
Expected Results: Consistent transaction latency
36
Other Evaluation Results
• Low overhead
  – <1% runtime overhead
  – <50 MB memory overhead
• Other schedulers
  – Token-Bucket for performance isolation
• Other applications
  – PostgreSQL: latency guarantees for TPC-B workloads
  – QEMU: isolation across VMs
  – HDFS: effective I/O rate limiting
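The Token-Bucket scheduler mentioned above follows a standard rate-limiting pattern; a generic sketch (tokens in bytes, refilled continuously; parameters are illustrative):

```python
# Generic token-bucket sketch: a request is admitted only if enough tokens
# have accumulated; tokens refill at a fixed rate up to a burst capacity.
class TokenBucket:
    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes   # start full
        self.last = 0.0             # timestamp of last refill

    def allow(self, nbytes, now):
        # refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False                # request must wait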
37
Overview
• What is an I/O scheduling framework and how does it work.
• Split-Level Scheduling Framework: Design
• Split-Level Scheduler Case Study
• Conclusion
38
Conclusion
• For decades, people have been trying to build better block-level schedulers
  – bound to fail without appropriate framework support
• The split-level framework enables correct scheduler implementation
  – Cross-layer tags
  – Multi-level hooks
  – Memory-level notification
Source code and more information:
http://research.cs.wisc.edu/adsl/Software/split/
39
BACKUP SLIDES
41
File System
Write Dependencies
App
Block-Level Scheduler
• Modern file systems maintain data consistency by carefully ordering writes.
• Schedulers cannot reorder unless file system allows it.
tx1 tx2
42
Split-Level I/O Scheduling: Multi-Layer Hooks
• System-call scheduling above the file system to avoid data entanglement.
• Block-level scheduling below the file system to maximize performance.
Page Cache
App App App
read() write() fsync()
File System
write-back
Block-Level Queues
add_req
dispatch_req req_complete
Disk SSD
Scheduler
43
Split-Level I/O Scheduling: Tags
• Write-heavy HDFS workload on a machine with 8GB RAM.
45
Split-Level Framework Overhead
I/O performance with noop scheduler:
46
Split-Level I/O Scheduling: Tags
• Write-heavy HDFS workload on a machine with 8GB RAM.
• Worst-case memory overhead of tags: 50 MB.
47
Block-Level: Windows
48
Performance Isolation
A: sequential reader, unthrottled
B: throttled to 10 MB/s
49
Real Applications
50
Page Cache
Write Delegation
App1 App2
write() write()
Block-Level Scheduler
write-back
Loss of cause information!
• The process that submitted the block-level requests may not be the process that issued the I/O.
• Write-back, journaling, delayed allocation….
51
Page Cache
Split-Level I/O Scheduling: Tags
App1 App2
write() write()
Block-Level Scheduler
write-back
• Use tags to track I/O request across layers and identify the originating application.
• Tags identify a set of processes responsible for an I/O request.
52
Myth #1 in I/O Scheduling:
I don’t have to care about I/O scheduling. It is someone else’s problem…
53
• A bottleneck in many systems, from phones to servers.
[…our servers appear to freeze for tens of seconds during disk writes…]
• Foundation of performance isolation. […the interference as a result of competing I/Os remains
problematic in a virtualized environment…]
• Pain points for databases, hypervisors, key-value stores and more.
[…one customer reported that just changing cfq to noop solved
their innoDB IO problems…]
Why Is I/O Scheduling Relevant (to You)?
54
Myth #1 in I/O Scheduling:
I don’t have to care about I/O scheduling. It is someone else’s problem…
Fact #1:
If you care about performance, you should care about I/O scheduling
55
Myth #2 in I/O Scheduling:
Can’t the disk (or SSD) handle all I/O scheduling?
(Do I still need I/O scheduling in the era of SSD?)
56
• The device is powerless when handed the "wrong" requests from the OS
  – the file system may withhold requests
• Devices rely on OS-provided information
  – and lack mechanisms to gather it themselves
• Other common reasons:
  – the OS has more contextual information
  – OS-level isolation units
  – multi-device I/O scheduling
Why Should the OS Do I/O Scheduling?
58
Myth #2 in I/O Scheduling:
Shouldn’t the disk (or SSD) handle all the I/O scheduling?
Fact #2:
The OS has to issue the right request at the right time.