1
Read-Copy Update
Paul E. McKenney, Linux Technology Center, IBM Beaverton, http://www.rdrop.com/users/paulmck
Jonathan Appavoo, Department of Electrical and Computer Engineering, University of Toronto
Andi Kleen, SuSE Labs
Orran Krieger, IBM T. J. Watson Research Center, http://www.eecg.toronto.edu/~okrieg
Rusty Russell
Dipankar Sarma, Linux Technology Center, IBM India Software Lab
Maneesh Soni, Linux Technology Center, IBM India Software Lab
Liao, Hsiao-Win
2
Outline
  Introduction
  Toy Example
  Simple Infrastructure to Support RCU
  Application
4
Traditional OS locking designs
  Very complex
  Poor concurrency
  Fail to take advantage of the event-driven nature of operating systems
5
Race Between Teardown and Use of Service
Code executed, interrupts taken, memory error-correction events
6
Read-Copy Update Handling Race
quiescent state
7
Read-copy update works best when
  An update can be divided into two phases
  Common-case operations can proceed on stale data (e.g., continuing to handle operations by a module being unloaded)
  Destructive updates are very infrequent
8
Implementations of Quiescent State
  DYNIX/ptx 2.1 (1993) and Rusty Russell's first wait_for_rcu() patch [Russell01a] simply execute on each CPU in turn.
  DYNIX/ptx 4.0 (1994) and Dipankar Sarma's RCU patch for Linux use context switch, execution in the idle loop, execution in user mode, system call entry, trap from user mode, and CPU offline (this last for DYNIX/ptx only) as the quiescent states.
9
Implementations of Quiescent State
  Rusty Russell's second wait_for_rcu() patch [Russell01b] uses voluntary context switch as the sole quiescent state.
  Tornado's and K42's "generation" facility tracks beginnings and ends of operations.
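One way to picture quiescent-state tracking is a per-CPU counter that advances whenever the CPU passes through a quiescent state such as a context switch. The following is a minimal single-threaded simulation under that assumption; it is not the DYNIX/ptx or Linux code, and every name in it (TOY_NCPUS, toy_note_quiescent_state(), toy_gp_start(), toy_gp_elapsed()) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NCPUS 4  /* hypothetical CPU count for the simulation */

/* Per-CPU count of quiescent states (e.g., context switches) seen so far. */
static unsigned long toy_qs_count[TOY_NCPUS];

/* Called by the simulated scheduler when a CPU passes a quiescent state. */
static void toy_note_quiescent_state(int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NCPUS);
    toy_qs_count[cpu]++;
}

/* Snapshot of every CPU's counter, taken when a grace period starts. */
struct toy_grace_period {
    unsigned long snapshot[TOY_NCPUS];
};

static void toy_gp_start(struct toy_grace_period *gp)
{
    for (int cpu = 0; cpu < TOY_NCPUS; cpu++)
        gp->snapshot[cpu] = toy_qs_count[cpu];
}

/* The grace period has elapsed once every CPU's counter has moved,
 * i.e., every CPU has passed through at least one quiescent state. */
static bool toy_gp_elapsed(const struct toy_grace_period *gp)
{
    for (int cpu = 0; cpu < TOY_NCPUS; cpu++)
        if (toy_qs_count[cpu] == gp->snapshot[cpu])
            return false;
    return true;
}
```

A caller would take a snapshot with toy_gp_start() after removing an element and defer reclamation until toy_gp_elapsed() returns true.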
10
11
Outline
  Introduction
  Toy Example
  Simple Infrastructure to Support RCU
  Application
12
Reference-count vs. Read-copy search() and delete()
  Read-copy functions avoid all cacheline bouncing for reading tasks
  Read-copy functions can return references to deleted elements
  Read-copy functions cannot hold a reference to elements across a voluntary context switch
13
Typical RCU update sequence
  Remove pointers to a data structure.
  Wait for all previous readers to complete their RCU read-side critical sections.
  At this point, there cannot be any readers that hold references to the data structure, so it may now safely be reclaimed.
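The three steps can be sketched on a singly linked list. This is a single-threaded toy under stated assumptions: toy_wait_for_rcu() only counts calls and stands in for a real grace-period wait, and all names (toy_elem, toy_push(), toy_delete(), toy_length()) are hypothetical rather than kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_elem {
    int key;
    struct toy_elem *next;
};

static struct toy_elem *toy_list;     /* head; readers would traverse it without locks */
static int toy_grace_periods_waited;  /* simulation bookkeeping only */

/* Placeholder: a real implementation would block here until every CPU
 * has passed through a quiescent state. */
static void toy_wait_for_rcu(void)
{
    toy_grace_periods_waited++;
}

/* Push a new element at the front; returns 0 on success, -1 on failure. */
static int toy_push(int key)
{
    struct toy_elem *e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->key = key;
    e->next = toy_list;
    toy_list = e;
    return 0;
}

/* Delete the first element with the given key; returns 1 if one was
 * removed and reclaimed, 0 if no such element exists. */
static int toy_delete(int key)
{
    struct toy_elem **link = &toy_list;
    int before = toy_grace_periods_waited;

    while (*link != NULL && (*link)->key != key)
        link = &(*link)->next;
    if (*link == NULL)
        return 0;

    struct toy_elem *victim = *link;
    *link = victim->next;  /* step 1: remove the pointer; victim->next stays intact */
    toy_wait_for_rcu();    /* step 2: wait for pre-existing readers */
    assert(toy_grace_periods_waited > before);
    free(victim);          /* step 3: no reader can hold a reference now */
    return 1;
}

/* Count the elements currently reachable from the head. */
static int toy_length(void)
{
    int n = 0;
    for (struct toy_elem *e = toy_list; e != NULL; e = e->next)
        n++;
    return n;
}
```

Leaving victim->next untouched in step 1 is what lets a reader that is already positioned on the removed element continue its traversal.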
14
Read-Copy Deletion (delete B)
15
the first phase of the update
16
Read-Copy Deletion (first phase)
17
Read-Copy Search
The task sees the table data
18
Read-Copy Deletion (second phase)
19
Read-Copy Deletion
20
Read-Copy Deletion
21
Assumptions
  Read intensive
    The update fraction f < 1/|CPU|
  Grace period
    Reading tasks can see stale data
  Requires that the modification be compatible with lock-free access
    Linked-list insertion, deletion, and replacement are compatible
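To illustrate why linked-list insertion is compatible with lock-free access, here is a minimal sketch using C11 atomics: the new element is fully initialized first and then published with a single pointer store, so a concurrent reader sees either the old list or the complete new element. The names (toy_node, toy_search(), toy_insert_front()) are hypothetical, and updaters are assumed to be serialized by a lock that is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_node {
    int key;
    int value;
    struct toy_node *_Atomic next;
};

static struct toy_node *_Atomic toy_head;

/* Reader: takes no lock and writes nothing to shared memory. */
static struct toy_node *toy_search(int key)
{
    struct toy_node *p = atomic_load_explicit(&toy_head, memory_order_acquire);

    while (p != NULL && p->key != key)
        p = atomic_load_explicit(&p->next, memory_order_acquire);
    return p;
}

/* Updater: returns 0 on success, -1 if allocation fails. */
static int toy_insert_front(int key, int value)
{
    struct toy_node *n = malloc(sizeof(*n));

    if (n == NULL)
        return -1;
    /* Initialize every field before the element becomes reachable. */
    n->key = key;
    n->value = value;
    atomic_init(&n->next,
                atomic_load_explicit(&toy_head, memory_order_relaxed));
    /* Publish with one store; release ordering makes the initialization
     * above visible to any reader that observes the new head. */
    atomic_store_explicit(&toy_head, n, memory_order_release);
    assert(toy_search(key) == n);  /* the newest match is found first */
    return 0;
}
```

Deletion and replacement follow the same pattern: one pointer store changes what readers can reach, and the old element is reclaimed only after a grace period.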
22
Outline
  Introduction
  Toy Example
  Simple Infrastructure to Support RCU
  Application
23
Simple Implementation
  wait_for_rcu()
    Waits for a grace period to expire
  kfree_rcu()
    Waits for a grace period before freeing a specified block of memory
24
Read-Copy Update Grace Period
Non-preemptible kernel execution
Quiescent-state execution
25
Simple Grace-Period Detection
26
Rusty Russell's wait_for_rcu() I
27
Rusty Russell's wait_for_rcu() II
28
Shortcomings
  1. Does not work in a preemptible kernel unless preemption is suppressed in all read-side critical sections
  2. Cannot be called from an interrupt handler
  3. Cannot be called while holding a spinlock or with interrupts disabled
  4. Relatively slow
29
Addressing the Shortcomings
  K42 and Tornado implementations of RCU: read-side critical sections can block as well as be preempted (addresses shortcoming 1)
  call_rcu() (addresses shortcomings 2 and 3)
  kfree_rcu() (addresses shortcomings 2 and 3)
  High-performance design for RCU (addresses shortcomings 2, 3, and 4)
30
K42 and Tornado implementations of RCU
  Maintain two generation counters
    Current generation
    Non-current generation
  Operations (next page)
31
Operation
  An operation begins
    Increment the current generation counter
    Store a pointer to that counter in the task
  The operation ends
    Decrement the generation counter that the task points to
  Periodically, the non-current generation is checked to see whether it is zero
    If it is, reverse the current and non-current generations
    A token is handed from one CPU to the next
  When the token returns to a given CPU
    All operations across the entire system have terminated
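The per-CPU part of this scheme can be sketched as follows. This is an illustrative model, not the K42 or Tornado code: it covers one CPU's pair of counters and omits the token passing between CPUs, and all names (toy_generations, toy_task, toy_op_begin(), toy_op_end(), toy_try_swap()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_generations {
    long count[2];  /* operations in flight, per generation */
    int current;    /* index of the current generation */
};

struct toy_task {
    long *gen_count;  /* counter incremented when this task's operation began */
};

/* Operation begins: charge it to the current generation and remember
 * which counter was used. */
static void toy_op_begin(struct toy_generations *g, struct toy_task *t)
{
    t->gen_count = &g->count[g->current];
    (*t->gen_count)++;
}

/* Operation ends: decrement the remembered counter, which may by now
 * belong to the non-current generation. */
static void toy_op_end(struct toy_task *t)
{
    assert(t->gen_count != NULL);
    (*t->gen_count)--;
    t->gen_count = NULL;
}

/* Periodic check: if the non-current generation has drained to zero,
 * reverse the roles of the two generations. Returns true on a swap. */
static bool toy_try_swap(struct toy_generations *g)
{
    int non_current = 1 - g->current;

    if (g->count[non_current] != 0)
        return false;
    g->current = non_current;
    return true;
}
```

After a swap, new operations are charged to the other counter, so the old one can only decrease; once it reaches zero, every operation that began before the swap on this CPU has ended.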
32
Non-Blocking Grace-Period Detection
Queues callbacks onto a list
Invokes all the pending callbacks after forcing a grace period
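The callback approach can be sketched as a simple FIFO list. This is a single-threaded toy with hypothetical names (toy_rcu_head, toy_call_rcu(), toy_invoke_callbacks()); it is not the kernel's call_rcu(), and it assumes the caller runs toy_invoke_callbacks() only after a grace period is known to have elapsed.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(void *arg);
    void *arg;
};

static struct toy_rcu_head *toy_pending;
static struct toy_rcu_head **toy_pending_tail = &toy_pending;

/* Queue a callback and return immediately. Because it never blocks,
 * this style of interface suits interrupt handlers and lock holders. */
static void toy_call_rcu(struct toy_rcu_head *head,
                         void (*func)(void *arg), void *arg)
{
    assert(func != NULL);
    head->next = NULL;
    head->func = func;
    head->arg = arg;
    *toy_pending_tail = head;
    toy_pending_tail = &head->next;
}

/* Invoke every queued callback in FIFO order and return how many ran.
 * One grace period is thereby shared by the whole batch. */
static int toy_invoke_callbacks(void)
{
    struct toy_rcu_head *list = toy_pending;
    int invoked = 0;

    toy_pending = NULL;
    toy_pending_tail = &toy_pending;
    while (list != NULL) {
        /* Read next first: the callback may free the enclosing object. */
        struct toy_rcu_head *next = list->next;

        list->func(list->arg);
        list = next;
        invoked++;
    }
    return invoked;
}

/* Example callback: bump a counter supplied by the caller. */
static void toy_count_callback(void *arg)
{
    (*(int *)arg)++;
}
```

Embedding the toy_rcu_head in the protected object means queuing a callback needs no memory allocation.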
33
High-Performance Design
  Defers frees of kmem_cache_alloc() memory
  Detects and identifies overly long lock-hold durations
  "Batching" grace-period-measurement requests
  Maintaining per-CPU request lists
  Providing a less costly algorithm for measuring grace-period duration
34
Simple Deferred Free
  A simple implementation of a deferred-free function named kfree_rcu()
  Low performance
    kfree_rcu() → wait_for_rcu()
35
Outline
  Introduction
  Toy Example
  Simple Infrastructure to Support RCU
  Application
36
Application
  Distributed lock manager
  TCP/IP
  Storage-area network (SAN)
  Application regions manager (a workload-management subsystem)
  Process management
  LAN drivers
37
Thank you for listening