CS533 - Concepts of Operating Systems
Virtual Memory Primitives for User Programs
Presentation by David Florey
Overview
This paper presents a set of basic virtual memory primitives, how they are used, and how they are implemented on various OSs
Discuss the various primitives and how they are used (in user-level algorithms)
Discuss the performance on various OSs
Discuss the ramifications of these uses (algorithms) on system design
The Primitives (VM Services)
TRAP
o Facility allowing user-level handling of page faults (protection or otherwise)
o An event that is raised (in the form of a message or signal from the OS)
PROT1
o Decreases the accessibility of a single page
o A procedure call (via messaging, trap to OS, etc.)
PROTN
o Decreases the accessibility of n pages
o A procedure call (via messaging, trap to OS, etc.)
UNPROT
o Increases the accessibility of a single page
o A procedure call (via messaging, trap to OS, etc.)
DIRTY
o Returns the set of pages that have been written to (dirtied) since the last call to DIRTY
o A procedure call (via messaging, trap to OS, etc.)
MAP2
o Maps two different virtual addresses to the same physical page
o Each virtual address has its own protection level
o Both mappings are in the same address space (not two different processes, tasks, or address spaces)
o A procedure call (via messaging, trap to OS, etc.)
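The six primitives can be modelled in a few lines. The sketch below is a toy simulation, not a real OS interface: the class `PagedMemory`, its method names, and the choice to model only write protection (one value per page) are all assumptions made for illustration.

```python
class ProtectionFault(Exception):
    """Raised when a write hits a protected page and no handler unprotects it."""


class PagedMemory:
    """Toy address space: hypothetical model, one stored value per page."""

    def __init__(self, num_pages, trap_handler=None):
        self.frames = [0] * num_pages                       # "physical" pages
        self.mapping = {v: v for v in range(num_pages)}     # virtual -> physical
        self.writable = {v: True for v in range(num_pages)} # per-virtual-page protection
        self.dirtied = set()                                # physical pages written so far
        self.trap_handler = trap_handler                    # TRAP: user-level fault handler

    def prot1(self, vpage):                 # PROT1: protect a single page
        self.writable[vpage] = False

    def protn(self, vpages):                # PROTN: protect a batch of pages
        for vpage in vpages:
            self.writable[vpage] = False

    def unprot(self, vpage):                # UNPROT: make one page accessible again
        self.writable[vpage] = True

    def dirty(self):                        # DIRTY: pages written since the last call
        pages, self.dirtied = sorted(self.dirtied), set()
        return pages

    def map2(self, vpage):                  # MAP2: second virtual page, own protection
        alias = len(self.mapping)
        self.mapping[alias] = self.mapping[vpage]
        self.writable[alias] = True
        return alias

    def write(self, vpage, value):
        if not self.writable[vpage]:
            if self.trap_handler is not None:
                self.trap_handler(self, vpage)   # deliver the fault to user code
            if not self.writable[vpage]:
                raise ProtectionFault(vpage)
        frame = self.mapping[vpage]
        self.frames[frame] = value
        self.dirtied.add(frame)

    def read(self, vpage):
        return self.frames[self.mapping[vpage]]


faults = []

def handler(mem, vpage):
    faults.append(vpage)    # record the fault, then let the write proceed
    mem.unprot(vpage)

mem = PagedMemory(4, trap_handler=handler)
mem.protn([0, 1])
alias = mem.map2(0)         # unprotected second mapping of page 0
mem.write(alias, 7)         # no fault: the alias has its own protection
mem.write(1, 9)             # fault, handler runs, write is retried
dirty_now = mem.dirty()
dirty_again = mem.dirty()
```

The write through `alias` succeeds while page 0 itself stays protected, which is the property the algorithms in the following slides rely on when a handler must touch a page that client threads cannot.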
VM Service Usage: Concurrent Garbage Collection
Stop all threads
Divide memory into from-space and to-space
Copy the objects referenced by the “roots” and registers into to-space
Use PROTN to protect all pages in the unscanned area
Use MAP2 to allow the collector access to all pages while preventing mutators from accessing the same pages
Restart threads
As mutator threads attempt to access pages in to-space that are unscanned, a TRAP event:
o Stops the mutator in its tracks
o Calls the collector, which scans the page, forwards the objects it references, and UNPROTs the page
o Allows the mutator to continue
At some point this process is restarted, and all objects left in from-space are considered garbage and removed
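The trap-driven scanning step can be sketched as a small simulation. Everything here is an assumption made for illustration: the names `Obj` and `ConcurrentCollector` are hypothetical, each object is treated as occupying its own page, and only the objects the roots point at are copied at the flip.

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)   # references to other objects
        self.forward = None      # forwarding pointer, set once the object is copied


class ConcurrentCollector:
    """Toy model: one object per page, so "page protected" means "object unscanned"."""

    def __init__(self, roots):
        self.to_space = []
        self.protected = set()   # ids of to-space copies not yet scanned
        self.traps = 0
        # Flip: forward the objects the roots point at.
        self.roots = [self._forward(r) for r in roots]

    def _forward(self, obj):
        """Copy obj into to-space once and return the copy."""
        if obj.forward is None:
            copy = Obj(obj.name + "1", obj.refs)   # refs still point into from-space
            obj.forward = copy
            self.to_space.append(copy)
            self.protected.add(id(copy))           # unscanned page stays protected
        return obj.forward

    def _scan(self, copy):
        """Collector work for one page: forward its references, then UNPROT."""
        copy.refs = [self._forward(r) for r in copy.refs]
        self.protected.discard(id(copy))

    def mutator_read(self, copy):
        if id(copy) in self.protected:             # TRAP: mutator touched unscanned page
            self.traps += 1
            self._scan(copy)
        return copy.refs                           # only to-space references are visible

    def collector_step(self):
        """Background collector: scan one still-protected page, if any remain."""
        for copy in self.to_space:
            if id(copy) in self.protected:
                self._scan(copy)
                return True
        return False


# Object graph from the slides: A -> B, F -> L, registers R1 -> A and R2 -> F.
b, l = Obj("B"), Obj("L")
a, f = Obj("A", [b]), Obj("F", [l])
gc = ConcurrentCollector([a, f])
a1, f1 = gc.roots
seen = gc.mutator_read(a1)        # faults; the collector forwards B to B1
while gc.collector_step():        # the collector finishes the remaining pages
    pass
```

The mutator never observes a from-space reference: any page it can read without faulting has already had all of its references forwarded.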
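[Figure: Concurrent garbage collection. From-space initially holds objects A to L (F points to L), with registers R1->A and R2->F. After the flip, A, F, and L have been copied to to-space as A1, F1, and L1, and the registers point to A1 and F1; the unscanned pages are protected (PROTN). When the mutator thread accesses A1.Data.B it takes a protection fault; the collector thread, which has access through MAP2, forwards B to B1, and the mutator resumes with A1.Data.B1.]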
VM Service Usage: Shared Virtual Memory
Each CPU (or machine) has its own memory and memory mapping manager
Memory mapping managers keep CPU memory consistent with the “shared” memory
When a page is shared, it is marked “read-only” (PROT1)
Upon writing this page, a fault occurs in the writing thread, causing a TRAP event handled by the associated mapping manager
The mapping manager uses the trap to notify other MMs, which in turn flush their copies of the page (this mechanism may also be used to get an up-to-date copy of the page)
The page is then marked writable (UNPROT) and written
MAP2 is used to allow the trap handler to access the protected page while the client cannot
TRAP is also used by the MM to pull down a page from another CPU or disk when not available locally
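A write-invalidate protocol of this kind can be sketched as follows. This is a simplified model under stated assumptions: `Cluster` and `Node` are hypothetical names, a single central directory tracks the owner of each page, and message passing between mapping managers is replaced by direct method calls.

```python
class Cluster:
    """Stands in for the cooperating mapping managers (centralized for brevity)."""

    def __init__(self):
        self.nodes = []
        self.owner = {}                       # page -> node with the up-to-date copy

    def create(self, node, page, value):
        node.pages[page] = value              # shared pages start read-only
        self.owner[page] = node

    def fetch(self, node, page):
        owner = self.owner[page]
        owner.writable.discard(page)          # PROT1: owner's copy becomes read-only
        node.pages[page] = owner.pages[page]  # ship an up-to-date copy

    def acquire_for_write(self, node, page):
        if page not in node.pages:
            self.fetch(node, page)
        for other in self.nodes:
            if other is not node:
                other.pages.pop(page, None)   # other managers flush their copies
                other.writable.discard(page)
        self.owner[page] = node
        node.writable.add(page)               # UNPROT on the writing node


class Node:
    def __init__(self, cluster):
        self.cluster = cluster
        self.pages = {}                       # locally cached page contents
        self.writable = set()                 # pages writable without faulting
        cluster.nodes.append(self)

    def read(self, page):
        if page not in self.pages:            # TRAP: page not available locally
            self.cluster.fetch(self, page)
        return self.pages[page]

    def write(self, page, value):
        if page not in self.writable:         # TRAP: page missing or read-only
            self.cluster.acquire_for_write(self, page)
        self.pages[page] = value


cluster = Cluster()
cpu1, cpu2 = Node(cluster), Node(cluster)
cluster.create(cpu1, "C", 1)
first_read = cpu2.read("C")        # CPU2 pulls a read-only copy of C
cpu1.write("C", 2)                 # write fault on CPU1 invalidates CPU2's copy
flushed = "C" not in cpu2.pages
second_read = cpu2.read("C")       # CPU2 faults and retrieves the new C from CPU1
```

After the second read, CPU1's copy is read-only again, so its next write would fault and repeat the invalidation.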
[Figure: Shared virtual memory. Shared memory holds pages A to F; CPU1 caches A, C, E and CPU2 caches B, C, F, each with its own mapping manager, and the shared page C is protected with PROT1. Step 1: a thread on CPU1 attempts to write C and TRAPs; the mapping manager traps the write fault and tells the other mapping managers to flush their copies, leaving CPU2 with B and F. The handler's access to C is allowed because of MAP2. Step 2: the thread is resumed and allowed to write to C. Step 3: if a thread on CPU2 needs C, the page fault is handled by its mapping manager, which retrieves the up-to-date C from CPU1.]
VM Service Usage: Concurrent Checkpointing
Checkpointing is the process of saving program state such as the heap, stack, etc., which can be slow
Instead of a synchronous save, we can simply use PROTN to mark the pages that need to be saved to disk read-only
A second thread can then run concurrently with the user threads, writing out pages and UNPROTing each page as it is written
If a user thread writes to a “read-only” page, a fault occurs, TRAPping to the concurrent thread, which quickly writes the page out and allows the faulting thread to continue
Could also checkpoint only the DIRTY pages, using PROT1 on each
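The interplay between the background saver and the faulting user thread can be sketched as below. This is a single-threaded toy model: the class `Checkpointer` and its methods are hypothetical, one value stands in for a page's contents, and a dictionary stands in for the disk.

```python
class Checkpointer:
    """Toy model of concurrent checkpointing (assumed names, no real threads)."""

    def __init__(self, memory):
        self.memory = memory      # live state: one value per page
        self.read_only = set()    # pages still waiting to be saved
        self.saved = {}           # checkpoint image: page -> value at checkpoint time
        self.traps = 0

    def begin(self, pages=None):
        """PROTN the pages to be saved; pages=None means every page."""
        self.read_only = set(range(len(self.memory)) if pages is None else pages)
        self.saved = {}

    def _save(self, page):
        self.saved[page] = self.memory[page]   # write the page out
        self.read_only.discard(page)           # UNPROT once it is safely saved

    def background_step(self):
        """One unit of work for the concurrent saver thread."""
        if not self.read_only:
            return False
        self._save(min(self.read_only))
        return True

    def write(self, page, value):
        """A user-thread store."""
        if page in self.read_only:             # TRAP: page not saved yet
            self.traps += 1
            self._save(page)                   # save the old contents first
        self.memory[page] = value


cp = Checkpointer([10, 20, 30, 40])
cp.begin()
cp.background_step()      # saver writes out page 0
cp.write(2, 99)           # faults: page 2 is saved, then modified
cp.write(0, 11)           # no fault: page 0 was already saved
while cp.background_step():
    pass
```

The checkpoint image reflects the state at the moment `begin` was called, even though user writes continued during the save. Passing only the pages reported dirty to `begin` gives the incremental variant mentioned on the slide.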
[Figure: Concurrent checkpointing with DIRTY, shown on four pages (1 to 4). Step 1: use DIRTY to see which pages have been modified. Step 2: pages 1, 2, and 4 are dirty, so PROT1 each of these, then execute the original algorithm.]
VM Service Usage: Generational Garbage Collection
Objects are kept in generations
The longer an object lives, the older its generation
Typically garbage is in younger generations, but an old object might be pointing at a young object, so…
Use DIRTY to see if pages containing old objects were changed; objects in these DIRTY pages can be scanned to see where they point
Or PROTN all old pages and TRAP to a handler when an old page is written to; save the page id in a list for later scanning and UNPROT the page so the writer can write
Later, the collector can scan the list of pages to see if any objects within the pages are pointing to younger generations
Why use a small page size here?
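The PROTN/TRAP variant amounts to building a list of written pages in the fault handler. A minimal sketch, assuming a hypothetical `OldGeneration` class in which each page holds a fixed number of reference slots:

```python
class OldGeneration:
    """Toy model of the old generation's pages (names are illustrative)."""

    def __init__(self, num_pages, slots_per_page):
        self.pages = [[None] * slots_per_page for _ in range(num_pages)]
        self.protected = set()
        self.written_pages = []          # list built by the trap handler

    def protect_all(self):               # PROTN all old pages
        self.protected = set(range(len(self.pages)))
        self.written_pages = []

    def store(self, page, slot, ref):
        if page in self.protected:       # TRAP on the first write to a page
            self.written_pages.append(page)
            self.protected.discard(page) # UNPROT so the writer can continue
        self.pages[page][slot] = ref

    def young_roots(self, is_young):
        """Scan only the written pages for pointers into the young generation."""
        return [ref
                for page in self.written_pages
                for ref in self.pages[page]
                if ref is not None and is_young(ref)]


young = {"y1", "y2"}
old = OldGeneration(num_pages=3, slots_per_page=2)
old.protect_all()
old.store(1, 0, "y1")     # faults once; page 1 goes on the list
old.store(1, 1, "o9")     # no fault: page 1 is already unprotected
old.store(2, 0, "o7")     # faults; page 2 goes on the list
roots = old.young_roots(lambda ref: ref in young)
```

Every slot on a written page has to be scanned, whether or not it changed, so the scanning cost grows with `slots_per_page`. That is one way to read the slide's question about small page sizes.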
[Figure: Generational garbage collection. The problem (in red): an object in the older generation points to an object in the younger generation, which means we can’t collect that object. Each generation (Generation X on page 1, Generation Y on page 2) is assumed to occupy its own page for now.
Option 1, use DIRTY: at time t1 nothing is DIRTY; at t2 page 1 is dirty; at t3 the collector kicks in and scans the list of dirty pages for reverse dependencies.
Option 2, use PROTN and TRAP to create a list of modified pages: at t2, PROTN all Generation X pages; when page 1 is written to, add it to a list and allow the write to continue (UNPROT the page); at t3 the collector kicks in and scans the manually created list of written pages.]
VM Service Usage: Others…
Persistent Stores
o Can use VM services to protect pages, trap on writes, and persist dirty pages on commit or toss them on abort
o Uses TRAP, PROTN, UNPROT, MAP2
Extending addressability
o After translating 64-bit addresses to 32-bit ones, pages may need to be protected so that a TRAP handler can properly “load” the page for suitable access, then UNPROT it
o Uses TRAP, UNPROT, PROT1 or PROTN, and MAP2
Data-compression paging
o Compressing n pages into a couple of pages may be faster than writing these pages to disk. The compressed pages can then be access-protected. When the user then tries to access such a page: TRAP, decompress, UNPROT
o Could also use PROT1 to test the access frequency of a page
o Uses TRAP, PROT1 or PROTN, UNPROT
Heap overflow detection
o Terminate the memory allocation area with a “guard” (PROT1) page
o An access to this page invokes the TRAP handler, which triggers the collector
o The alternative is a conditional branch
o Uses PROT1, TRAP
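The data-compression paging idea can be sketched with Python's standard `zlib` module. The class `CompressedPager` is hypothetical, and unlike the slide it compresses each page separately rather than packing several pages together, purely to keep the example short.

```python
import zlib


class CompressedPager:
    """Toy model: paged-out pages are kept compressed in memory, not on disk."""

    def __init__(self, pages):
        self.resident = dict(pages)   # page id -> bytes, directly accessible
        self.compressed = {}          # page id -> compressed bytes (access-protected)
        self.traps = 0

    def page_out(self, page):
        """Compress a page instead of writing it to disk, then protect it."""
        self.compressed[page] = zlib.compress(self.resident.pop(page))

    def read(self, page):
        if page not in self.resident:                 # TRAP on the protected page
            self.traps += 1
            data = zlib.decompress(self.compressed.pop(page))
            self.resident[page] = data                # UNPROT: page is usable again
        return self.resident[page]


PAGE_SIZE = 4096   # illustrative page size
pager = CompressedPager({1: b"d" * PAGE_SIZE, 2: b"q" * PAGE_SIZE, 3: b"r" * PAGE_SIZE})
for page in (1, 2, 3):
    pager.page_out(page)
compressed_bytes = sum(len(c) for c in pager.compressed.values())
restored = pager.read(1)      # faults, decompresses, resumes
again = pager.read(1)         # no fault the second time
```

The highly repetitive page contents used here compress far better than typical data would, so the space saving in this example says nothing about real workloads.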
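[Figure: Persistent store example. Pages 1 and 2, holding objects A, B, C, and D, are protected with PROTN instead of using copy-on-write. A write TRAPs; the handler (1a, 1b) notes that A and C are to be saved on commit, then (2a, 2b) UNPROTs the page and allows the write.]
[Figure: Data compression example. Pages 1, 2, and 3 (contents ddd, qqq, rrr) are compressed into a single compressed page (dddqqqrrr). An access to page 1 TRAPs; the compressed page is expanded back into pages 1, 2, and 3, and the thread resumes.]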
Performance in OSs
Devised two benchmarks, Appel1 and Appel2, based on the algorithms’ patterns of primitive usage
Appel1
o PROT1, TRAP, UNPROT
o e.g. Shared virtual memory
Appel2
o PROTN, TRAP, UNPROT
o e.g. Concurrent garbage collection
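The two usage patterns can be made concrete by counting primitive calls in a simulation. This sketch does no real timing; the function names, the page and round counts, and the choice to start Appel1 with half of the pages protected are assumptions that only loosely follow the benchmark descriptions.

```python
import random


def appel1(num_pages=100, rounds=100, seed=0):
    """PROT1 + TRAP + UNPROT: pages are protected one at a time."""
    rng = random.Random(seed)
    protected = set(range(0, num_pages, 2))         # setup: half the pages protected
    unprotected = set(range(num_pages)) - protected
    counts = {"PROT1": 0, "TRAP": 0, "UNPROT": 0}
    for _ in range(rounds):
        page = rng.choice(sorted(protected))        # access a random protected page
        counts["TRAP"] += 1
        other = rng.choice(sorted(unprotected))     # handler protects some other page
        protected.add(other)
        unprotected.discard(other)
        counts["PROT1"] += 1
        protected.discard(page)                     # and unprotects the faulting page
        unprotected.add(page)
        counts["UNPROT"] += 1
    return counts


def appel2(num_pages=100, seed=0):
    """PROTN + TRAP + UNPROT: pages are protected in one batch."""
    rng = random.Random(seed)
    protected = set(range(num_pages))               # a single PROTN call
    counts = {"PROTN": 1, "TRAP": 0, "UNPROT": 0}
    order = list(range(num_pages))
    rng.shuffle(order)
    for page in order:                              # access each page in random order
        if page in protected:
            counts["TRAP"] += 1
            protected.discard(page)                 # handler unprotects the page
            counts["UNPROT"] += 1
    return counts


result1 = appel1()
result2 = appel2()
```

The difference between the two is where the protection cost lands: Appel1 pays for a protect call on every fault, while Appel2 amortizes one batched call over all the faults that follow.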
Performance of Primitives
All data normalized based on the speed of an Add instruction on the CPU
Some OSs didn’t implement MAP2
Some OSs did a crummy job of implementing these primitives
o mprotect does not flush the TLB correctly
OS designers seem to be relying on old notions like disk latency
o Not relevant with CPU-based algorithms like these
One OS performed exceptionally well, showing that these primitives don’t have to perform poorly
Ramifications on System Design
Fault handling must be fast because we are no longer at the mercy of the disk – we can do it all in the CPU
TLB Consistency
o Making memory more accessible is good for TLB consistency
  • One less thing you need to worry about (a stale TLB entry only causes an extra fault, which can be resolved when it happens)
o Making memory less accessible in the multi-processor case forces TLB “shootdown”
  • Stop all processors and tell each to flush entry 123 in its TLB
  • Better if done in batches
  • In fact, paging out could improve if done in batches too
Ramifications on System Design
Optimal Page Size
o Some operations depend on the size of the page
  • “HEY OS DESIGNERS LISTEN UP!”
o Disk latency can no longer be counted on to hide crummy design
o Computations linearly proportional to page size are now going to be noticed, so we might benefit by cutting down the page size
  • Those algorithms that do a lot of scanning – like the generational garbage collector – would benefit from a smaller page size
o Also be aware that shrinking page sizes will cause more page faults and more calls to the fault trap handler, so its overhead must also be very small
Ramifications on System Design
Access to Protected Pages
o Mapping the same page two different ways with two different protections in the same address space is FAST
  • Although it does add some bookkeeping overhead
  • And cache consistency could be a problem
o You could achieve the same results by copying memory around – only 65 copies and you’re there!
  • Or pounding your head on the desk – that works too
o You could also use a heavyweight process and super heavy RPC to context switch heavily, relying on the OS’s support for sharing pages between processes
  • Techniques employed in LRPC and URPC can alleviate the context switch problem
Ramifications on System Design
What about pipelined processors?
o Out-of-order execution
o Dependence on sequential execution
o Only a problem in the heap overflow detection case
  • Register tweaking can be a problem
  • All other algorithms work just like a typical page fault handler – handle fault, pull page in, make page accessible
Final Considerations
Making memory more accessible one page at a time, and less accessible in large batches is good for TLB consistency
The total performance effect of page size should be considered (fixed costs vs variable costs)
Locality of reference is exploited in these algorithms
o Better locality reduces fault handling overhead (as data is closer to the CPU)
Pages should be accessible in different ways in a single address space