M. Tiwari, B. Agrawal, S. Mysore, J. Valamehr, T. Sherwood, CS & ECE, UCSB
Reading Group Presentation by Theo
DIFT lifeguards are very interesting
◦ TaintCheck
◦ MemCheck
They can help detect a range of bugs or extract useful information about the running program
Hardware accelerators are used to achieve reasonable performance
All hardware approaches so far use a “normal” cache for metadata storage
◦ In the normal cache hierarchy
◦ Or in extended bits (Raksha)
◦ Or in a dedicated L1-T (FlexiTaint)
Conventional approaches very effective for 1- or 2-bit states
But what about word/word or even word/byte lifeguards?
Word/Word
◦ Lockset
◦ TaintCheck with full tracking / word
◦ “Super MemCheck” with alloc/free and NULLing PC
Word/Byte
◦ Super MemCheck per byte
◦ Tomography Lifeguard
  ◦ L/G similar to TaintCheck; stores exactly how each input byte was used to calculate each byte in the app
Extended-State L/Gs very useful, but where can we store their state?
Previous caching schemes are ineffective for byte/byte L/Gs
◦ Extending cache lines is impractical
◦ Using the normal cache will pollute the hierarchy
◦ A dedicated small L1-T will miss frequently
Observation
◦ Tags exhibit high spatial locality
◦ If one byte is tagged as ‘A’, neighboring bytes will be ‘A’ also
Replace normal cache with range cache
Consecutive addresses with same metadata will only occupy a single entry
From this (L1-T): Address | Metadata
To this (Range Cache): Start Addr | End Addr | Metadata
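The entry layout above can be sketched in a few lines of Python. This is only an illustration of the idea, not the presented hardware: the names `RangeEntry` and `lookup`, the inclusive end address, and the example addresses and tags are all assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass
class RangeEntry:
    start: int  # first byte address covered (inclusive)
    end: int    # last byte address covered (inclusive; an assumption of this sketch)
    tag: int    # metadata shared by every byte in [start, end]

def lookup(entries, addr):
    """Return the tag of the range covering addr, or None on a miss."""
    for e in entries:  # a hardware cache would compare all entries in parallel
        if e.start <= addr <= e.end:
            return e.tag
    return None

# Hypothetical contents: one entry covers a whole 4 KiB region with the same tag.
cache = [RangeEntry(0x1000, 0x1FFF, tag=0xA), RangeEntry(0x3000, 0x300F, tag=0xB)]
```

With these made-up contents, any address inside the first region hits the same single entry, while an address between the two regions misses.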
Updates and Reads must be handled fast
◦ Especially common case ones (R/W in a single area)
Regions must be identified on the fly
◦ Split, Combine, Increase ranges automatically
◦ Extremely important since areas are usually increased slowly
  ◦ Only few L/Gs (e.g. AddrCheck) always get to know areas
[Figure: range update cases, assuming an infinite number of entries. Case labels: 0+1→1, 1+1→1, 1+1→2, 1+1→1, 1+1→1, 1+1→3, N+1→3, 2+X+1→3 (MISS ???), 1+1→2, 2+1→2, 2+X+1→1 (???)]
We need an index table to detect internal segments
Not frequent, but not that rare; handled by a H/W state machine
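The split, combine, and increase behaviour can be sketched in software. Reading the labels above as "ranges before + one write → ranges after" is an assumption, as are the function name `update`, the `(start, end, tag)` tuple layout, and the inclusive end address; the real design does this with a hardware state machine, not a list scan.

```python
def update(ranges, start, end, tag):
    """Write `tag` over [start, end] (inclusive); return the new sorted range list."""
    out = []
    for s, e, t in ranges:
        if e < start or s > end:      # no overlap: keep the range untouched
            out.append((s, e, t))
            continue
        if s < start:                 # left remainder survives (split)
            out.append((s, start - 1, t))
        if e > end:                   # right remainder survives (split)
            out.append((end + 1, e, t))
    out.append((start, end, tag))
    out.sort()
    merged = []
    for s, e, t in out:
        # Combine with the previous range when it is adjacent and has the same tag.
        if merged and merged[-1][2] == t and merged[-1][1] + 1 == s:
            merged[-1] = (merged[-1][0], e, t)
        else:
            merged.append((s, e, t))
    return merged

r = update([], 0, 3, 1)                        # first write creates a range
r = update(r, 4, 7, 1)                         # adjacent write, same tag: range grows
two = update(r, 100, 103, 1)                   # distant write: a second range
three = update([(0, 7, 1)], 2, 3, 2)           # different tag in the middle: split in three
silent = update([(0, 7, 1)], 2, 3, 1)          # same tag inside a range: nothing changes
```

Under this reading, the "silent update" is the case where the write lands inside an existing range with the same tag, so the entry count and contents stay the same.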
All entries considered dirty. S/W deals with evictions. LRU Replacement
Fast Case: Hit in a single range
◦ Return tag for that segment
Medium Case: Multiple ranges, all cached
◦ Consecutive ranges must have different tags
◦ How to combine? Multiple solutions:
  ◦ Reduce algorithm (e.g. Raksha-style rules)
  ◦ Call S/W
Bad Case: One or more segments miss
◦ S/W brings 64B segments to cache
Main Memory: 2-level table with 64B 2nd-level segments
◦ Reduce and repeat until read is serviced
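The three read cases can be told apart with a small classifier. This is a sketch under stated assumptions: ranges are sorted, non-overlapping `(start, end, tag)` tuples with inclusive ends, the function name `read` is hypothetical, and bitwise OR stands in for whatever reduce rule a lifeguard would actually use. The software miss handler for the bad case is not modelled; the sketch only reports the miss.

```python
from functools import reduce

def read(ranges, start, end, combine=lambda a, b: a | b):
    """Classify a read of [start, end] and return (case, tag)."""
    # Every cached range that overlaps the requested bytes.
    hits = [(s, e, t) for s, e, t in ranges if s <= end and e >= start]
    covered = sum(min(e, end) - max(s, start) + 1 for s, e, _ in hits)
    if covered < end - start + 1:
        return ("miss", None)                  # bad case: some bytes are not cached
    if len(hits) == 1:
        return ("fast", hits[0][2])            # fast case: one range holds the answer
    # Medium case: several cached ranges, reduced to a single tag.
    return ("medium", reduce(combine, (t for _, _, t in hits)))

cached = [(0, 7, 1), (8, 15, 2), (32, 39, 1)]  # made-up cache contents
```

For the made-up contents above, a read of bytes 2 to 5 is a fast hit, bytes 6 to 9 span two cached ranges and need a reduce, and bytes 14 to 33 cross an uncached gap and miss.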
Doubly linked list for detecting internal segments
3 L/Gs
◦ TaintCheck 1-bit/byte
◦ MemCheck 2-bit/byte
◦ Tomography 32-bit/byte
Apps
◦ SPEC, Java App, Store Webserver
Verilog RTL Model
◦ 3000 gates for the cache controller
Single-issue, in-order CPU model
Maximum number of Tagged Ranges varies greatly: cannot be stored fully in cache
◦ Must support swapping
gcc: snapshot of a 128-entry cache
◦ 100 of 122 ranges are smaller than 64B
◦ The largest range is over 2MB
A fixed range size is ill-advised
Everyone spends time on simple read hits and silent updates
◦ TaintCheck spends time on “other updates”
◦ Other L/Gs have simple hits
[Figure panels: TaintCheck 1-bit, MemCheck 2-bit, Tomography 32-bit]
4KB L1-T vs 128-entry Range Cache
◦ For large states, Range Cache is the winner
◦ For small states, almost equal
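A back-of-the-envelope calculation suggests why the tag width matters so much for the L1-T. It assumes tags are packed densely into the 4KB of storage and ignores any per-line overhead; the function name `l1t_coverage` is hypothetical and the numbers are plain arithmetic, not measurements from the evaluation.

```python
L1T_BYTES = 4 * 1024  # the 4KB L1-T from the comparison

def l1t_coverage(tag_bits_per_byte):
    """Bytes of application data whose tags fit in the L1-T at once."""
    return L1T_BYTES * 8 // tag_bits_per_byte

coverage = {
    "TaintCheck": l1t_coverage(1),    # 1 bit of tag per data byte
    "MemCheck": l1t_coverage(2),      # 2 bits of tag per data byte
    "Tomography": l1t_coverage(32),   # 32 bits of tag per data byte
}
```

Under these assumptions the L1-T covers 32KB of data for 1-bit tags but only 1KB for 32-bit tags, whereas the reach of a range cache entry depends on the range length rather than on the tag width.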
Base=∞ L1-T with 0 misses
Increase in L2 misses is caused by
◦ Increased mem refs (previous slide)
◦ L2 pollution by tags
L/Gs: 1-bit TaintCheck, 2-bit MemCheck
Difference usually minimal between L1-T and Range Cache for small states
L/G: 32-bit Tomography
Significant difference for large states
L1-T is a very simple scheme, easily handled by H/W
◦ Misses can be hidden with prefetch
  ◦ Memory pressure still increases, but the latency is hidden
◦ Prefetch can bypass L2 and bring tags directly to L1
  ◦ Minimizes the L2 pollution
Range Cache scheme is too complicated for H/W
◦ Must have a S/W miss handler or a complex H/W walk mechanism
◦ Effect on L1-I and TLB unaccounted for
Interesting approach that exploits the spatial stability of metadata, with good results
◦ Assuming a fair comparison
The equivalent of monochromatic-pages only
Multiprocessor consistency quite tricky…
Questions?