M. Tiwari, B. Agrawal, S. Mysore, J. Valamehr, T. Sherwood, CS & ECE of UCSB Reading Group Presentation by Theo


Post on 18-Jan-2016


Page 1:

M. Tiwari, B. Agrawal, S. Mysore, J. Valamehr, T. Sherwood, CS & ECE of UCSB

Reading Group Presentation by Theo

Page 2:

DIFT lifeguards are very interesting
◦ TaintCheck
◦ MemCheck

They can help detect a range of bugs or extract useful information about the running program

Hardware accelerators are used to achieve reasonable performance

Page 3:

All hardware approaches so far use a “normal” cache for metadata storage
◦ In the normal cache hierarchy
◦ Or in extended bits (Raksha)
◦ Or in a dedicated L1-T (FlexiTaint)

Conventional approaches are very effective for 1- or 2-bit states

But what about word/word or even word/byte lifeguards?

Page 4:

Word/Word
◦ Lockset
◦ TaintCheck with full tracking per word
◦ “Super MemCheck” with alloc/free and NULLing PC

Word/Byte
◦ Super MemCheck per byte
◦ Tomography Lifeguard
  L/G similar to TaintCheck; stores exactly how each input byte was used to calculate each byte in the app

Extended-state L/Gs are very useful, but where can we store their state?

Page 5:

Previous caching schemes are ineffective for byte/byte L/Gs
◦ Extending cache lines is impractical
◦ Using the normal cache will pollute the hierarchy
◦ A dedicated small L1-T will miss frequently

[Figure legend: avg, max]

Page 6:

Observation
◦ Tags exhibit high spatial locality
◦ If one byte is tagged as ‘A’, neighboring bytes will be ‘A’ also

Replace the normal cache with a range cache

Consecutive addresses with the same metadata will occupy only a single entry

From this (L1-T): Address | Metadata

To this (Range Cache): Start Addr | End Addr | Metadata
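The two entry layouts above can be compared with a minimal sketch. It is a simplification, not the paper's design: it assumes one L1-T entry per tagged address (a real L1-T stores tags in cache lines), and the names `RangeEntry`, `per_address_entries`, and `to_ranges` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RangeEntry:
    start: int      # first address covered (inclusive)
    end: int        # last address covered (inclusive)
    metadata: str   # tag shared by every address in [start, end]


def per_address_entries(tags: Dict[int, str]) -> int:
    """L1-T style (simplified): one (address, metadata) pair per tagged address."""
    return len(tags)


def to_ranges(tags: Dict[int, str]) -> List[RangeEntry]:
    """Range-cache style: collapse runs of consecutive addresses with equal metadata."""
    ranges: List[RangeEntry] = []
    for addr in sorted(tags):
        last = ranges[-1] if ranges else None
        if last is not None and last.end + 1 == addr and last.metadata == tags[addr]:
            last.end = addr                      # grow the current run
        else:
            ranges.append(RangeEntry(addr, addr, tags[addr]))
    return ranges


# 64 bytes tagged 'A' followed by 8 bytes tagged 'B' (made-up addresses).
tags = {addr: "A" for addr in range(0x1000, 0x1040)}
tags.update({addr: "B" for addr in range(0x1040, 0x1048)})
ranges = to_ranges(tags)
print(per_address_entries(tags), "per-address entries vs", len(ranges), "range entries")
```

With these made-up tags, 72 per-address entries collapse into 2 range entries.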

Page 7:

Updates and reads must be handled fast
◦ Especially the common-case ones (R/W in a single area)

Regions must be identified on the fly
◦ Split, combine, increase ranges automatically
◦ Extremely important since areas are usually increased slowly
  Only a few L/Gs (e.g. AddrCheck) always get to know the areas
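The split, combine, and increase operations can be illustrated in software. This is only a sketch under stated assumptions: an unbounded, sorted list of disjoint ranges with inclusive end addresses stands in for the hardware structure, and `update` is a hypothetical name, not the paper's mechanism.

```python
from typing import List, Tuple

Range = Tuple[int, int, str]  # (start, end inclusive, tag)


def update(ranges: List[Range], start: int, end: int, tag: str) -> List[Range]:
    """Write `tag` over [start, end] and return the new sorted range list."""
    pieces: List[Range] = []
    for s, e, t in ranges:
        if e < start or s > end:
            pieces.append((s, e, t))            # untouched range
            continue
        if s < start:
            pieces.append((s, start - 1, t))    # split: keep the left remainder
        if e > end:
            pieces.append((end + 1, e, t))      # split: keep the right remainder
    pieces.append((start, end, tag))
    pieces.sort()

    merged: List[Range] = []
    for s, e, t in pieces:
        if merged and merged[-1][2] == t and merged[-1][1] + 1 == s:
            merged[-1] = (merged[-1][0], e, t)  # combine / increase adjacent equal tags
        else:
            merged.append((s, e, t))
    return merged


r = update([], 0, 3, "A")          # first write creates one range
grown = update(r, 4, 7, "A")       # adjacent write with the same tag grows it
split = update(grown, 2, 3, "B")   # a different tag inside a range splits it in three
rejoined = update(split, 2, 3, "A")  # restoring the tag combines the pieces again
```

A write that leaves the tags unchanged returns the same list, which corresponds to a silent update.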

Page 8:

Assuming infinite number of entries

[Figure: update examples, with labels 0+1→1, 1+1→1, 1+1→2, 1+1→1, 1+1→1, 1+1→3]

Page 9:

[Figure: further update examples, with labels N+1→3, 2+X+1→3 (MISS ???), 1+1→2, 2+1→2, 2+X+1→1 (???)]

Page 10:

We need an index table to detect internal segments

Not frequent, but not that rare; handled by a H/W state machine

All entries are considered dirty. S/W deals with evictions. LRU replacement
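The eviction policy can be sketched in software as follows. This is an illustration only: `LruRangeCache`, `touch`, and the `on_evict` callback (standing in for the software eviction handler) are hypothetical names, and the 2-entry capacity is chosen just to keep the example short.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # (start, end inclusive)


class LruRangeCache:
    def __init__(self, capacity: int, on_evict: Callable[[Range, str], None]):
        self.capacity = capacity
        self.on_evict = on_evict                  # stands in for the S/W eviction handler
        self.entries: "OrderedDict[Range, str]" = OrderedDict()

    def touch(self, rng: Range, tag: str) -> None:
        """Insert or refresh an entry, evicting the least recently used one if full."""
        if rng in self.entries:
            self.entries.move_to_end(rng)         # mark as most recently used
        self.entries[rng] = tag
        if len(self.entries) > self.capacity:
            victim, victim_tag = self.entries.popitem(last=False)
            # No dirty bit is consulted: every entry is treated as dirty.
            self.on_evict(victim, victim_tag)


written_back: List[Tuple[Range, str]] = []
cache = LruRangeCache(2, lambda rng, tag: written_back.append((rng, tag)))
cache.touch((0, 3), "A")
cache.touch((4, 7), "B")
cache.touch((0, 3), "A")    # refresh: (4, 7) becomes the LRU entry
cache.touch((8, 11), "C")   # over capacity: (4, 7) is handed to software
```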

Page 11:

Fast Case: Hit in a single range
◦ Return tag for that segment

Medium Case: Multiple ranges, all cached
◦ Consecutive ranges must have different tags
◦ How to combine? Multiple solutions:
  Reduce algorithm (e.g. Raksha-style rules)
  Call S/W

Bad Case: One or more segments miss
◦ S/W brings 64B segments to cache
  Main memory: 2-level table with 64B 2nd-level segments
◦ Reduce and repeat until read is serviced
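The three read cases can be sketched as a classification over a list of cached ranges. This is an assumption-laden illustration: `read` and `any_tainted` are hypothetical names, the "taint wins" reduce rule is only one possible combining rule, and the software refill on the bad case is left out.

```python
from typing import Callable, List, Optional, Tuple

Range = Tuple[int, int, str]  # (start, end inclusive, tag)


def read(cache: List[Range], start: int, end: int,
         reduce_tags: Callable[[List[str]], str]) -> Tuple[str, Optional[str]]:
    """Classify a read of [start, end] and return (case, tag)."""
    hits = sorted(r for r in cache if r[1] >= start and r[0] <= end)
    # The cached ranges must cover every address of the read without gaps.
    next_needed = start
    for s, e, _ in hits:
        if s > next_needed:
            break                                # gap before this range
        next_needed = max(next_needed, e + 1)
    if next_needed <= end:
        return ("bad", None)                     # some segment misses: software must refill
    if len(hits) == 1:
        return ("fast", hits[0][2])              # single range: return its tag
    return ("medium", reduce_tags([t for _, _, t in hits]))


def any_tainted(tags: List[str]) -> str:
    """Hypothetical reduce rule: the result is tainted if any input is."""
    return "T" if "T" in tags else "U"


cache = [(0, 63, "U"), (64, 127, "T")]
print(read(cache, 8, 11, any_tainted))      # inside one range
print(read(cache, 60, 67, any_tainted))     # spans two cached ranges
print(read(cache, 120, 131, any_tainted))   # runs past the cached ranges
```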

Page 12:

Doubly linked list for detecting internal segments

Page 13:

3 L/Gs
◦ TaintCheck 1-bit/byte
◦ MemCheck 2-bit/byte
◦ Tomography 32-bit/byte

Apps
◦ SPEC, Java App, Store Webserver

Verilog RTL model
◦ 3000 gates for the cache controller

Single-issue, in-order CPU model

Page 14:

Maximum number of tagged ranges varies greatly: cannot be stored fully in cache
◦ Must support swapping

gcc: snapshot of 128-entry cache
◦ 100 of 122 ranges < 64B
◦ Largest > 2MB

A fixed range size is ill-advised

Page 15:

All L/Gs spend time on simple read hits and silent updates
◦ TaintCheck spends time on “other updates”
◦ Other L/Gs have simple hits

[Figure panels: TaintCheck 1-bit, MemCheck 2-bit, Tomography 32-bit]

Page 16:

Page 17:

4KB L1-T vs 128-entry Range Cache
◦ For large states, Range Cache is the winner
◦ For small states, almost equal

Base = ∞ L1-T with 0 misses
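One way to see why state size matters in this comparison is to work out how much application data a 4KB L1-T can cover at each tag width. The sketch below assumes the full 4KB holds tag bits with no per-line overhead; the function name `l1t_coverage_bytes` is hypothetical, and the tag widths are the ones listed for the three L/Gs.

```python
L1T_BYTES = 4 * 1024  # 4KB L1-T


def l1t_coverage_bytes(tag_bits_per_byte: int, l1t_bytes: int = L1T_BYTES) -> int:
    """Data bytes whose tags fit in the L1-T, ignoring any per-line overhead."""
    return (l1t_bytes * 8) // tag_bits_per_byte


lifeguards = {"TaintCheck": 1, "MemCheck": 2, "Tomography": 32}
coverage = {name: l1t_coverage_bytes(bits) for name, bits in lifeguards.items()}
for name, covered in coverage.items():
    print(f"{name}: tags for {covered} data bytes fit in the L1-T")
```

Under this assumption the L1-T covers 32KB of data for TaintCheck, 16KB for MemCheck, and only 1KB for Tomography, whereas the span a range-cache entry covers does not depend on the tag width.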

Page 18:

L2 misses increase, caused by
◦ Increased memory references (previous slide)
◦ L2 pollution by tags

Page 19:

Base=∞ L1-T with 0 misses

TaintCheck 1-bit

MemCheck 2-bit

Difference usually minimal between L1-T and Range Cache for small states

Page 20:

Base=∞ L1-T with 0 misses

L/G: 32-bit Tomography

Significant difference for large states

Page 21:

L1-T is a very simple scheme, easily handled by H/W
◦ Misses can be hidden with prefetch
  Will still increase memory pressure, but hides the latency
◦ Prefetch can bypass L2 and bring tags directly to L1
  Minimizes the L2 pollution

Range Cache scheme is too complicated for H/W
◦ Must have a S/W miss handler or a complex H/W walk mechanism
◦ Effect on L1-I and TLB unaccounted for

Page 22:

Interesting approach that exploits the spatial stability of metadata, with good results
◦ Assuming fair comparison

The equivalent of monochromatic-pages only

Multiprocessor consistency quite tricky…

Questions?