SMALL IDENTIFIERS FOR CHARE ARRAY ELEMENTS
Phil Miller
With contributions from Akhil Langer, Harshitha Menon, Bilge Acun, Ramprasad Venkataraman, and L.V. Kalé
Array Element Location Management
- Enables process virtualization
- Directs messages from the sender PE to the host PE
- Maps element identifier to object pointer on the host PE
- Processes element instantiation & deletion
  - Placement at creation
  - On-demand creation
  - Detects duplicate insertion
Array Element Location Management (continued)
- Hooks for RTS introspection and adaptivity
  - Tracing
  - Load instrumentation
  - Migration
  - Fault tolerance
Array Index Structure
- Fixed 16 bytes
  - No less, even for small/simple arrays
  - No more, even for sophisticated arrays
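To make the fixed-size point concrete, here is a minimal sketch of an index type whose storage is always 16 bytes, regardless of dimensionality. The name `ArrayIndex16` and the field split (a 2-byte int count, a 2-byte dimension, three 4-byte ints) are hypothetical, chosen only to total 16 bytes; they are not taken from the slides.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-size array index: always 16 bytes, whether the
// array is 1D or uses a richer index.
struct ArrayIndex16 {
    std::int16_t nInts;      // how many of the ints below are meaningful
    std::int16_t dimension;  // logical dimensionality of the index
    std::int32_t data[3];    // raw index payload (assumed layout)
};

static_assert(sizeof(ArrayIndex16) == 16, "index must occupy exactly 16 bytes");

// A 1D index uses one int, but the struct is still 16 bytes.
inline ArrayIndex16 makeIndex1D(std::int32_t i) {
    ArrayIndex16 idx{};
    idx.nInts = 1;
    idx.dimension = 1;
    idx.data[0] = i;
    return idx;
}

// A 3D index fills the whole payload.
inline ArrayIndex16 makeIndex3D(std::int32_t x, std::int32_t y, std::int32_t z) {
    ArrayIndex16 idx{};
    idx.nInts = 3;
    idx.dimension = 3;
    idx.data[0] = x;
    idx.data[1] = y;
    idx.data[2] = z;
    return idx;
}

// Equality compares only the meaningful ints.
inline bool sameIndex(const ArrayIndex16& a, const ArrayIndex16& b) {
    assert(a.nInts >= 0 && a.nInts <= 3);
    return a.nInts == b.nInts && a.dimension == b.dimension &&
           std::memcmp(a.data, b.data, sizeof(std::int32_t) * a.nInts) == 0;
}
```

Even `makeIndex1D` carries 12 unused payload bytes, which is the "no less" half of the slide; an index type needing more than 16 bytes simply cannot be expressed, which is the "no more" half.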
Home PE
- Tracks assigned elements:
  - Existence
  - Current host PE
- Default host PE
- Tells other PEs as necessary
- Assigned by the Array Map
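The home PE's bookkeeping can be pictured as a table from element index to current host. This is a hypothetical sketch: `HomeTable` and its methods are illustrative names, indices are plain 1D `int`s for simplicity, and the messages that would tell other PEs about changes are not modeled.

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>

// Hypothetical per-PE bookkeeping for elements whose home is this PE.
// Keys are 1D element indices; values are the PE currently hosting each one.
class HomeTable {
public:
    explicit HomeTable(int myPe) : myPe_(myPe) {}

    // Record that an element exists. The home PE is the default host
    // unless the element was created somewhere else.
    void recordCreation(int index, std::optional<int> createdOn = std::nullopt) {
        hostOf_[index] = createdOn.value_or(myPe_);
    }

    // Called when an element migrates, so later queries get the new host.
    void recordMigration(int index, int newHostPe) {
        auto it = hostOf_.find(index);
        assert(it != hostOf_.end() && "migrating an element this PE never tracked");
        it->second = newHostPe;
    }

    void recordDeletion(int index) { hostOf_.erase(index); }

    bool exists(int index) const { return hostOf_.count(index) != 0; }

    // Answer a location request: where does this element live now?
    std::optional<int> currentHost(int index) const {
        auto it = hostOf_.find(index);
        if (it == hostOf_.end()) return std::nullopt;
        return it->second;
    }

private:
    int myPe_;
    std::unordered_map<int, int> hostOf_;
};
```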
Static Array Maps
- Array index to home PE
- Simple strategies:
  - Block
  - Round-robin (cyclic)
  - File
- Application-specific:
  - OpenAtom
  - CharmLU
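The two simplest strategies are pure functions of the index, so any PE can compute an element's home PE without communicating. The sketch below is a minimal illustration for 1D arrays; the function names and the assumption that the element count and PE count are known up front are hypothetical.

```cpp
#include <cassert>

// Hypothetical static maps from a 1D array index to a home PE.

// Round-robin (cyclic): consecutive indices go to consecutive PEs.
inline int roundRobinHome(int index, int numPes) {
    assert(index >= 0 && numPes > 0);
    return index % numPes;
}

// Block: contiguous chunks of ceil(numElements / numPes) indices per PE.
inline int blockHome(int index, int numElements, int numPes) {
    assert(index >= 0 && index < numElements && numPes > 0);
    int blockSize = (numElements + numPes - 1) / numPes;  // ceiling division
    return index / blockSize;
}
```

A file-based map would instead read the index-to-PE assignment from disk, and application-specific maps such as those in OpenAtom and CharmLU encode knowledge of the computation; neither is sketched here.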
Pushing the Envelope
- The array message envelope variant takes 34-38 bytes
- The next largest variant needs only 18 bytes
- Could save 16-20 bytes on every message!
Goals of a shorter ID
- Reduce envelope size
- Shrink memory footprint
- Improve fine-grain performance
- Enable future index evolution
Design parameters
- Preserve the API:
  - Send messages by index
  - Maps and home PEs
- Avoid extra communication
- Maintain or improve performance
Scheme
[Figure: layout of the 64-bit element ID]
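The slide's diagram of the 64-bit ID did not survive, so the field widths below are guesses for illustration only: a 16-bit array (collection) field and a 48-bit element field, the latter split evenly between the home PE and a per-home-PE counter. The constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical field widths for a 64-bit element ID (illustrative only).
constexpr unsigned kCollectionBits = 16;
constexpr unsigned kHomePeBits = 24;
constexpr unsigned kCounterBits = 24;
static_assert(kCollectionBits + kHomePeBits + kCounterBits == 64, "fields must fill 64 bits");

// All-ones mask of the given width (width < 64).
constexpr std::uint64_t mask(unsigned bits) { return (std::uint64_t{1} << bits) - 1; }

// Pack the three fields; each value must fit its field.
inline std::uint64_t packId(std::uint64_t collection, std::uint64_t homePe, std::uint64_t counter) {
    assert(collection <= mask(kCollectionBits));
    assert(homePe <= mask(kHomePeBits));
    assert(counter <= mask(kCounterBits));
    return (collection << (kHomePeBits + kCounterBits)) | (homePe << kCounterBits) | counter;
}

inline std::uint64_t collectionOf(std::uint64_t id) { return id >> (kHomePeBits + kCounterBits); }
inline std::uint64_t homePeOf(std::uint64_t id) { return (id >> kCounterBits) & mask(kHomePeBits); }
inline std::uint64_t counterOf(std::uint64_t id) { return id & mask(kCounterBits); }
```

Under this assumed layout, any PE can read the home PE straight out of an ID, so the ID alone is enough to route a message toward the element's home.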
Protocol
- Home PE generates the ID at construction
  - Simple counter in the element field
  - Asynchronous request if constructing elsewhere
- ID requests piggy-back on location requests
- Extra messages only for unusual construction
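The protocol's two paths can be sketched in a single process. This is a hypothetical illustration: `ElementId`, `HomeIdAllocator`, and the callback queue standing in for asynchronous messages are all assumptions, and the ID is shown as two 32-bit fields rather than any particular bit layout.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical element ID: the home PE plus that PE's counter value.
// Two 32-bit fields, 64 bits total; the exact bit layout is not modeled.
struct ElementId {
    std::uint32_t homePe;
    std::uint32_t counter;
};

// Hypothetical ID generator at the home PE. Remote requests are simulated
// with a queue of callbacks; a real runtime would use messages.
class HomeIdAllocator {
public:
    explicit HomeIdAllocator(std::uint32_t homePe) : homePe_(homePe) {}

    // Common case: construction on the home PE uses a local counter,
    // so no communication is needed.
    ElementId allocateLocal() {
        assert(nextCounter_ < std::numeric_limits<std::uint32_t>::max());
        return ElementId{homePe_, nextCounter_++};
    }

    // Unusual case: another PE constructs the element and asks the home
    // PE for an ID; the answer arrives later through the callback.
    void requestRemote(std::function<void(ElementId)> reply) {
        pending_.push_back(std::move(reply));
    }

    // Simulates the home PE handling its queued requests.
    void serviceRequests() {
        std::vector<std::function<void(ElementId)>> batch;
        batch.swap(pending_);
        for (auto& reply : batch) reply(allocateLocal());
    }

private:
    std::uint32_t homePe_;
    std::uint32_t nextCounter_ = 0;
    std::vector<std::function<void(ElementId)>> pending_;
};
```

Per the slide, a remote ID request would ride along with a location request the constructing PE already sends, which is why only unusual construction costs extra messages; the sketch does not model that piggy-backing.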
Potential Optimizations
- PE-level caching & pointer lookups
- Index compression instead of lookups
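PE-level caching might look like two tables shared by everything on a PE: an index-to-ID cache for senders, and an ID-to-object table for delivery. This is a hypothetical sketch; `PeTables`, `Element`, and the use of plain 1D `int` indices are assumptions, not the runtime's data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

struct Element { int value = 0; };  // hypothetical stand-in for an array element

// Hypothetical PE-level tables.
class PeTables {
public:
    // Remember the ID for an index once it is known on this PE.
    void cacheId(int index, std::uint64_t id) { idOf_[index] = id; }

    // Sender path: a cache hit avoids asking another PE for the ID.
    std::optional<std::uint64_t> cachedId(int index) const {
        auto it = idOf_.find(index);
        if (it == idOf_.end()) return std::nullopt;  // miss: a lookup would be needed
        return it->second;
    }

    // Register an element hosted on this PE.
    void registerLocal(std::uint64_t id, Element* obj) {
        assert(obj != nullptr);
        objOf_[id] = obj;
    }

    // Delivery path: one hash lookup keyed by the 8-byte ID.
    Element* localObject(std::uint64_t id) const {
        auto it = objOf_.find(id);
        return it == objOf_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<int, std::uint64_t> idOf_;
    std::unordered_map<std::uint64_t, Element*> objOf_;
};
```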
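Index Compression
- Many arrays fit directly within a 48-bit space:
  - All 1D arrays indexed by a 32-bit 'int'
  - 2D arrays with each dimension below 16M (2^24), i.e. fewer than (16M)² = 2^48 elements
  - Etc.
- Specify bounds, and the RTS will bit-pack indices if they fit
- Could also enable hashing
  - Collisions would be disastrous
  - Known indices and perfect hashing?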
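The slides do not show the packing scheme; one straightforward possibility for a 2D array with declared bounds is to give each dimension just enough bits and concatenate them. `bitsFor`, `Compressor2D`, and the fallback behavior are hypothetical names and assumptions for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr unsigned kElementBits = 48;  // space available for a compressed index

// Number of bits needed to represent every value in [0, bound).
inline unsigned bitsFor(std::uint64_t bound) {
    unsigned bits = 0;
    while (bits < 64 && (std::uint64_t{1} << bits) < bound) ++bits;
    return bits;
}

// Hypothetical compressor for 2D indices with user-specified bounds.
struct Compressor2D {
    std::uint64_t boundX, boundY;
    unsigned bitsX, bitsY;

    // Succeeds only if both dimensions together fit in 48 bits; otherwise
    // the runtime would have to fall back to a lookup-based scheme.
    static std::optional<Compressor2D> make(std::uint64_t boundX, std::uint64_t boundY) {
        unsigned bx = bitsFor(boundX), by = bitsFor(boundY);
        if (bx + by > kElementBits) return std::nullopt;
        return Compressor2D{boundX, boundY, bx, by};
    }

    // Concatenate x (high bits) and y (low bits).
    std::uint64_t pack(std::uint64_t x, std::uint64_t y) const {
        assert(x < boundX && y < boundY);
        return (x << bitsY) | y;
    }

    void unpack(std::uint64_t packed, std::uint64_t& x, std::uint64_t& y) const {
        y = packed & ((std::uint64_t{1} << bitsY) - 1);
        x = packed >> bitsY;
    }
};
```

With bounds of 2^24 in each dimension the two fields use exactly 48 bits, matching the (16M)² case on the slide; a 1D array over 32-bit `int` needs at most 32 bits and always fits. Unlike hashing, this encoding is exact, so collisions cannot occur.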
Summary
- Current status: implemented, passing all tests
- Performance:
  - Comparable in coarse-grain apps
  - Slightly slower in fine-grain apps
- Future direction: arbitrary index types (AMR, tree codes)