A Study of Caching in Parallel File Systems
Dissertation Proposal
Brad Settlemyer
Trends in Scientific Research
• Scientific inquiry is now information intensive
  – Astronomy, Biology, Chemistry, Climatology, and Particle Physics all utilize massive data sets
• Data sets under study are often very large
  – Genomics Databases (50 TB and growing)
  – Large Hadron Collider (15 PB/yr)
• Time spent manipulating data often exceeds time spent performing calculations
  – Checkpointing I/O demands are particularly problematic
Typical Scientific Workflow
1. Acquire data
   • Observational Data (sensor-based, telescope, etc.)
   • Information Data (gene sequences, protein folding)
2. Stage/Reorganize data to fast file system
   • Archive retrieval
   • Filtering extraneous data
3. Process data (e.g. Feature Extraction)
4. Output results data
5. Reorganize data for visualization
6. Visualize Data
Trends in Supercomputing
• CPU performance is increasing faster than disk performance
  – Multicore CPUs and increased intra-node parallelism
• Main memories are large
  – 4 GB costs < $100.00
• Networks are fast and wide
  – >10 Gb networks and buses available
• The number of application processes is increasing rapidly
  – RoadRunner: > 128K concurrent processes achieving > 1 Petaflop
  – BlueGene/P: > 250K concurrent processes achieving > 1 Petaflop
I/O Bottleneck
• Application processes are able to construct I/O requests faster than the storage system can provide service
• Applications are unable to fully utilize the massive amounts of available computing power
Parallel File Systems
• Address the I/O bottleneck by providing simultaneous access to a large number of disks
[Figure: CPU nodes running Processes 0 through 3, connected through a switched network to I/O nodes running PFS Servers 0 through 3]
PFS Data Distribution
[Figure: logical file data (Strips A through F) and its physical data locations on PFS Servers 0 through 3. Strips A and E share one server, Strips B and F share another, and Strips C and D each reside on one of the remaining servers.]
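The placement pictured above can be sketched as a mapping from a logical byte offset to a server and a server-local offset. This is a minimal sketch, assuming plain round-robin striping that starts at server 0 with a fixed strip size; the names `StripLocation` and `locate` and the 64 KiB strip size in the usage note are illustrative assumptions, not details taken from the slides.

```python
from typing import NamedTuple


class StripLocation(NamedTuple):
    server: int        # index of the PFS server holding the byte
    strip: int         # strip index in the logical file (0 = Strip A)
    local_offset: int  # byte offset within that server's share of the file


def locate(offset: int, strip_size: int, num_servers: int) -> StripLocation:
    """Map a logical file offset to its location under round-robin striping."""
    if offset < 0 or strip_size <= 0 or num_servers <= 0:
        raise ValueError("offset must be >= 0; sizes and counts must be > 0")
    strip = offset // strip_size
    server = strip % num_servers
    # Every full round of strips places exactly one strip on each server.
    local_strip = strip // num_servers
    local_offset = local_strip * strip_size + offset % strip_size
    return StripLocation(server, strip, local_offset)
```

With four servers and a 64 KiB strip, `locate(4 * 65536, 65536, 4)` places Strip E (index 4) on the same server as Strip A, which agrees with the figure.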
Parallel File Systems (cont.)
• Aggregate file system bandwidth requirements largely met
  – Large, aligned data requests can be rapidly transferred
  – Scalable to hundreds of client processes and improving
• Areas of inadequate performance
  – Metadata Operations (Create, Remove, Stat)
  – Small Files
  – Unaligned Accesses
  – Structured I/O
Scientific Workflow Performance
1. Acquire or Simulate Data
   • Primarily limited by physical bandwidth characteristics
2. Move or Reorganize Data for Processing
   • Often metadata intensive
3. Data Analysis or Reconstruction
   • Small, unaligned accesses perform poorly
4. Move/Reorganize Data for Visualization
   • May perform poorly (small, unaligned accesses)
5. Visualize Data
   • Benefits from reorganization
Alleviating the I/O bottleneck
• Avoid data reorganization costs
  – Additional work that does not modify results
  – Limits use of high level libraries
• Increase contiguity/granularity
  – Interconnects and parallel file systems are well tuned for large contiguous file accesses
  – Limits use of low latency messaging available between cores
• Improve locality
  – Avoid device accesses entirely
  – Difficult to achieve in user applications
Benefits of Middleware Caching
• Improves locality
  – PVFS Acache and Ncache
  – Improve write-read and read-read accesses
• Small accesses
  – Can bundle small accesses into a compound operation
• Alignment
  – Can compress accesses by performing aligned requests
• Transparent to the application programmer
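The small-access and alignment points can be illustrated with a short sketch of how a middleware layer might expand unaligned requests to page boundaries and bundle neighbouring requests into fewer, larger ones. The function names `align_to_pages` and `bundle` and the 4096-byte page size are assumptions made for illustration; the slides do not specify an interface.

```python
from typing import List, Tuple


def align_to_pages(offset: int, length: int, page_size: int = 4096) -> Tuple[int, int]:
    """Expand [offset, offset + length) to the enclosing page-aligned range."""
    if offset < 0 or length <= 0 or page_size <= 0:
        raise ValueError("offset must be >= 0; length and page_size must be > 0")
    start = (offset // page_size) * page_size
    end = -(-(offset + length) // page_size) * page_size  # round up to a page
    return start, end - start


def bundle(requests: List[Tuple[int, int]], page_size: int = 4096) -> List[Tuple[int, int]]:
    """Align each (offset, length) request, then merge ranges that touch or overlap."""
    aligned = sorted(align_to_pages(o, n, page_size) for o, n in requests)
    merged: List[Tuple[int, int]] = []
    for start, length in aligned:
        if merged and start <= merged[-1][0] + merged[-1][1]:
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, max(prev_len, start + length - prev_start))
        else:
            merged.append((start, length))
    return merged
```

In this sketch four small requests such as `[(100, 50), (300, 20), (5000, 10), (20000, 1)]` collapse into two aligned requests, at the cost of transferring some bytes that were not asked for.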
Proposed Caching Techniques
To improve the performance of small and unaligned file accesses, we propose middleware designed to enhance parallel file systems with the following:
1. Shared, Concurrent Access Caching
2. Progressive Page Granularity Caching
3. MPI File View Caching
Shared Caching
• Single data cache per node
  – Leverages trend toward large numbers of cores
  – Improves contiguity of alternating request patterns
• Concurrent access
  – Single Reader/Writer
  – Page locking system
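A rough sketch of the shared-cache idea follows: one cache object per node, with a lock per page so that only a single reader or writer touches a page at a time. Threads stand in for the application processes on a node, and the class name `SharedPageCache`, its methods, and the 4096-byte page size are all hypothetical; the proposed middleware may be organized quite differently.

```python
import threading
from collections import defaultdict


class SharedPageCache:
    """One cache per node; each page is guarded by its own lock."""

    def __init__(self, page_size: int = 4096):
        self.page_size = page_size
        self._pages = {}                           # page index -> bytearray
        self._locks = defaultdict(threading.Lock)  # page index -> page lock
        self._table_lock = threading.Lock()        # guards the two tables

    def _lock_for(self, page: int) -> threading.Lock:
        with self._table_lock:
            return self._locks[page]

    def write(self, offset: int, data: bytes) -> None:
        pos = 0
        while pos < len(data):
            page, start = divmod(offset + pos, self.page_size)
            chunk = data[pos:pos + self.page_size - start]
            with self._lock_for(page):             # single writer per page
                with self._table_lock:
                    buf = self._pages.setdefault(page, bytearray(self.page_size))
                buf[start:start + len(chunk)] = chunk
            pos += len(chunk)

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        while len(out) < length:
            page, start = divmod(offset + len(out), self.page_size)
            n = min(length - len(out), self.page_size - start)
            with self._lock_for(page):             # single reader per page
                with self._table_lock:
                    buf = self._pages.get(page)
                # Sketch only: an uncached page reads as zeros here, where a
                # real cache would fetch it from the file system.
                out += buf[start:start + n] if buf is not None else bytes(n)
        return bytes(out)
```

When two writers alternate 1 KiB blocks across the same file region, their data lands in the same shared pages, so a later flush could issue a few page-sized requests instead of many small ones.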
File Write Example
[Figure: a logical file and the I/O requests issued against it by Process 0 and Process 1]
File Write w/ Cache
[Figure: a logical file, the I/O requests from Process 0 and Process 1, and Cache Pages 0, 1, and 2]
Progressive Page Caching
• Benefits of paged caching
  – Efficient for the file system
  – Reduces cache metadata overhead
• Issues with paged caching
  – Aligned pages may retrieve more data than otherwise required
  – Unaligned writes do not cache easily
    • Read the remaining page fragment
    • Do not update cache with small writes
• Progressive paged caching addresses these issues while minimizing performance and metadata overhead
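The slides do not spell out the mechanism behind progressive paged caching. One plausible reading, assumed here purely for illustration, is a cache page that records which byte ranges hold valid data, so a small unaligned write can be cached without first reading the rest of the page, and the page becomes an ordinary full page once its valid ranges cover it. The class `ProgressivePage` and its methods are hypothetical names.

```python
class ProgressivePage:
    """A cache page that tracks which byte ranges currently hold valid data."""

    def __init__(self, page_size: int = 4096):
        self.page_size = page_size
        self.data = bytearray(page_size)
        self.extents = []  # sorted, non-overlapping (start, end) valid ranges

    def write(self, start: int, payload: bytes) -> None:
        """Cache a fragment without fetching the remainder of the page."""
        end = start + len(payload)
        if start < 0 or end > self.page_size:
            raise ValueError("write falls outside the page")
        if not payload:
            return
        self.data[start:end] = payload
        kept = []
        for s, e in self.extents:
            if e < start or s > end:
                kept.append((s, e))       # disjoint extent: keep as is
            else:
                start, end = min(s, start), max(e, end)  # absorb neighbour
        kept.append((start, end))
        self.extents = sorted(kept)

    def covers(self, start: int, length: int) -> bool:
        """True if the cached fragments can satisfy a read of this range."""
        return any(s <= start and start + length <= e for s, e in self.extents)

    def is_full(self) -> bool:
        """True once the fragments have grown to cover the whole page."""
        return self.extents == [(0, self.page_size)]
```

The per-page extent list is the extra metadata this reading would cost; it shrinks back to a single entry as fragments merge.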
Unaligned Access Caches
• Accesses are independent and not on page boundaries
• Requires increased cache overhead
• How to organize unaligned data
  – List I/O Tree
  – Binary Space Partition Tree
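The two tree organizations are named but not described in detail here. As a stand-in, the sketch below indexes unaligned cached extents by file offset using a sorted list and binary search rather than either tree; it only illustrates the kind of lookup an unaligned cache has to support. `ExtentIndex` is a hypothetical name, and the sketch assumes inserted extents never overlap.

```python
import bisect
from typing import Optional


class ExtentIndex:
    """Unaligned cached extents keyed by file offset (sorted list, not a tree)."""

    def __init__(self):
        self._starts = []   # sorted extent start offsets
        self._extents = {}  # start offset -> cached bytes

    def insert(self, offset: int, data: bytes) -> None:
        # Assumption: the new extent does not overlap any existing extent.
        if offset not in self._extents:
            bisect.insort(self._starts, offset)
        self._extents[offset] = bytes(data)

    def lookup(self, offset: int, length: int) -> Optional[bytes]:
        """Return the bytes if a single cached extent contains the range."""
        i = bisect.bisect_right(self._starts, offset) - 1
        if i < 0:
            return None
        start = self._starts[i]
        data = self._extents[start]
        if offset + length <= start + len(data):
            rel = offset - start
            return data[rel:rel + length]
        return None
```

Compared with fixed pages, every extent carries its own offset and length, which is the increased cache overhead noted above.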
Paged Cache Organization
[Figure: a logical file divided into fixed-size cache pages]
BSP Tree Cache Organization
[Figure: binary space partition tree over the logical file; node labels recoverable from the figure: 12, 1, 4, 20, 5, 11, 8]
List I/O Tree Cache Organization
[Figure: List I/O tree over the logical file; node labels recoverable from the figure: (10,2), (0,1), (2,2), (5,3)]
Progressive Page Organization
[Figure: a logical file divided into cache pages, with fragment labels attached to individual pages; labels recoverable from the figure: (2,2), (1,3), (0,1), (2,2)]
View Cache
• MPI provides richer facilities for describing file I/O
  – Collective I/O
  – File views for describing file subregions
• Use file views as a mechanism for coalescing reads and writes during collective I/O
• Open question: how to take the union of multiple views
  – Use a heuristic approach to detect structured I/O
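Taking the union of several views can be sketched without any MPI calls by modelling each process's view as a simple strided pattern. The tuple layout `(displacement, block_len, stride, count)` and the function names below are simplifying assumptions for illustration; real MPI file views are built from arbitrary derived datatypes, and this is not the proposed heuristic.

```python
from typing import List, Tuple

View = Tuple[int, int, int, int]  # (displacement, block_len, stride, count)


def view_extents(displacement: int, block_len: int, stride: int, count: int) -> List[Tuple[int, int]]:
    """Expand a strided view into (start, end) byte ranges."""
    return [(displacement + i * stride, displacement + i * stride + block_len)
            for i in range(count)]


def union_views(views: List[View]) -> List[Tuple[int, int]]:
    """Merge the extents of several views into sorted, non-overlapping ranges."""
    extents = sorted(e for v in views for e in view_extents(*v))
    merged: List[Tuple[int, int]] = []
    for start, end in extents:
        if merged and start <= merged[-1][1]:
            # Touching or overlapping ranges coalesce into one larger range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For two processes with interleaved views, `union_views([(0, 4, 8, 3), (4, 4, 8, 3)])` reduces six 4-byte accesses to the single range `(0, 24)`, which is the coalescing opportunity collective I/O exposes.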
Collective Read Example
[Figure: a logical file and the collective read requests issued against it by Process 0 and Process 1]
Collective Read w/ Cache
[Figure: a logical file, the I/O requests from Process 0 and Process 1, and Cache Blocks 0, 1, and 2]
Collective Read w/ ViewCache
[Figure: a logical file, the I/O requests from Process 0 and Process 1, and Cache Blocks 0, 1, and 2 under the view cache]
Study Methodology
• Simulation-based study
  – HECIOS
    • Closely modelled on PVFS2 and Linux
    • 40,000 SLOC
    • Leverages OMNeT++ and the INET Framework
  – Cache Organizations
    • Core sharing
    • Aligned page access
    • Unaligned page access
HECIOS Overview
HECIOS System Architecture
HECIOS Overview (cont.)
HECIOS Main Window
HECIOS Overview (cont.)
HECIOS Simulation Top View
HECIOS Overview (cont.)
HECIOS Simulation Detailed View
Contributions
1. HECIOS, the High End Computing I/O Simulator, developed and made available under an open source license
2. Flash I/O and BT-IO traced at large scale, with traces now publicly available
3. Rigorous study of caching factors in parallel file systems
4. Novel cache designs for unaligned file access and MPI view coalescing
Dissertation Schedule
• August – Complete trace parser enhancements. Shared cache implementation. Complete trace collection.
• September – Aligned cache sharing study.
• October – Unaligned cache sharing study.
• November – SigMetrics deadline. View coalescing cache.
• December – Finalize experiments. Finish writing thesis. Defend thesis.
PVFS Scalability
Read and Write Bandwidth Curves for PVFS
Shared Caching (cont.)
[Figure: a logical file, the I/O requests from Process 0 and Process 1, and Cache Pages 0, 1, and 2]
Bandwidth Effects
Write Bandwidth on Adenine (MB/sec)

| Num Clients | PVFS w/ 8 I/O Nodes | PVFS w/ Replication, 16 I/O Nodes | Percent Performance |
|---|---|---|---|
| 1 | 10.3 | 9.8 | 95.1% |
| 4 | 28.2 | 28.7 | 101.8% |
| 8 | 43.4 | 39.8 | 91.5% |
| 16 | 43.4 | 40.3 | 92.9% |
| 32 | 50.1 | 38.2 | 76.2% |
Experimental Data Distribution
[Figure: logical file data (Strips A through F) and its physical data locations on PFS Servers 0 through 3, with every strip stored in two locations; the exact replica placement is not recoverable from the transcript]
Discussion (cont.)
[Figure: logical file data (Strips A through F) and an alternative physical layout on PFS Servers 0 through 3, again with every strip stored in two locations; the exact replica placement is not recoverable from the transcript]