![Page 1: A Workflow-Aware Storage System Emalayan Vairavanathan 1 Samer Al-Kiswany, Lauro Beltrão Costa, Zhao Zhang, Daniel S. Katz, Michael Wilde, Matei Ripeanu](https://reader030.vdocuments.mx/reader030/viewer/2022032802/56649e055503460f94af2071/html5/thumbnails/1.jpg)
A Workflow-Aware Storage System
Emalayan Vairavanathan
Samer Al-Kiswany, Lauro Beltrão Costa, Zhao Zhang, Daniel S. Katz, Michael Wilde, Matei Ripeanu
Workflow Example - ModFTDock
• Protein docking application
• Simulates a more complex protein model from two known proteins
• Applications: drug design, protein interaction prediction
Background – ModFTDock in Argonne BG/P
[Figure: workflow runtime engine with 1.2 M docking tasks; application tasks on the compute nodes, each with local storage; backend file system (e.g., GPFS, NFS).]
• Scale: 40,960 compute nodes
• File-based communication
• Large IO volume
• IO rate: 8 GBps = 51 KBps per core
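The per-core figure can be checked by dividing the aggregate IO rate by the total number of cores. This sketch assumes four cores per BG/P compute node and binary units (1 GB = 2^30 bytes, 1 KB = 2^10 bytes); neither assumption is stated on the slide.

```python
# Per-core IO rate for ModFTDock on BG/P: a back-of-the-envelope check.
# Assumptions (not stated on the slide): 4 cores per compute node and
# binary units (1 GB = 2**30 bytes, 1 KB = 2**10 bytes).
NODES = 40_960
CORES_PER_NODE = 4
AGGREGATE_RATE = 8 * 2**30  # 8 GBps, in bytes per second


def per_core_kbps(aggregate_bytes_per_s: float, nodes: int, cores_per_node: int) -> float:
    """Return the average IO rate available to each core, in KBps."""
    cores = nodes * cores_per_node
    return aggregate_bytes_per_s / cores / 2**10


rate = per_core_kbps(AGGREGATE_RATE, NODES, CORES_PER_NODE)
print(f"{rate:.1f} KBps per core")  # prints "51.2 KBps per core"
```

With decimal units the same division gives about 48.8 KBps (8×10^9 bytes / 163,840 cores / 1,000), so the slide's 51 KBps is consistent with binary units.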
Background – Backend Storage Bottleneck
• Storage is one of the main bottlenecks for workflows
Montage workflow (512 BG/P cores, GPFS backend file system):
• Data management: 30%
• Execution: 29%
• Scheduling and idle: 40%
Source: [Zhao et al.]
Intermediate Storage Approach
[Figure: workflow runtime engine; application tasks on 40,960 compute nodes, each with local storage, accessing a shared intermediate storage layer through the POSIX API; stage-in from and stage-out to the backend file system (e.g., GPFS, NFS).]
Source: [Zhao et al.] MTAGS 2008
Research Question
How can we improve the storage performance for workflow applications?
IO Patterns in Workflow Applications – by Justin Wozniak et al., PDSW’09
• Pipeline: locality and location-aware scheduling
• Broadcast: replication
• Reduce: collocation and location-aware scheduling
• Scatter and gather: block-level data placement
IO-Patterns in ModFTDock
• 1.2 M Dock and 12,000 Merge and Score instances in a large run
• Average file size: 100 KB to 75 MB
[Figure: ModFTDock workflow]
• Stage 1: broadcast pattern
• Stage 2: reduce pattern
• Stage 3: pipeline pattern
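One way to make these patterns concrete is to label each file by how many tasks read it and how many inputs its reader gathers. The sketch below is a hypothetical heuristic, not the definition from the cited work; the `classify_files` function, the task representation, and the file names are all illustrative. Its toy workflow has the same stage shapes as the three ModFTDock stages listed above (broadcast, then reduce, then pipeline).

```python
from collections import defaultdict

# A task is described by (input files, output files); all names are hypothetical.
Workflow = dict[str, tuple[list[str], list[str]]]


def classify_files(tasks: Workflow) -> dict[str, str]:
    """Label each produced file with the IO pattern it takes part in.

    Illustrative heuristic (an assumption, not the published definition):
      broadcast    - the file is read by more than one task
      reduce       - its single reader gathers several input files
      pipeline     - its single reader consumes only this file
      final output - no task reads the file
    """
    readers: dict[str, list[str]] = defaultdict(list)
    for name, (inputs, _outputs) in tasks.items():
        for f in inputs:
            readers[f].append(name)

    labels: dict[str, str] = {}
    for _name, (_inputs, outputs) in tasks.items():
        for f in outputs:
            r = readers.get(f, [])
            if not r:
                labels[f] = "final output"
            elif len(r) > 1:
                labels[f] = "broadcast"
            elif len(tasks[r[0]][0]) > 1:
                labels[f] = "reduce"
            else:
                labels[f] = "pipeline"
    return labels


# Toy workflow shaped like the ModFTDock stages (file names are made up).
workflow: Workflow = {
    "stage_in": ([], ["model.pdb"]),
    "dock1": (["model.pdb"], ["d1.out"]),
    "dock2": (["model.pdb"], ["d2.out"]),
    "merge": (["d1.out", "d2.out"], ["merged.out"]),
    "score": (["merged.out"], ["score.out"]),
}
print(classify_files(workflow))
```

Running it labels `model.pdb` as broadcast, `d1.out` and `d2.out` as reduce, `merged.out` as pipeline, and `score.out` as final output. Scatter and gather are left out because they depend on block-level access ranges rather than whole-file reader counts.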
Research Question
How can we improve the storage performance for workflow applications?
Our Answer
Workflow-aware storage: optimizing the storage for IO patterns
• Traditional approach: one size fits all
• Our approach: file- and block-level optimizations
Integrating with the workflow runtime engine
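[Figure: application tasks on the compute nodes, each with local storage, access a shared workflow-aware storage layer through the POSIX API; the workflow runtime engine exchanges hints with the storage; data is staged in from and out to the backend file system (e.g., GPFS, NFS).]
• Application hints (e.g., indicating access patterns) are passed to the storage
• Storage hints (e.g., location information) are exposed to the workflow runtime engine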
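Location hints from the storage are what make location-aware scheduling possible. A minimal sketch, assuming the storage reports which node holds each file (the `file_location` mapping and `pick_node` function are hypothetical; the slide does not show how WASS exposes this): the scheduler runs a task on the candidate node that already stores the most bytes of its input.

```python
from collections import Counter


def pick_node(task_inputs, file_location, file_size, nodes):
    """Pick the candidate node that already stores the most input bytes for a task.

    file_location: file -> node holding it (the hypothetical location hint)
    file_size:     file -> size in bytes
    nodes:         candidate nodes; the first one is used when no hint applies
    """
    candidates = set(nodes)
    local_bytes = Counter()
    for f in task_inputs:
        node = file_location.get(f)
        if node in candidates:
            local_bytes[node] += file_size.get(f, 0)
    if not local_bytes:
        return nodes[0]  # no usable location hints: fall back to any node
    # most_common(1) -> [(node, bytes)]; ties go to the node seen first
    return local_bytes.most_common(1)[0][0]


location = {"a.dat": "node2", "b.dat": "node3", "c.dat": "node2"}
size = {"a.dat": 50, "b.dat": 80, "c.dat": 40}
print(pick_node(["a.dat", "b.dat", "c.dat"], location, size, ["node1", "node2", "node3"]))  # node2
```

Weighting by bytes rather than file count favors the node that avoids the most network traffic; here node2 holds 90 bytes of input versus 80 on node3.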
Outline
• Background
• IO Patterns
• Workflow-aware storage system: Implementation
• Evaluation
Implementation: MosaStore
• Files are divided into fixed-size chunks.
• Chunks are stored on the storage nodes.
• A manager maintains a block-map for each file.
• A POSIX interface provides access to the system.
[Figure: MosaStore distributed storage architecture]
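The chunking and block-map bullets above can be sketched as a toy manager. This is an illustration under stated assumptions, not MosaStore's code: the 1 MiB chunk size and the round-robin striping of chunks across storage nodes are hypothetical choices, and the `Manager` class, `write`, and `locate` names are invented for the example.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 1 << 20  # 1 MiB: an illustrative value, not MosaStore's documented default


@dataclass
class Manager:
    """Toy metadata manager: one block-map (chunk index -> storage node) per file."""

    nodes: list[str]
    block_maps: dict[str, list[str]] = field(default_factory=dict)
    next_node: int = 0  # round-robin cursor (striping policy is an assumption)

    def write(self, path: str, size: int) -> list[str]:
        """Split a file of `size` bytes into fixed-size chunks and assign each to a node."""
        n_chunks = -(-size // CHUNK_SIZE)  # ceiling division
        placement = []
        for _ in range(n_chunks):
            placement.append(self.nodes[self.next_node % len(self.nodes)])
            self.next_node += 1
        self.block_maps[path] = placement
        return placement

    def locate(self, path: str, offset: int) -> str:
        """Return the storage node that holds the byte at `offset` of `path`."""
        return self.block_maps[path][offset // CHUNK_SIZE]


m = Manager(nodes=["n1", "n2", "n3"])
print(m.write("/wf/out.dat", 2_621_440))   # 2.5 MiB -> ['n1', 'n2', 'n3']
print(m.locate("/wf/out.dat", 1_500_000))  # second chunk -> n2
```

A client resolving a read would ask the manager for the block-map once and then fetch chunks directly from the storage nodes it names.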
Implementation: Workflow-aware Storage System
[Figure: workflow-aware storage architecture]
Implementation: Workflow-aware Storage System
• Optimized data placement for the pipeline pattern: priority to local writes and reads
• Optimized data placement for the reduce pattern: collocating files on a single storage node
• Replication mechanism optimized for the broadcast pattern: parallel replication
• Exposing file location to the workflow runtime engine
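The per-pattern optimizations above can be read as a placement decision driven by a hint attached to each new file. A minimal sketch, assuming string hints and a `group` key that ties together the inputs of one reduce task (the `PlacementPolicy` class, hint names, and replica count are hypothetical, not WASS's interface). It only decides where copies go; it does not model how the parallel replication itself is carried out.

```python
import itertools


class PlacementPolicy:
    """Hypothetical hint-driven placement mirroring the optimizations listed above.

    The hint strings, the `group` key for reduce inputs, and the replica count
    are illustrative assumptions, not WASS's actual interface.
    """

    def __init__(self, nodes, replicas=3):
        self.nodes = list(nodes)
        self.replicas = replicas
        self._round_robin = itertools.cycle(self.nodes)  # generic default placement
        self._collocation = {}  # reduce group -> node chosen for all its files

    def place(self, writer_node, hint=None, group=None):
        """Return the nodes that should store a newly written file."""
        if hint == "pipeline":
            # Keep the file on the producer's node so the next stage reads locally.
            return [writer_node]
        if hint == "reduce":
            # The first file of a group fixes the node; later files are collocated there.
            return [self._collocation.setdefault(group, writer_node)]
        if hint == "broadcast":
            # Writer's node plus other nodes, up to `replicas` copies in total.
            others = [n for n in self.nodes if n != writer_node]
            return [writer_node] + others[: self.replicas - 1]
        return [next(self._round_robin)]  # no hint: fall back to round-robin


policy = PlacementPolicy(["n1", "n2", "n3", "n4"], replicas=3)
print(policy.place("n2", "pipeline"))                 # ['n2']
print(policy.place("n3", "reduce", group="merge-7"))  # ['n3']
print(policy.place("n1", "reduce", group="merge-7"))  # ['n3']
print(policy.place("n1", "broadcast"))                # ['n1', 'n2', 'n3']
print(policy.place("n4"))                             # ['n1']
```

Collocating a reduce group on the first writer's node is one simple choice; a location-aware scheduler can then run the reducing task on that node.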
Evaluation - Baselines
MosaStore, NFS, and node-local storage vs. workflow-aware storage
[Figure: application tasks on the compute nodes, each with local storage, use a shared intermediate storage layer with stage-in/out to the backend file system (e.g., GPFS, NFS); the intermediate storage is one of MosaStore, NFS, local storage, or workflow-aware storage.]
Evaluation - Platform
• Cluster of 20 machines, each with an Intel Xeon 4-core 2.33-GHz CPU, 4-GB RAM, a 1-Gbps NIC, and RAID-1 on two 300-GB 7200-rpm SATA disks
• Backend storage: an NFS server with an Intel Xeon E5345 8-core 2.33-GHz CPU, 8-GB RAM, a 1-Gbps NIC, and six SATA disks in a RAID-5 configuration
The NFS server is better provisioned than the cluster machines.
Evaluation – Benchmarks and Application
• Synthetic benchmarks
• Application and workflow runtime engine: ModFTDock
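File sizes for the synthetic benchmarks:

| Workload | Pipeline | Broadcast | Reduce |
|---|---|---|---|
| Small | 100 KB, 200 KB, 10 KB | 100 KB, 1 KB | 10 KB, 100 KB |
| Medium | 100 MB, 200 MB, 1 MB | 100 MB, 1 MB | 10 MB, 200 MB |
| Large | 1 GB, 2 GB, 10 MB | 1 GB, 10 MB | 100 MB, 2 GB |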
Synthetic Benchmarks - Pipeline
[Figure: average runtime for the medium workload]
Optimization: Locality and location-aware scheduling
Synthetic Benchmarks - Reduce
Optimization: Collocation and location-aware scheduling
[Figure: average runtime for the medium workload]
Synthetic Benchmarks - Broadcast
Optimization: Replication
[Figure: average runtime for the medium workload]
Not everything is perfect!
[Figure: average runtime for the small workload (pipeline, broadcast, and reduce benchmarks)]
Evaluation – ModFTDock
[Figure: ModFTDock workflow]
[Figure: total application time on three different systems]
Evaluation – Highlights
• The workflow-aware storage system (WASS) shows considerable performance gains for all benchmarks on the medium and large workloads (up to 18x faster than NFS and up to 2x faster than MosaStore).
• ModFTDock is 20% faster on WASS than on MosaStore, and more than 2x faster than on NFS.
• WASS delivers lower performance on the small workloads due to metadata overhead and manager latency.
Summary
Problem
• How can we improve the storage performance for workflow applications?
Approach
• Workflow-aware storage system (WASS)
• From backend storage to intermediate storage
• Bi-directional communication using hints
Future work
• Integrating more applications
• Large-scale evaluation
THANK YOU
MosaStore: netsyslab.ece.ubc.ca/wiki/index.php/MosaStore
Networked Systems Laboratory: netsyslab.ece.ubc.ca