Introduction
Data movement is a major bottleneck in data-intensive high-performance computing.
We propose a Fusion Active Storage System (FASS) to address the data movement bottleneck, specifically for write-intensive big data applications.
The idea of the proposed FASS is to identify write-intensive operations and offload them so they are carried out on the storage nodes.
FASS moves write-intensive computations to the storage nodes, generating and writing data in place to storage.
FASS: Components
The Offload Analysis Module (OAM) calculates whether an operation would perform better if it were offloaded.
The Instruction Decoupling Module (IDM) provides an extended API that lets the programmer flag sections of code as write-intensive, determining what should be offloaded.
The Kernel Processing Module (KPM) carries out the offloaded instructions on the storage nodes and handles communication as needed.
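The transcript does not show the IDM's extended API. A minimal hypothetical sketch of what such a flagging interface could look like, assuming a Python-style decorator (the name `write_intensive` and the candidate registry are illustrative assumptions, not FASS's actual API):

```python
# Hypothetical IDM-style flagging API (an assumption for illustration;
# the real FASS API is not shown in the source).

OFFLOAD_CANDIDATES = []  # names of kernels the OAM may consider offloading

def write_intensive(func):
    """Flag a function as write-intensive so it becomes an offload candidate."""
    OFFLOAD_CANDIDATES.append(func.__name__)
    return func

@write_intensive
def generate_and_write(path, count):
    # Write-heavy kernel: generates data and writes it directly to storage.
    with open(path, "w") as f:
        for i in range(count):
            f.write(f"{i}\n")
```

With such flags in place, the OAM could inspect `OFFLOAD_CANDIDATES` and decide per kernel whether offloading to the storage nodes pays off.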
Evaluations on DISCFarm Cluster
Offload Analysis Module Algorithm
We determine whether or not to offload the computation with a heuristic algorithm.
Variable denotations: Wd = data workload; Wdi = instruction workload; Wc = computation workload; N = total nodes; b = bandwidth; Cn = compute nodes; Sn = storage nodes
If the time saved by offloading instructions instead of moving the entire data is greater than the time lost computing on storage nodes, then offload the operations.
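The heuristic above can be sketched as a simple cost comparison. The exact formula is not given in the transcript, so the following assumes a straightforward model: the time saved is the reduced transfer cost (shipping Wdi instead of Wd over bandwidth b), and the time lost is the slower compute on the Sn storage nodes versus all N nodes.

```python
# Sketch of the OAM offload heuristic under an assumed cost model
# (the poster's exact formula is not in the transcript).

def should_offload(Wd, Wdi, Wc, N, Sn, b):
    time_saved = (Wd - Wdi) / b    # less data crosses the network
    time_lost = Wc / Sn - Wc / N   # fewer nodes do the computing
    return time_saved > time_lost

# Example with the poster's test-1 parameters:
print(should_offload(Wd=200, Wdi=20, Wc=100, N=24, Sn=12, b=24))
```

With test 1's parameters the transfer saving (7.5) exceeds the compute penalty (about 4.17), so the heuristic chooses to offload.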
Model: Traditional HPC
Variable denotations: T = execution time; m = number of phases; Wc = computational workload; Wd = data workload; N = total nodes; b = bandwidth
Model: FASS
Variable denotations: Sn = storage nodes; Cn = compute nodes; Wdi = instruction data workload; Mw = write-intensive phases; Mc = computational phases; T' = FASS execution time
Comparison and Analysis
Restrictions: N = Sn + Cn; Sn > 0; Cn > 0; m = Mw + Mc
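The model equations themselves did not survive extraction. One reconstruction consistent with the variable lists, the crossover points, and the reported speedups (1.36X and 3.66X at 100% write phases) is the following; treat it as an assumption, not the poster's exact formulation:

```latex
% Reconstructed model equations (assumptions; the originals are not in
% the transcript). Traditional: all N nodes compute, and the Mw write
% phases move the data workload Wd over bandwidth b.
T = \frac{W_c}{N} + \frac{M_w}{m}\cdot\frac{W_d}{b}

% FASS: compute phases run on the Cn compute nodes, write phases run on
% the Sn storage nodes, and only the instruction workload Wdi crosses
% the network during write phases.
T' = \frac{M_c}{m}\cdot\frac{W_c}{C_n}
   + \frac{M_w}{m}\cdot\frac{W_c}{S_n}
   + \frac{M_w}{m}\cdot\frac{W_{di}}{b}
```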
Constants for all test cases: N = 24; b = 24; Wc = 100; m = 10
In this test we observed the effect of varying the percentage of write phases on the runtime of FASS. As the graph shows, FASS outperformed the traditional method once the percentage of write phases exceeded 60%. FASS sped up the runtime by 1.36X at 100% write phases.
Constants for test 1: Sn = 12; Cn = 12; Wc = 100; Wd = 200; Wdi = 20
In this test we observed the effect of varying the data workload on the runtime of FASS. As the graph shows, FASS outperformed the traditional method once the percentage of write phases exceeded 10%. FASS sped up the runtime by 3.66X at 100% write phases. This test achieved greater speedups than the first, as FASS is especially useful when dealing with a large volume of data.
Constants for test 2: Sn = 12; Cn = 12; Wc = 100; Wd = 1000; Wdi = 100
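The two test cases can be emulated with a short script. The cost formulas below are reconstructed assumptions (the poster's exact equations are not in the transcript): the traditional method computes on all N nodes and ships the data workload Wd during write phases, while FASS computes write phases on the Sn storage nodes and ships only the instruction workload Wdi.

```python
# Reconstructed cost models (assumptions, not the poster's exact formulas).

def t_traditional(Wc, Wd, N, b, fw):
    """fw = fraction of phases that are write-intensive (Mw / m)."""
    return Wc / N + fw * Wd / b

def t_fass(Wc, Wdi, Cn, Sn, b, fw):
    # Compute phases on Cn nodes, write phases on Sn nodes, instructions over b.
    return (1 - fw) * Wc / Cn + fw * Wc / Sn + fw * Wdi / b

# Test 1 (Wd = 200, Wdi = 20) and test 2 (Wd = 1000, Wdi = 100) at 100% writes:
s1 = t_traditional(100, 200, 24, 24, 1.0) / t_fass(100, 20, 12, 12, 24, 1.0)
s2 = t_traditional(100, 1000, 24, 24, 1.0) / t_fass(100, 100, 12, 12, 24, 1.0)
print(f"test 1 speedup: {s1:.2f}x, test 2 speedup: {s2:.2f}x")
```

Under these assumptions, the 100%-write-phase speedups come out close to the poster's reported 1.36X and 3.66X, which suggests the reconstruction is at least plausible.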
Future Work
* DISCLAIMER: This material is based upon work supported by the National Science Foundation and the Department of Defense under Grant No. CNS-1263183. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Department of Defense.
Build a fully functioning and automated FASS prototype and conduct further evaluations
Examine the viability of using KPM to detect and act upon data dependencies
Conclusion
Fusion Active Storage for Write-intensive Big Data Applications*
The results of these analyses and evaluations show that FASS clearly enhances the performance of write-intensive applications.
Note that the models and emulations are simplified, as they do not take into account factors such as data dependencies. Despite these simplifying assumptions, we believe the potential of FASS is promising and that it would enhance the performance of real-world write-intensive applications.
We emulated the FASS using a 16-node DISCFarm cluster at Texas Tech University to evaluate its benefits. These tests were conducted with a write-intensive random number generator code to measure the potential of FASS compared with the traditional method.
As can be observed from the graph, the traditional method's data movement impaired performance: FASS was over 4 times faster than the traditional method at writing 3 million random numbers.
Greg Thorsness, Chao Chen, and Yong Chen
Department of Computer Science, Texas Tech University
{greg.thorsness, chao.chen, yong.chen}@ttu.edu