

Introduction

Data movement is a major bottleneck in data-intensive high performance computing

We propose a Fusion Active Storage System (FASS) to address the data movement bottleneck issue, specifically for write-intensive big data applications

The idea of the proposed FASS is to identify and offload write-intensive operations and carry out these operations on storage nodes

FASS enables moving write-intensive computations to storage nodes, generating and writing data in place on storage

FASS: Components

The Offload Analysis Module (OAM) calculates whether an operation would perform better if it were offloaded

The Instruction Decoupling Module (IDM) provides an extended API that allows the programmer to flag sections of code as write-intensive, determining what should be offloaded

The Kernel Processing Module (KPM) carries out the offloaded instructions on the storage nodes and handles communication as needed
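The poster does not show the IDM's extended API. A purely hypothetical sketch of what flagging a write-intensive section might look like (all names are illustrative, not the actual FASS interface):

```python
# Hypothetical IDM-style API sketch: a decorator flags a code section
# as write-intensive so the runtime knows it is a candidate for
# offloading to storage nodes. Names are illustrative only.

WRITE_INTENSIVE_REGISTRY = []

def write_intensive(func):
    """Mark a function as a write-intensive region eligible for offload."""
    WRITE_INTENSIVE_REGISTRY.append(func.__name__)
    return func

@write_intensive
def generate_and_write(path, n):
    # Generates data and writes it in place; under FASS this body
    # would execute on a storage node instead of a compute node.
    import random
    with open(path, "w") as f:
        for _ in range(n):
            f.write(f"{random.random()}\n")
```

In a full FASS prototype, the registry populated by such a decorator is what the OAM would consult when deciding what to offload.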

Evaluations on DISCFarm Cluster

Offload Analysis Module Algorithm

We can determine whether or not to offload the computation with a heuristic algorithm

Variable denotations:
Wd = data workload; Wdi = instruction workload; Wc = computation workload; N = total nodes; b = bandwidth; Cn = compute nodes; Sn = storage nodes

If the time saved by offloading the instructions instead of transferring the entire data is greater than the time lost by computing on the storage nodes, then offload the operations
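The heuristic is not given as a formula in the poster text; a minimal sketch consistent with the variable definitions above (offload when the transfer time saved, (Wd - Wdi)/b, exceeds the compute time lost by using Sn storage nodes instead of all N nodes) might look like:

```python
def should_offload(Wd, Wdi, Wc, N, Sn, b):
    """OAM-style offload decision (a sketch, not the actual FASS code).

    Offload when the transfer time saved by shipping instructions (Wdi)
    instead of the full data (Wd) outweighs the extra compute time of
    running on Sn storage nodes instead of all N nodes.
    """
    time_saved = (Wd - Wdi) / b       # less data over the wire
    time_lost = Wc / Sn - Wc / N      # fewer nodes do the compute
    return time_saved > time_lost

# With the poster's test-1 constants, offloading wins:
print(should_offload(Wd=200, Wdi=20, Wc=100, N=24, Sn=12, b=24))  # True
```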

Model: Traditional HPC

Variable denotations:
T = execution time; m = number of phases; Wc = computational workload; Wd = data workload; N = total nodes; b = bandwidth
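The model formula itself was rendered as a figure in the original poster. A form consistent with the variable definitions, in which each of the m phases computes on all N nodes and then transfers the full data workload at bandwidth b, is:

```latex
% Traditional HPC: every phase computes W_c on all N nodes,
% then moves the data workload W_d at bandwidth b.
T = m \left( \frac{W_c}{N} + \frac{W_d}{b} \right)
```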

Model: FASS

Variable denotations:
Sn = storage nodes; Cn = computation nodes; Wdi = instruction data workload; Mw = write-intensive phases; Mc = computational phases; T' = FASS execution time
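The FASS formula was likewise a figure in the original. A reconstruction consistent with the variable definitions, in which write-intensive phases run on the Sn storage nodes and transfer only the instruction workload Wdi, while computational phases run on the Cn compute nodes, reproduces the 1.36X and 3.66X speedups reported below:

```latex
% FASS: M_c computational phases on the C_n compute nodes still move W_d;
% M_w write-intensive phases on the S_n storage nodes move only the
% instruction workload W_di and write data in place.
T' = M_c \left( \frac{W_c}{C_n} + \frac{W_d}{b} \right)
   + M_w \left( \frac{W_c}{S_n} + \frac{W_{di}}{b} \right)
```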

Comparison and Analysis

Restrictions: N = Sn + Cn; Sn > 0; Cn > 0; m = Mw + Mc

Constants for all test cases:
N = 24; b = 24; Wc = 100; m = 10

In this test we observed the effect of varying the percentage of write phases on the runtime of the FASS. As can be seen from the graph, the FASS performed better than the traditional method when the percentage of write phases exceeded 60%. The FASS sped the runtime up 1.36X at 100% write phases.

Constants for test 1:
Sn = 12; Cn = 12; Wc = 100; Wd = 200; Wdi = 20

In this test we observed the effect of varying the data workload on the runtime of the FASS. As can be seen from the graph, the FASS performed better than the traditional method when the percentage of write phases exceeded 10%. The FASS sped the runtime up 3.66X at 100% write phases. This test sped up execution more than the first test, as FASS is especially useful when dealing with a large volume of data.

Constants for test 2:
Sn = 12; Cn = 12; Wc = 100; Wd = 1000; Wdi = 100
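As a sanity check, both reported speedups follow from the constants above under the assumed execution-time models T = m(Wc/N + Wd/b) and T' = Mc(Wc/Cn + Wd/b) + Mw(Wc/Sn + Wdi/b) (a reconstruction, since the poster's formulas were rendered as images):

```python
def traditional_time(m, Wc, Wd, N, b):
    # Each phase computes on all N nodes, then moves the full data.
    return m * (Wc / N + Wd / b)

def fass_time(Mc, Mw, Wc, Wd, Wdi, Cn, Sn, b):
    # Write phases run on Sn storage nodes and move only instructions.
    return Mc * (Wc / Cn + Wd / b) + Mw * (Wc / Sn + Wdi / b)

# 100% write phases (Mw = 10, Mc = 0), test-1 constants:
t = traditional_time(m=10, Wc=100, Wd=200, N=24, b=24)
t_fass = fass_time(Mc=0, Mw=10, Wc=100, Wd=200, Wdi=20, Cn=12, Sn=12, b=24)
print(round(t / t_fass, 2))  # 1.36, matching test 1

# Test-2 constants:
t = traditional_time(m=10, Wc=100, Wd=1000, N=24, b=24)
t_fass = fass_time(Mc=0, Mw=10, Wc=100, Wd=1000, Wdi=100, Cn=12, Sn=12, b=24)
print(round(t / t_fass, 2))  # 3.67, matching test 2's reported 3.66
```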

Future Work

* DISCLAIMER: This material is based upon work supported by the National Science Foundation and the Department of Defense under Grant No. CNS-1263183. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Department of Defense.

Build a fully functioning and automated FASS prototype and conduct further evaluations

Examine the viability of using KPM to detect and act upon data dependencies

Fusion Active Storage for Write-intensive Big Data Applications*

Conclusion

The results of these analyses and evaluations show that the FASS clearly enhances the performance of write-intensive applications

Note that the models and emulations are simplified, as they do not take into account factors such as data dependencies. Despite the simplified assumptions, we believe the FASS is promising and would enhance the performance of real-world write-intensive applications


We have emulated the FASS system using a 16-node DISCFarm cluster at Texas Tech University to evaluate the benefits. These tests were conducted with a write-intensive random number generator code to measure the potential of the FASS compared to the traditional method.

As can be observed from the graph, data movement impaired the performance of the traditional method. The FASS was over 4 times faster than the traditional method at writing 3 million random numbers.

Greg Thorsness, Chao Chen, and Yong Chen
Department of Computer Science, Texas Tech University

{greg.thorsness, chao.chen, yong.chen}@ttu.edu