PSWEEP: A Lightweight Pattern for Distributed Computational Experiments
Christopher Mueller and Andrew Lumsdaine
Open Systems Lab, Indiana University
Introduction
Parameter Sweeps are common cluster applications
Approaches
- Scripts (sh, perl: ssh, mpi)
- Low-level applications (C++, Fortran: MPI)
- Parameter sweep applications (e.g., Nimrod)
Problems
- Custom solutions become tangled quickly
- Applications are not available on all platforms
How do we use our clusters?

    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    882576.aviss.av silin    iq       SL_DBJ014Q  14636   2   4     -- 200:0 R 109:48
    890917.aviss.av baikgrp  bg       DA_NPJ001V  27673   1   2     -- 168:0 R 83:32
    890932.aviss.av baikgrp  bg       DA_NPJ002V  18006   1   2     -- 168:0 R 87:31
    959929.aviss.av rllord   iq       RL1_NCQ02V  11982   1   2     -- 120:0 R 56:27
    960044.aviss.av shawnli  bg       Hairy2b     13703   1   1     -- 100:0 R 42:52
    960045.aviss.av shawnli  bg       Xxbp1       21294   1   1     -- 100:0 R 42:51
    960046.aviss.av shawnli  bg       Foxa1       15908   1   1     -- 100:0 R 42:49
    960047.aviss.av shawnli  bg       Foxa2       19881   1   1     -- 100:0 R 42:49
    960048.aviss.av shawnli  bg       Foxd3       19073   1   1     -- 100:0 R 42:49
    960050.aviss.av shawnli  bg       Gsc         20886   1   1     -- 100:0 R 42:04
    960215.aviss.av shawnli  bg       Foxa1mamma  18296   1   1     -- 100:0 R 35:23
    960216.aviss.av shawnli  bg       Foxa2mamma  14926   1   1     -- 100:0 R 34:43
    960217.aviss.av shawnli  bg       Foxd3mamma  15016   1   1     -- 100:0 R 34:43
    960218.aviss.av shawnli  bg       Gata4mamma   7421   1   1     -- 100:0 R 33:11
    960220.aviss.av shawnli  bg       Glimammal    7525   1   1     -- 100:0 R 33:11
    960221.aviss.av shawnli  bg       Gscmammal   16626   1   1     -- 100:0 R 33:03
    960222.aviss.av shawnli  bg       Hairy2bmam  16760   1   1     -- 100:0 R 33:03
    960224.aviss.av shawnli  bg       Hoxd1mamma  32101   1   1     -- 100:0 R 33:01
    960225.aviss.av shawnli  bg       Mixermamma  27958   1   1     -- 100:0 R 32:09
    960279.aviss.av dkberry  mdgrape  run13_07m    5570   1   1     -- 36:00 R 17:04
    960283.aviss.av dbaronia iq       batch.sh    23862   3   6     -- 24:00 R 22:41
    960426.aviss.av cwillenb bg       CWOA_005    18980   1   1     -- 100:0 R 04:52
    960428.aviss.av cwillenb bg       CWOA_006a    1941   1   1     -- 100:0 R 04:52
    960429.aviss.av cwillenb bg       CWOA_007       --   1   1     -- 100:0 Q --
    960430.aviss.av cwillenb bg       CWOA_008       --   1   1     -- 100:0 Q --
    960431.aviss.av cwillenb bg       CWOA_009       --   1   1     -- 100:0 Q --
    960432.aviss.av cwillenb bg       CWOA_010       --   1   1     -- 100:0 Q --
    960433.aviss.av cwillenb bg       CWOA_011       --   1   1     -- 100:0 Q --
    960434.aviss.av cwillenb bg       CWOA_012       --   1   1     -- 100:0 Q --
    963115.aviss.av xsong    bg       par.241        --   8  16     -- 24:00 Q --
    963116.aviss.av xsong    bg       par.242        --   8  16     -- 24:00 Q --
    963121.aviss.av xsong    bg       par.53.7       --   8  16     -- 02:00 Q --
    963122.aviss.av xsong    bg       par.53.8       --  16  32     -- 02:00 Q --
    963133.aviss.av honfan   iq       HF_MJ370    23299   3   6     -- 120:0 R 07:13
    963167.aviss.av whpitcoc iq       WP_C572_L0  30829   1   2     -- 24:00 R 01:11
    963171.aviss.av whpitcoc iq       WP_C572_L0  17995   1   2     -- 24:00 R 01:11
    963186.aviss.av whpitcoc iq       WP_C572_TS   5235   1   2     -- 24:00 R 00:08
    963187.aviss.av whpitcoc iq       WP_C572_TS  25746   1   2     -- 24:00 R 00:09
    963188.aviss.av whpitcoc iq       WP_C572_TS  13846   1   2     -- 24:00 R 00:09
    963189.aviss.av whpitcoc iq       WP_C572_TS  26613   1   2     -- 24:00 R 00:08
Anatomy of a Parameter Sweep
    for i in range(rank, n, size):
        if process: load_image(i)
        elif stats: query_image(i)

        for j in [1, 2, 4, 8]:
            if process: time(i, j)

            for k in ['motion', 'gaussian']:
                if process: process_image(i,j,k)
                elif stats: image_stats(i,j,k)
                else:
                    print 'ssh n%d run %d %d' % (i, j, k)

                if process: clear_process(k)
                elif bgi: clear_temp(k)

        if process: unload_image(i)
Parameters and Enumeration Order
* Resource distribution is handled by the execution environment, e.g., mpirun.
Anatomy of a Parameter Sweep
Tasks and Experiments
    for i in range(rank, n, size):
        if process: load_image(i)
        elif stats: query_image(i)

        for j in [1, 2, 4, 8]:
            if process: time(i, j)

            for k in ['motion', 'gaussian']:
                if process: process_image(i,j,k)
                elif stats: image_stats(i,j,k)
                else:
                    print 'ssh n%d run %d %d' % (i, j, k)

                if process: clear_process(k)
                elif bgi: clear_temp(k)

        if process: unload_image(i)
Anatomy of a Parameter Sweep
Artifacts and Errors
    for i in range(rank, n, size):
        if process: load_image(i)
        elif stats: query_image(i)

        for j in [1, 2, 4, 8]:
            if process: time(i, j)

            for k in ['motion', 'gaussian']:
                if process: process_image(i,j,k)
                elif stats: image_stats(i,j,k)
                else:
                    print 'ssh n%d run %d %d' % (i, j, k)

                if process: clear_process(k)
                elif bgi: clear_temp(k)

        if process: unload_image(i)
User’s View
[Diagram: experiments, parameters, and resources]
Experiments
- process: load_image(), unload_image(), time(), process_image(), clear_process()
- stats: query_image(), image_stats()
- script gen: print … (output: 0, 0.01, 10; 0, 0.01, 12; 0, 0.01, 14; 0, 0.1, 10; 0, 0.1, 12; …)
Parameters [i, j, k]
- [0, n]
- [.01, .1, 1.0]
- [10, 12, 14]
Resources
The PSWEEP Pattern
Abstracting the Loops
Parameter. A Parameter is an iterator or container that supplies the values for a variable in the experiment.
Enumerator. The Enumerator takes an ordered list of parameters and lexicographically enumerates all possible combinations of their values.
State. The state contains the current value of each parameter, in order.
    i = ['house.jpg', 'lena.jpg']
    j = [1, 2, 4, 8]
    k = ['motion', 'gaussian']

    params = [i, j, k]
    e = enumerator(params)

    for state in e: process_image(state)
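A minimal sketch of such an enumerator, assuming the parameters are plain Python lists. The `enumerator` function below is a hypothetical stand-in built on `itertools.product`, not the PSWEEP implementation. It yields each state as a list, with the last parameter varying fastest, which matches the order of the equivalent nested loops.

```python
from itertools import product

def enumerator(params):
    """Yield every state in lexicographic (nested-loop) order.

    Hypothetical stand-in for the PSWEEP enumerator: `params` is an
    ordered list of containers, and each state is a list holding the
    current value of every parameter, in order.
    """
    for combo in product(*params):
        yield list(combo)

i = ['house.jpg', 'lena.jpg']
j = [1, 2, 4, 8]
k = ['motion', 'gaussian']

states = list(enumerator([i, j, k]))
print(len(states))   # 2 * 4 * 2 = 16 states
print(states[0])     # ['house.jpg', 1, 'motion']
print(states[1])     # ['house.jpg', 1, 'gaussian']
```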
Abstracting the Experiments
Task. A Task is any unit of work performed when a parameter value changes. A Task is subdivided into setup and cleanup operations, corresponding to the work done at the beginning and end of a block of code in a loop, respectively.
Experiment. An Experiment is a collection of tasks.
    def PrepareImage(state, img):
        # Setup
        db_load(img, './current.jpg')
        yield  # suspend the function
        # Cleanup
        delete('./current.jpg')

    def ProcessImage(state, alg):
        data = load('./current.jpg')
        img = process(data, alg(value))
        save(img, str(state) + '.jpg')

        return  # no cleanup
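One way a driver could run tasks written in this style is sketched below. It assumes that calling a task runs its setup up to the `yield` and that resuming it runs the cleanup. The helpers `start_task` and `finish_task`, the `log` list, and the simplified task bodies are hypothetical names introduced for illustration only.

```python
import inspect

log = []

def prepare_image(state, img):
    log.append('setup ' + img)      # setup: runs before the yield
    yield                           # suspend until the block ends
    log.append('cleanup ' + img)    # cleanup: runs when resumed

def process_image(state, alg):
    log.append('process ' + alg)    # plain function: setup only
    return

def start_task(task, state, value):
    """Run the setup half; return a handle if cleanup is pending."""
    result = task(state, value)
    if inspect.isgenerator(result):
        next(result)                # execute up to the yield
        return result
    return None

def finish_task(handle):
    """Run the cleanup half, if the task has one."""
    if handle is not None:
        next(handle, None)          # resume past the yield

h = start_task(prepare_image, ['house.jpg'], 'house.jpg')
start_task(process_image, ['house.jpg', 'motion'], 'motion')
finish_task(h)
print(log)  # ['setup house.jpg', 'process motion', 'cleanup house.jpg']
```

Using a generator keeps setup and cleanup in one function body, so local variables created during setup remain available to the cleanup code.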
Binding Experiments to State
Bound Task Semantics. Tasks must execute in the same order as they would if the parameter sweep were expanded into nested loops.
    for img in images:
        PrepareImage.setup(img)
        for alg in algs:
            ProcessImage.setup(alg)
        PrepareImage.cleanup(img)

    e = enumerator([images, algs])
    e.bind(images, PrepareImage)
    e.bind(algs, ProcessImage)

    for state in e: pass
These examples are equivalent.
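The equivalence can be illustrated with a small sketch of an enumerator that supports `bind`. The `Enumerator` class below is a hypothetical illustration, not the PSWEEP code. It assumes every bound task is a generator function with exactly one `yield` separating setup from cleanup, and it records the execution order in a `trace` list so the order can be inspected.

```python
class Enumerator:
    """Sketch of an enumerator with bound tasks (hypothetical API)."""

    def __init__(self, params):
        self.params = params
        self.tasks = [[] for _ in params]

    def bind(self, param, task):
        # Locate the parameter by identity, so equal lists stay distinct.
        index = next(n for n, p in enumerate(self.params) if p is param)
        self.tasks[index].append(task)

    def __iter__(self):
        return self._sweep([])

    def _sweep(self, prefix):
        depth = len(prefix)
        for value in self.params[depth]:
            state = prefix + [value]
            # Setup: run each task bound at this depth up to its yield.
            running = [task(state, value) for task in self.tasks[depth]]
            for gen in running:
                next(gen)
            if len(state) == len(self.params):
                yield state                      # innermost loop body
            else:
                yield from self._sweep(state)    # descend one loop level
            # Cleanup: resume the tasks in reverse order.
            for gen in reversed(running):
                next(gen, None)

trace = []

def prepare_image(state, img):
    trace.append('setup ' + img)
    yield
    trace.append('cleanup ' + img)

def process_image(state, alg):
    trace.append('process %s/%s' % (state[0], alg))
    yield  # no cleanup

images = ['house.jpg', 'lena.jpg']
algs = ['motion', 'gaussian']

e = Enumerator([images, algs])
e.bind(images, prepare_image)
e.bind(algs, process_image)

for state in e:
    pass

print('\n'.join(trace))
```

The trace shows each image being set up once, processed with both algorithms, and then cleaned up before the next image starts, which is the order the nested loops would produce.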
Distributing the Workload
DistributedEnumerator. DistributedEnumerator is an Enumerator that distributes the state to multiple instances across multiple computing resources.
    e = RoundRobin(params)
    for state in e: pass

States:

    p1: [house.jpg, 1, motion]    p2: [house.jpg, 1, gaussian]
        [house.jpg, 2, motion]        [house.jpg, 2, gaussian]
        [house.jpg, 4, motion]        [house.jpg, 4, gaussian]
        [lena.jpg, 1, motion]         [lena.jpg, 1, gaussian]
        [lena.jpg, 2, motion]         [lena.jpg, 2, gaussian]
        [lena.jpg, 4, motion]         [lena.jpg, 4, gaussian]

    e = Domain(params, images)
    for state in e: pass

States:

    p1: [house.jpg, 1, motion]    p2: [lena.jpg, 1, motion]
        [house.jpg, 1, gaussian]      [lena.jpg, 1, gaussian]
        [house.jpg, 2, motion]        [lena.jpg, 2, motion]
        [house.jpg, 2, gaussian]      [lena.jpg, 2, gaussian]
        [house.jpg, 4, motion]        [lena.jpg, 4, motion]
        [house.jpg, 4, gaussian]      [lena.jpg, 4, gaussian]

    e = MasterWorker(params)
    for state in e: pass

States:

    p1: [house.jpg, 1, motion]    p2: [house.jpg, 1, gaussian]
        [house.jpg, 2, motion]        [house.jpg, 2, gaussian]
        [house.jpg, 4, motion]        [house.jpg, 4, gaussian]
        [lena.jpg, 1, motion]         [lena.jpg, 1, gaussian]
        [lena.jpg, 2, motion]         [lena.jpg, 2, gaussian]
        [lena.jpg, 4, motion]         [lena.jpg, 4, gaussian]
Each DistributedEnumerator must ensure that the bound task semantics are satisfied.
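The two static schemes can be sketched without MPI by passing the process rank and the process count explicitly; in a real run these would come from the execution environment. The functions `round_robin` and `domain` below are hypothetical illustrations, not the PSWEEP classes, and the dynamic MasterWorker scheme is left out because it needs message passing.

```python
from itertools import islice, product

def round_robin(params, rank, size):
    """Give every size-th state to this rank, starting at `rank`."""
    return [list(s) for s in islice(product(*params), rank, None, size)]

def domain(params, split, rank, size):
    """Partition the values of one parameter; sweep the rest fully."""
    index = next(n for n, p in enumerate(params) if p is split)
    local = list(params)
    local[index] = split[rank::size]   # this rank's share of the domain
    return [list(s) for s in product(*local)]

images = ['house.jpg', 'lena.jpg']
j_values = [1, 2, 4]
algs = ['motion', 'gaussian']
params = [images, j_values, algs]

# Two simulated processes, ranks 0 and 1.
p1 = round_robin(params, rank=0, size=2)
p2 = round_robin(params, rank=1, size=2)
d1 = domain(params, images, rank=0, size=2)
d2 = domain(params, images, rank=1, size=2)

for name, states in [('p1', p1), ('p2', p2), ('d1', d1), ('d2', d2)]:
    print(name, states)
```

With two processes, the round-robin split sends both images to every rank, so a task bound to the image parameter would have to run on each rank. The domain split keeps each image on a single rank.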
Implementations
Python
- Designed around iterators and generators
- DistributedEnumerator based on pyMPI
- Ideal for managing experiments on clusters
C++
- Template metaprogramming techniques remove abstraction penalties
- Ideal for applications with many nested loops
C++ Example
Generate HTML tables for days of the week, with hours for the rows and minutes for the columns.

Task Classes

    struct table_task {
      void setup(State& state) {
        std::cout << "<table title=\"";
        print_last_param()(state);
        std::cout << "\">\n";
      }

      void cleanup(State&) {
        std::cout << "</table>\n";
      }
    };

    struct table_row_task {
      // As above with <tr>
    };

    struct table_data_task {
      // As above with <td>
    };

Parameter Sweep

    int main()
    {
      using boost::make_tuple;

      sweep(make_tuple("Sat", "Sun",
            make_tuple(range(24),
            make_tuple(range(0,60,10)))),
            empty_state().
              bind<0>(table_task()).
              bind<1>(table_row_task()).
              bind<2>(table_data_task()),
            print_last_param());

      return 0;
    }
Conclusions
PSWEEP cleanly separates concerns:
- Parameters
- Tasks
- Resources
Modern languages enable flexible and high-performance implementations
Reference
http://www.osl.iu.edu/~chemuell/new/psweep.php
Christopher Mueller, Douglas Gregor, and Andrew Lumsdaine. A Lightweight Pattern for Managing Distributed Computational Experiments. Submitted to HPDC 2006.
Questions?