![Page 1: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/1.jpg)
CAS@home
Wenjing Wu, wuwj@ihep.ac.cn
Computer Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing
23/4/20 BOINC workshop 2013 @Grenoble 1
![Page 2: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/2.jpg)
Outline
• CAS@home project
• Applications:
  – Lammps: molecular dynamics simulation
  – treeThreader: protein structure prediction
• Remote Job Submission
![Page 3: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/3.jpg)
CAS@HOME
• First and only volunteer computing project in mainland China
• Launched in June 2010, hosted by the Computer Center of IHEP, CAS
• Supports scientific computing for the Chinese Academy of Sciences and other research institutes
• Hosts multiple applications from various research fields, including nanotechnology, bioinformatics and physics
![Page 4: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/4.jpg)
CAS@home status
Since its launch in June 2010:
• 10K active users, 1/3 of them Chinese
• 23K active hosts
• 7M CPU hours since Nov 2012
• Hosting 3 applications: Lammps, treeThreader, Aevol; other ongoing applications: BOSS (VBoxwrapper based)
• 1.3 TFLOPS (real-time computing power)
• Peak: 1M validated CPU hours/month
![Page 5: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/5.jpg)
Some project Statistics
![Page 6: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/6.jpg)
Application 1: Lammps
• Software for molecular dynamics simulation, widely used by scientists from various research fields.
• Restartable; developed in C++ by an international group; can be compiled on both Windows and Linux with some effort.
• Input/output: 3 mandatory input files (<10 MB) / 1 compressed output file (hundreds of MB)
• Running time: 0.5 to 800 hours (it depends on a random number that determines the number of simulation steps)
![Page 7: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/7.jpg)
Problems
• Results are numerical, and replicas of the same job can disagree for 2 reasons:
  – floating-point calculation differs across platforms
  – checkpoints lose precision when values are printed to a text file
• Solutions: Homogeneous Redundancy, or Homogeneous Application Version
• Running problems:
  – Some long jobs (~hundreds of hours) crash midway without earning any credit.
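The checkpoint effect can be reproduced with a toy model. This is a sketch under stated assumptions: `step()` is a hypothetical stand-in for one timestep and the `{:.6g}` text format is assumed, not the real LAMMPS restart format. It shows that resuming from a rounded text value changes the final result, while writing the value with full round-trip precision (`repr`) does not. Full-precision checkpoints are a swapped-in mitigation for the second cause only; cross-platform floating-point differences still need Homogeneous Redundancy or Homogeneous Application Version.

```python
# Toy model: a lossy text checkpoint makes a restarted run diverge.
# step() and the 6-significant-digit format are hypothetical assumptions.

def write_checkpoint(x, fmt):
    """Serialize the state to text, as a text-based checkpoint would."""
    return fmt.format(x)

def read_checkpoint(text):
    """Parse the state back from the text checkpoint."""
    return float(text)

def step(x):
    """Hypothetical deterministic update standing in for one simulation step."""
    return x * 1.000001 + 1e-9

def run(x0, steps, checkpoint_every=None, fmt="{:.6g}"):
    x = x0
    for i in range(1, steps + 1):
        x = step(x)
        if checkpoint_every and i % checkpoint_every == 0:
            # Simulate the client being stopped and resumed from the checkpoint.
            x = read_checkpoint(write_checkpoint(x, fmt))
    return x

uninterrupted = run(0.123456789, 1000)
lossy = run(0.123456789, 1000, checkpoint_every=100)              # 6 significant digits
exact = run(0.123456789, 1000, checkpoint_every=100, fmt="{!r}")  # repr() round-trips
print("lossy matches:", lossy == uninterrupted)
print("exact matches:", exact == uninterrupted)
```

Because Python's `repr` of a float is the shortest string that parses back to the identical value, the "exact" run replays the same arithmetic bit for bit.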
![Page 8: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/8.jpg)
Application 2: treeThreader
• For protein structure prediction
• Written in C by local scientists; compiles easily on both Windows and Linux; restartable
• Computing task: compare a protein sequence file against all existing protein templates
• Input files: configuration files, a protein sequence file, ~50k protein templates (about 4 GB)
• Output files: one text file per template file
• It needs about 42 GFLOPS/hour to compare one sequence file against all templates
![Page 9: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/9.jpg)
Computing task on 1 host:
• A protein sequence is compared against Protein Template 1, 2, 3, …, 50,000
• Each comparison takes 6 s
• It takes about 84 hours on a single core
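As a quick check of the single-core figure (assuming every template takes the same 6 s), the serial time for one sequence is:

$$
T_{\text{serial}} = 50\,000 \times 6\ \text{s} = 3 \times 10^{5}\ \text{s} \approx 83.3\ \text{h}
$$

which agrees with the slide's "about 84 hours", given that the template count (~50k) is itself approximate.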
![Page 10: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/10.jpg)
Running it on BOINC
• The templates are split into 32 sub-packages of 1,500 templates each, distributed as sticky files:
  – Sub Package 1 (Protein Templates 1–1500) on Host A1
  – Sub Package 2 (Protein Templates 1501–3000) on Host A2
  – …
  – Sub Package 32 (Protein Templates 46501–48000) on Host Am
  – Host An holds Sub Packages 14, 15 and 16
• Each comparison takes 6 s, so each sub-package takes 9000 s on a host
• Locality scheduling (the job goes to where the data is)
• The whole task for one protein sequence finishes in 9000 s (2.5 hours)
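The partitioning and locality scheduling on this slide can be sketched in Python. This is a minimal illustration, not BOINC's scheduler: `make_packages`, `package_seconds`, `pick_host`, the `Host` class and the fallback rule for a package no host has cached are all hypothetical. Only the 6 s per comparison and the 1,500 templates per sub-package come from the slides.

```python
from dataclasses import dataclass, field

SECONDS_PER_COMPARISON = 6     # from the slides
TEMPLATES_PER_PACKAGE = 1500   # from the slides

def make_packages(n_templates, size=TEMPLATES_PER_PACKAGE):
    """Split templates 1..n_templates into contiguous (first, last) ranges."""
    return [(first, min(first + size - 1, n_templates))
            for first in range(1, n_templates + 1, size)]

def package_seconds(package):
    """Time for one host to compare a sequence against one sub-package."""
    first, last = package
    return (last - first + 1) * SECONDS_PER_COMPARISON

@dataclass
class Host:
    name: str
    cached: set = field(default_factory=set)  # sticky sub-package numbers on disk

def pick_host(package_no, hosts):
    """Locality scheduling sketch: send the job to a host that already has the data.

    Returns (host, needs_download).
    """
    for host in hosts:
        if package_no in host.cached:
            return host, False
    # Hypothetical fallback: the host with the fewest cached packages downloads
    # the sub-package and keeps it as a sticky file for later jobs.
    host = min(hosts, key=lambda h: len(h.cached))
    host.cached.add(package_no)
    return host, True

packages = make_packages(48_000)   # the range covered by Sub Packages 1..32
hosts = [Host("A1", {1}), Host("A2", {2}), Host("An", {14, 15, 16})]
host, needs_download = pick_host(15, hosts)
print(len(packages), package_seconds(packages[0]), host.name, needs_download)
```

With 48,000 templates this yields 32 packages of 9,000 s each, and the job for Sub Package 15 lands on Host An without any download. The fallback rule is only one plausible choice; a real scheduler would also weigh host speed and availability.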
![Page 11: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/11.jpg)
Problems
• Long-tail batches
  – A front-end server submits batches and does the pre-processing and post-processing of each sequence, so it can only maintain/watch a limited number of active batches (batches in progress) in parallel (300).
  – A whole batch is delayed by its slowest job.
  – No new batches are submitted to the BOINC server while some batches are still "in progress" (waiting for their slowest jobs).
  – Many hosts end up in a "starving" situation (no work to fetch).
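A toy model makes the long-tail effect concrete. Only the limit of 300 active batches and the 2.5 h job length come from the slides; the 40 h straggler and the helper names are invented for illustration. Once every slot is held by a batch waiting on one straggler, the front end cannot submit anything new, and only 300 jobs remain for the whole pool of ~23K hosts.

```python
MAX_ACTIVE_BATCHES = 300   # front-end limit quoted on the slide
JOB_HOURS = 2.5            # one sub-package: 9000 s

def batch_makespan(job_hours):
    """A batch completes only when its slowest job completes."""
    return max(job_hours)

def free_slots(active_batches):
    """How many new batches the front end may submit right now."""
    return max(0, MAX_ACTIVE_BATCHES - len(active_batches))

def runnable_jobs(active_batches, elapsed_hours):
    """Jobs still unfinished after elapsed_hours, i.e. the only work hosts can get."""
    return sum(1 for batch in active_batches for t in batch if t > elapsed_hours)

# Hypothetical batch: 31 jobs finish in 2.5 h, one straggler needs 40 h.
straggler_batch = [JOB_HOURS] * 31 + [40.0]
active = [straggler_batch] * MAX_ACTIVE_BATCHES

print(batch_makespan(straggler_batch))    # 40.0
print(free_slots(active))                 # 0
print(runnable_jobs(active, JOB_HOURS))   # 300
```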
![Page 12: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/12.jpg)
Remote Job Submission
• CAS@home hosts multiple applications
• Each application has multiple users
• Application users have no privileges to submit jobs via the CAS@home server directly
• This requires remote job submission, which allows authorized and authenticated users to submit jobs from remote machines
• Basic remote job submission functions: batch submit / check status / retire / abort / download results
• BOINC provides a rich set of APIs for remote batch operations (a batch is a set of jobs based on the same input files), but each application still needs its own server-side CGI code and client-side code for remote job submission
  – Some operations (batch retire/abort/status check) are generic and can use the BOINC API directly
  – Other operations, like batch submission and result downloading, are application-specific and need to be customized
  – Extra functions such as "test running" and "estimate running time" can be added
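The generic/application-specific split above can be sketched as a small client-side request builder. This is a hypothetical illustration: the operation names, XML element names and both endpoint URLs are assumptions, not BOINC's actual wire format (consult the BOINC remote job submission documentation for that), and the HTTP transport is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Split taken from the slide: generic operations can use BOINC's API directly,
# the others need application-specific server code. Names are hypothetical.
GENERIC_OPS = {"query_batch", "abort_batch", "retire_batch"}
APP_SPECIFIC_OPS = {"submit_batch", "get_output"}

# Hypothetical endpoints; a real project would use its own URLs.
GENERIC_URL = "https://project.example.org/remote_batch_ops"
APP_CGI_URL = "https://project.example.org/app_cgi"

def build_request(op, authenticator, **fields):
    """Build an XML request body for a remote batch operation (illustrative schema)."""
    if op not in GENERIC_OPS | APP_SPECIFIC_OPS:
        raise ValueError(f"unknown operation: {op}")
    root = ET.Element(op)
    ET.SubElement(root, "authenticator").text = authenticator
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

def endpoint_for(op):
    """Route generic operations to BOINC and the rest to the application CGI."""
    return GENERIC_URL if op in GENERIC_OPS else APP_CGI_URL

body = build_request("abort_batch", "USER_AUTH_KEY", batch_id=42)
print(endpoint_for("abort_batch"))
print(body)
```

Keeping the routing in one place means a new application only has to supply its own CGI for the application-specific operations.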
![Page 13: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/13.jpg)
Lammps Job Submission
• Jobs are created in batches
• A batch = 1 set of input files + different parameter-value pairs
• A batch comprises hundreds to thousands of jobs
• Remote Job Submission: batches are submitted through a web portal by authenticated and authorized users
• Authenticated and authorized users can "operate" the batches through the web portal (retire, abort, check status, download results)
Batch A (input file 1, input file 2):
  Job 1: Ka1=Va1 Kb1=Vb1
  Job 2: Ka2=Va2 Kb2=Vb2
  …
  Job N: KaN=VaN KbN=VbN
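One way to produce the per-job parameter-value pairs is to expand a parameter grid. The slide allows arbitrary pairs per job; the Cartesian grid here is an assumption, and the file and parameter names (`in.melt`, `temp`, `seed`) are hypothetical.

```python
from itertools import product

def make_batch(input_files, param_grid):
    """Expand a parameter grid into jobs that all share the batch's input files.

    param_grid maps a parameter name to the list of values to try.
    """
    names = sorted(param_grid)
    return [
        {"job": i, "inputs": list(input_files), "params": dict(zip(names, values))}
        for i, values in enumerate(product(*(param_grid[n] for n in names)), start=1)
    ]

# Hypothetical file and parameter names, for illustration only.
batch = make_batch(["in.melt", "data.melt"], {"temp": [300, 350], "seed": [1, 2, 3]})
for job in batch:
    print(job["job"], job["params"])
```

Every job references the same input files, so they are uploaded once per batch and only the small parameter sets differ between jobs.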
![Page 14: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/14.jpg)
LAMMPS
Architecture diagram:
• CAS User Interface: File Sandbox, Test a Job, Submit a Batch, Check Batch Status, Get Output
• CAS@home server: LAMMPS CGI, File Sandbox Service
• Batch contents: Job1: Para List, Value List1; Job2: Para List, Value List2; Job3: Para List, Value List3; …; JobN: Para List, Value ListN
![Page 15: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/15.jpg)
Diagram (continued):
• User → (http) → Web Portal → (http) → LAMMPS CGI on CAS@home server → Volunteer Hosts
• User actions: test a job with chosen input files; submit a batch
• Sandbox: File1, File2, …
• Job Tester: syntax check, GFLOPS and output size estimation; the job must pass the test before the batch is submitted
• Batch Creator
• Batch Monitor / Job Monitor
• Batch Operations: abort/retire a batch, download results, zip results
![Page 16: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/16.jpg)
BOINC Sandbox
The same file cannot be uploaded twice
Files used by a running batch cannot be deleted
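The two sandbox rules can be sketched as a content-addressed store with per-batch references. This is a hypothetical design: the slide does not say how duplicates are detected, so an MD5 content hash is assumed here, and the `Sandbox` class and its methods are illustrative names.

```python
import hashlib

class Sandbox:
    """Per-user file store sketch: duplicate uploads rejected, in-use files protected."""

    def __init__(self):
        self.files = {}    # MD5 hex digest -> original file name
        self.in_use = {}   # MD5 hex digest -> ids of batches using the file

    def upload(self, name, data):
        digest = hashlib.md5(data).hexdigest()
        if digest in self.files:
            raise FileExistsError(f"{name}: same content already uploaded as {self.files[digest]}")
        self.files[digest] = name
        self.in_use[digest] = set()
        return digest

    def attach(self, digest, batch_id):
        """Record that a batch uses this file."""
        self.in_use[digest].add(batch_id)

    def release(self, batch_id):
        """Drop a finished (or retired) batch's references to all files."""
        for users in self.in_use.values():
            users.discard(batch_id)

    def delete(self, digest):
        if self.in_use[digest]:
            raise PermissionError("file is used by a running batch")
        del self.files[digest]
        del self.in_use[digest]

box = Sandbox()
digest = box.upload("in.melt", b"units lj\n")
box.attach(digest, batch_id=7)
try:
    box.upload("copy.in", b"units lj\n")   # same content, different name
except FileExistsError as err:
    print("rejected:", err)
try:
    box.delete(digest)
except PermissionError as err:
    print("rejected:", err)
box.release(7)
box.delete(digest)
print(box.files)   # {}
```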
![Page 17: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/17.jpg)
Lammps Job Testing
Test the job on the server
Submit the batch
Lammps-specific!
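The estimation step of the job tester could extrapolate from a short server-side test run. The slides do not say how CAS@home computes its estimates, so linear scaling with the number of steps, the factor of 10 for the bound and the `TrialRun`/`estimate_job` names are all assumptions; `rsc_fpops_est` and `rsc_fpops_bound` are the BOINC workunit fields such an estimate would typically feed.

```python
from dataclasses import dataclass

@dataclass
class TrialRun:
    steps: int          # timesteps executed by the short server-side test
    seconds: float      # wall-clock time of the test on the server
    output_bytes: int   # output the test produced

def estimate_job(trial, total_steps, server_gflops, bound_factor=10):
    """Extrapolate a full job's cost from a short test run.

    Assumes cost and output size grow linearly with the number of steps.
    """
    scale = total_steps / trial.steps
    fpops_est = trial.seconds * server_gflops * 1e9 * scale
    return {
        "rsc_fpops_est": fpops_est,
        "rsc_fpops_bound": bound_factor * fpops_est,   # safety margin (assumed)
        "output_bytes": trial.output_bytes * scale,
    }

est = estimate_job(TrialRun(steps=100, seconds=2.0, output_bytes=1_000),
                   total_steps=10_000, server_gflops=5.0)
print(est)
```

Linear scaling fits a fixed-size system run for more steps; Lammps jobs whose step count is drawn at random would need the expected step count instead.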
![Page 18: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/18.jpg)
Batch Monitoring
Admin can see the status of all batches
Batch status: In process, Completed, Aborted, Retired
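The four batch states can be modeled as a small state machine. The state names come from the slide; the allowed transitions are an assumption based on the batch operations shown (only an unfinished batch can be aborted, and a batch is retired once it is completed or aborted).

```python
from enum import Enum

class BatchState(Enum):
    IN_PROCESS = "In process"
    COMPLETED = "Completed"
    ABORTED = "Aborted"
    RETIRED = "Retired"

# Assumed transitions, not taken from BOINC's source.
TRANSITIONS = {
    BatchState.IN_PROCESS: {BatchState.COMPLETED, BatchState.ABORTED},
    BatchState.COMPLETED: {BatchState.RETIRED},
    BatchState.ABORTED: {BatchState.RETIRED},
    BatchState.RETIRED: set(),
}

def transition(state, new_state):
    """Return new_state if the move is allowed, otherwise raise ValueError."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"cannot go from {state.value} to {new_state.value}")
    return new_state

state = transition(BatchState.IN_PROCESS, BatchState.COMPLETED)
try:
    transition(state, BatchState.ABORTED)   # a finished batch cannot be aborted
except ValueError as err:
    print(err)
```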
![Page 19: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/19.jpg)
Admin view: all batches
![Page 20: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/20.jpg)
Job Status
Input files associated with this job
Results can be downloaded individually
![Page 21: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/21.jpg)
Batch Operations
Download results of this batch
Retire a batch
Download results of a work unit
An unfinished batch can be aborted here
![Page 22: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/22.jpg)
TreeThreader Job Submission
• Jobs are created in batches: 1 protein sequence corresponds to 1 batch (32 jobs)
• Remote Job Submission:
  – Client side: a set of PHP APIs allows authenticated and authorized users to submit batches and operate on them (check status, retire, abort, get output) remotely
  – Server side:
    • Generic operations such as batch abort/retire/status check are already included in the BOINC code
    • Operations such as batch submission and result downloading are application-specific and are implemented in a CGI program on the server side
![Page 23: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/23.jpg)
TreeThreader Job Submission CGI
• Batch submission
  – Takes the sequence and configuration files uploaded by the client
  – Creates a batch of jobs based on these input files and all template files, which are already stored on the server side
  – Returns a batch ID
• Batch result downloading
  – Uncompresses all output files of the batch
  – Puts the uncompressed output files into the same directory and compresses it
  – Returns the download URL of the batch result file
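The result-downloading steps can be sketched with Python's standard `tarfile` and `zipfile` modules. This is a hypothetical illustration: the per-job `.tar.gz` output format, the `results` directory name and the download URL are assumptions, and the merge runs in memory for brevity where a real CGI would work on files on disk.

```python
import io
import posixpath
import tarfile
import zipfile

def merge_batch_outputs(job_archives, dirname="results"):
    """Unpack each job's .tar.gz output and repack all files into one zip directory.

    job_archives maps a job name to the bytes of that job's gzip-compressed tar.
    """
    seen = set()
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as merged:
        for job_name in sorted(job_archives):
            with tarfile.open(fileobj=io.BytesIO(job_archives[job_name]), mode="r:gz") as tar:
                for member in tar.getmembers():
                    if not member.isfile():
                        continue
                    name = posixpath.basename(member.name)
                    if name in seen:   # one output file per template, so names should be unique
                        raise ValueError(f"duplicate output file {name} in {job_name}")
                    seen.add(name)
                    merged.writestr(f"{dirname}/{name}", tar.extractfile(member).read())
    return buf.getvalue()

def result_url(batch_id, base="https://project.example.org/results"):
    """Hypothetical download URL for the merged batch result."""
    return f"{base}/batch_{batch_id}.zip"

def pack_tar_gz(files):
    """Demo helper: build a .tar.gz in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

archives = {
    "job_01": pack_tar_gz({"out/T00001.txt": b"score 1\n"}),
    "job_02": pack_tar_gz({"out/T01501.txt": b"score 2\n"}),
}
merged = merge_batch_outputs(archives)
print(zipfile.ZipFile(io.BytesIO(merged)).namelist())
print(result_url(17))
```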
![Page 24: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/24.jpg)
TreeThreader Job Submission
Diagram:
• ICT Web Services call the API: Submit a sequence, Status Check, Get Output
• The API talks to the TreeThreader CGI on CAS@home
• Each sequence is run against the template packages Template P1, P2, P3, P4, …, P32
• The outputs are returned as merged results
![Page 25: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/25.jpg)
Thoughts on a more generic Job submission interface
• The server side still requires application-specific functions to create batches, merge results, test jobs and estimate running time
• On the client side, the job submission and result downloading functions can be generalized
• Use an XML file on the client side to describe the input files and their types
![Page 26: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/26.jpg)

    <jobdesc>
      <file_info>
        <number>0</number>
        <type>upload</type>  <!-- file needs to be uploaded to the BOINC server -->
      </file_info>
      <file_info>
        <number>1</number>
        <type>online</type>  <!-- file already stored on the BOINC server -->
      </file_info>
      <file_ref>
        <file_number>0</file_number>
        <open_name>MySEQ.tar.gz</open_name>
      </file_ref>
      <file_ref>
        <file_number>1</file_number>
        <open_name>Templates</open_name>
      </file_ref>
    </jobdesc>
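A generic client could read such a job description to decide which files to upload. This sketch uses `file_info` as the element name (the slide's `file info` contains a space, which is not valid XML) and a hypothetical `parse_jobdesc` helper. It resolves each `file_ref` through its `file_info` entry and splits the open names into files to upload and files already on the server.

```python
import xml.etree.ElementTree as ET

JOBDESC = """
<jobdesc>
  <file_info><number>0</number><type>upload</type></file_info>  <!-- needs upload -->
  <file_info><number>1</number><type>online</type></file_info>  <!-- already on server -->
  <file_ref><file_number>0</file_number><open_name>MySEQ.tar.gz</open_name></file_ref>
  <file_ref><file_number>1</file_number><open_name>Templates</open_name></file_ref>
</jobdesc>
"""

def parse_jobdesc(xml_text):
    """Return (files_to_upload, files_already_on_server) as lists of open names."""
    root = ET.fromstring(xml_text.strip())
    types = {int(fi.findtext("number")): fi.findtext("type").strip()
             for fi in root.findall("file_info")}
    upload, online = [], []
    for ref in root.findall("file_ref"):
        number = int(ref.findtext("file_number"))
        name = ref.findtext("open_name").strip()
        kind = types.get(number)
        if kind == "upload":
            upload.append(name)
        elif kind == "online":
            online.append(name)
        else:
            raise ValueError(f"file_ref {name} points to unknown file {number}")
    return upload, online

print(parse_jobdesc(JOBDESC))
```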
![Page 27: CAS@home Wenjing Wu wuwj@ihep.ac.cn Computer Center, Institute of High Energy Physics Chinese Academy of Sciences, Beijing 2015-10-17 BOINC workshop 2013](https://reader035.vdocuments.mx/reader035/viewer/2022062720/56649f175503460f94c2de57/html5/thumbnails/27.jpg)
The End!