example applications of e-infrastructure: ngs ui/wms jonathan churchill - stfc/ral...
Summary
• Overview
• Example case study.
  – Logging in: SSH and MyProxy.
  – Parametric jobs
  – Head node for the grid
• Submitting simple jobs.
• Submitting ‘real’ jobs.
• Lab Session.
NGS UI/WMS
• User Interface / Workload Management System
• UMD (gLite) UI and WMS Distributions
• NGS Usability improvements:
  – ssh / proxy logins
  – Extensive proxy checking
  – Gridftp on UI
• “Head node for the Grid”
• Works with NGS and gLite/GridPP sites
• 12,000+ CPU Cores
Rapid take-up since Oct ‘09 startup.
[Diagram: NGS WMS. Components: UI, WMS, Information Server (BDII), MyProxy. Sites: RAL-NGS2, Scotgrid-Glasgow, Oxford-OERC, Manchester-NGS2, RAL-LCG, etc. (12,000+ CPUs). Connection labels: gsiFTP, ssh login (MEG), SOAP WS.]
Transcriptome Analysis using the NGS UI/WMS
Jonathan Churchill - STFC/RAL
Paul Wilkinson - Exeter University
Tobacco Hornworm Moth Manduca sexta
Green Dock Beetle Gastrophysa viridula
mRNA Transcript
genome.gov: National Human Genome Research Institute
Bio Databases and Applications
ftp
rsync
• Database Mirrors: EMBL, UNIPROT, TREMBL, SWISSPROT, PROSITE, PRINTS, REBASE
• Pre-Installed Applications: BLAST, EMBOSS, FASTA, GROMACS, MrBAYES, EXONERATE, NAMD, Siesta
WMS Parametric Case Study
• What proteins do these ‘contigs’/transcripts code for?
• NCBI BLAST search in the EBI Uniprot database
1 x 55,000 contigs: 1 month elapsed annotation time.
1000 x 55 contigs + NGS + WMS: < 6 hours elapsed annotation time.
Using a WMS ‘Parametric’ JDL file:
  – One JDL for 1000 jobs
  – One submission command
  – One status command
  – Outputs returned to the UI automatically
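The "one JDL for 1000 jobs" idea rests on the _PARAM_ placeholder used in the case-study JDL file: each sub-job gets its own value substituted into file names and arguments. The sketch below only illustrates that substitution in Python. The function name expand_parametric is hypothetical, and it assumes the Parameters value acts as an exclusive upper bound; whether the bound is inclusive should be checked against the JDL attributes documentation.

```python
def expand_parametric(template, parameters, start=0, step=1):
    """Return one string per sub-job with _PARAM_ replaced by its value.

    Assumption (not confirmed by the slides): `parameters` is an exclusive
    upper bound, so values run start, start + step, ... below `parameters`.
    """
    return [template.replace("_PARAM_", str(value))
            for value in range(start, parameters, step)]


# Mirrors Parameters = 997, ParameterStart = 0, ParameterStep = 1.
input_files = expand_parametric("contig-_PARAM_.fsa", parameters=997)
print(len(input_files), input_files[0], input_files[-1])
```

Under that assumption the last input file is contig-996.fsa, so the number of split files and the Parameters value need to be kept in step with each other.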
Create your own username / password
Login : SSH/PuTTY
JDL File
Type = "Job";
JobType = "Parametric";
Executable = "/usr/ngs/BLAST-NCBI";
Arguments = "blastall -p blastx -d uniprot -i contig-_PARAM_.fsa -a 1";
StdOutput = "contig-_PARAM_.out";
StdError = "contig-_PARAM_.err";
Parameters = 997;
ParameterStart = 0;
ParameterStep = 1;
MyProxyServer = "myproxy.ngs.ac.uk";
InputSandbox = {"contig-_PARAM_.fsa"};
InputSandboxBaseURI = "gsiftp://ngsui03.ngs.ac.uk:2811/home/ngs0055/ParamBlast/inputs";
OutputSandbox = {"contig-_PARAM_.out","contig-_PARAM_.err"};
OutputSandboxBaseDestURI = "gsiftp://ngsui03.ngs.ac.uk:2811/home/ngs0055/ParamBlast/outputs";
Requirements = ( Member("NGS-UEE-BLAST-NCBI", other.GlueHostApplicationSoftwareRunTimeEnvironment));
Rank = other.GlueCEStateFreeCPUs;
ShallowRetryCount = -1;
Submit & Monitor
• glite-wms-job-submit -a -o jobIDs blast.jdl
• glite-wms-job-status -i jobIDs
• One jobID for all 1000 jobs
• 1000 output files
• Input and output files copied from/to the UI
• Jobs 2-3 hours each
• Head node for the grid.
Peak: 320 jobs in flight
Avg: 150 jobs in flight
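With around 1000 output files arriving on the UI, it helps to check which sub-jobs have not yet delivered a result. The following is a minimal sketch, not part of the NGS tooling: the function name missing_outputs is hypothetical, and it assumes outputs follow the contig-N.out naming from the case-study JDL.

```python
import os
import re


def missing_outputs(output_dir, expected_params, pattern=r"contig-(\d+)\.out$"):
    """Return the sorted parameter values that have no output file yet."""
    regex = re.compile(pattern)
    found = set()
    for name in os.listdir(output_dir):
        match = regex.match(name)
        if match:
            found.add(int(match.group(1)))
    return sorted(set(expected_params) - found)


# Example call (directory name is illustrative only):
# print(missing_outputs("ParamBlast/outputs", range(997)))
```

An empty list means every expected contig-N.out file is present; any values returned identify sub-jobs to inspect with the status command.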
Summary
• 55,000 contig analysis in < 6 hours vs 1 month
• ssh username/password logins
• 1000 jobs all managed as one ‘job’.
• Input/output on the UI.
• Head node for the Grid.
Simple Example
• [ Create your proxy <- UI does this for you! ]
  – voms-proxy-init --voms ngs.ac.uk
• See what’s available
  lcg-infosites --vo ngs.ac.uk ce
• Submit the job
  glite-wms-job-submit -a -o jobIDs.txt my_test.jdl
  https://ngswms01.ngs.ac.uk:9000/LHGIagvDl701_msz0jpIg
• Check the status of your job
  glite-wms-job-status -i jobIDs.txt
• Get the output
  glite-wms-job-output -i jobIDs.txt --dir ./outputs
Note: UI/WMS can retrieve outputs automatically
• Simple JDL file
• Some default parameters set on the UI
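The submit, status and output steps can be scripted from the UI. The sketch below is one possible Python wrapper, offered as an illustration only: it uses the glite-wms-* command names from the Submit & Monitor slide with the flags shown in this deck (-a, -o, -i, --dir), and the helper names (submit_cmd, status_cmd, output_cmd, run) are hypothetical.

```python
import shutil
import subprocess


def submit_cmd(jdl_file, id_file):
    """Command line to submit a JDL and append the job ID to id_file."""
    return ["glite-wms-job-submit", "-a", "-o", id_file, jdl_file]


def status_cmd(id_file):
    """Command line to query the status of the jobs listed in id_file."""
    return ["glite-wms-job-status", "-i", id_file]


def output_cmd(id_file, out_dir):
    """Command line to fetch the output sandboxes into out_dir."""
    return ["glite-wms-job-output", "-i", id_file, "--dir", out_dir]


def run(cmd):
    """Run cmd if the tool is installed; otherwise just show it."""
    if shutil.which(cmd[0]) is None:
        print("not installed here, would run:", " ".join(cmd))
        return None
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the commands; nothing is submitted.
    for cmd in (submit_cmd("my_test.jdl", "jobIDs.txt"),
                status_cmd("jobIDs.txt"),
                output_cmd("jobIDs.txt", "./outputs")):
        print(" ".join(cmd))
```

Keeping the command construction separate from execution makes it easy to review exactly what would be run before anything reaches the WMS.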
Simple Example jdl
Type = "Job";
JobType = "Normal";
Executable = "settings.sh";
StdOutput = "output.out";
StdError = "output.err";
InputSandBox = {"settings.sh"};
OutputSandbox = {"output.err","output.out"};
RetryCount = 1;
Requirements = ( other.GlueCEUniqueID == "ngs.rl.ac.uk:2119/jobmanager-lsf-ngs");
Rank = other.GlueCEStateFreeCPUs;

Requirements = other.GlueCEStateStatus == "Production";
NGS Applications Docs
Input/Output Files
• InputSandBox lists all input files
  – Includes binaries/scripts to run
  – Wildcards ok
• OutputSandBox lists output files to retrieve.
  – Wildcards not allowed.
  – Tutorial shows ‘Epilogue’ script.
• InputSandboxBaseURI
  – Avoids 3rd party transfers via WMS server.
• OutputSandBoxBaseDestURI
  – Outputs to UI or elsewhere.
  – Output dir must exist.
  – Files arrive before job “Done”.
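Because the output directory must exist before outputs are delivered, a small pre-submission check can create it. This is a sketch under stated assumptions: it is run on the host named in the gsiftp URI, the path component of the URI maps directly to a local path, and the function names and the root parameter (used only to redirect the path for testing) are hypothetical.

```python
import os
from urllib.parse import urlsplit


def local_path_from_gsiftp(uri):
    """Extract the path component from a gsiftp:// URI."""
    parts = urlsplit(uri)
    if parts.scheme != "gsiftp":
        raise ValueError("expected a gsiftp:// URI, got: %s" % uri)
    return parts.path


def ensure_output_dir(uri, root="/"):
    """Create the directory named by the URI (under root) if it is missing."""
    path = os.path.join(root, local_path_from_gsiftp(uri).lstrip("/"))
    os.makedirs(path, exist_ok=True)
    return path


# Example call with the destination URI from the case-study JDL:
# ensure_output_dir("gsiftp://ngsui03.ngs.ac.uk:2811/home/ngs0055/ParamBlast/outputs")
```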
Type = "Job";
JobType = "mpich";
Executable = "/usr/ngs/DLPOLY2";
CpuNumber = 8;
StdOutput = "std.out";
StdError = "std.err";
Myproxyserver = "myproxy.ngs.ac.uk";
InputSandbox = {"CONFIG","CONTROL","FIELD","REVCON"};
InputSandboxBaseURI = "gsiftp://ngsui03.ngs.ac.uk:2811/home/ngsxxx/dlpoly";
OutputSandbox = {"OUTPUT","STATIS","CONFIG","CONTROL","FIELD","REVCON","REVIVE","stdout.out","stderr.out"};
OutputSandboxBaseDestURI = "gsiftp://ngsui03.ngs.ac.uk:2811/home/ngsxxx/dlpoly";
Requirements = ( member("NGS-UEE-DLPOLY2", other.GlueHostApplicationSoftwareRunTimeEnvironment));
ShallowRetryCount = -1;
Key Features Summary
• ssh Logins
• Input/Output on the UI
• Head node for the Grid.
• Single Jobs and Parametric Sweeps
• Normal and MPI jobs
• Example JDLs on www.ngs.ac.uk
• Questions: [email protected]
Further Information
• NGS Web site UI-WMS Page:
  – http://www.ngs.ac.uk/uiwms
  – Links to simple WMS Tutorials (2) & app specific (Gaussian, NAMD etc)
  – http://www.ngs.ac.uk/applications
• Tutorials
  – http://www.ngs.ac.uk/ngs-workload-management-system-and-user-interface-tutorials
  – http://wiki.ngs.ac.uk/index.php?title=UI-WMS_Tutorial
  – http://wiki.ngs.ac.uk/index.php?title=UI-WMS_Tutorial2
• Parametric Case Study:
  – http://www.ngs.ac.uk/mrna-analysis-using-the-ngs
  – http://www.ngs.ac.uk/sites/default/files/file/newsletters/Dec%202009%20NGS%20news.pdf
• Links to Guides and JDL attributes doc.
  – http://wiki.ngs.ac.uk/index.php?title=UI-WMS_Tutorial#Further_Resources
Questions ?
Lab
• http://wiki.ngs.ac.uk/index.php?title=UI-WMS-SeIUCCR-Tutorial
• Login:
  – SSH username = “SummerUserXX”
    • XX is on your packs
  – SSH Password = “2012-SSXX”
  – Valid until Friday afternoon
Inputs
>whitefly_assembly.accurate.15_lrc1
GGTATCAACGCAGAGTKCGCGGGGAGTAGAACAAAGAGCGTCTGAGAGGACTTCGCGATAGTGTTACGTTAATCGATAGCTCGTGTGTTAAAAAAATCTTTCAAGTCCTTCCTGTCTTTTGACTACTTAATTAGTTAATTATTATTTTGATCGAGACAAGCAAAGAAAAATGAATTCCATATTATCTTTGACCGTTTTCGTAACTTTCACAATTGTCTTGGCTCAAAGTGAACAATTAGACAAGAACTTCGGCGTGGGCGAAATCAAGACTCGCATCCAAGATAAAAAATTTGTTGAGAAGCAGTTGGGCTGTGTCCTAGGGAAAGCCGATTGCGACACCTTAGGAAATCAGTTGAAAGTTGCCATTCCAGAAGTCCTAGTTAAAGGCTGCAAGGATTGCACTCCGGAACAATCTGCAAATGCCAATCGATTAATAGCTTTTATAAAGATGAATTATCCAGCAGAATGGAGTCAAATTGCTGCAAAATATGGTGTGAAAGGTGATGCTGTAAAGAGGCCACGACGACATATCAGAAGGTGAAAGGAGTGATGCCAAAGATGTGATAAGTTTTTATTGTTAACTTTCGAGTCTTGACTTGATTTGATCATTGTGTACGTATGTATTTTAATTCTTCCAATTGTGAGCAGTATTTTAAGAGGGTATTCTAAATAACAGCCGTCCAAAAAGTTTTGAACTGAAATTTAAACTGTTAAGTGTTGATGACTTTTACCAATATTTATTTTTTTATCACCGAACTGTTAGTAATACTGCGACCAATACAAATTTATCTTTAGTCAGCTTGATTTTTTATCAAGTTGATTCTTTTTTTTGGACAATTTTTTTTTTATTATTATTCTTCCTCATTTAATGTATGTTTAAAATTGTTAATTGACCACCATTCGCATTTAATTGATTAAGTTTTTCTTATTTTTTTTTTATATGAACCAATGTTATAATTTTGCTCTCATAAACCTACTGTAAAATATTGAGTGTCCAGTTAAAGCTTTAAACTTTATATATTTTAACAAAAAATTAATGAGCTATTTTATAGAACCTAATAA
>whitefly_assembly.accurate.15_lrc2
TCGGGGGAGTAAATTCATGAAAGATAATCTAATCGTGCAGCCTTTTTATGAGACGCGCTGAAGTTTCGGATTAGGTTTTAGTCTTTACTAATTAATTGTATTTGTTTAGCTCATTAATTTTAATTATTCCACATTTAAAGATGTCTAAGGAAGAAGCAGCAATCCCTCCTCCAATGATTTGGGCCCAGAGATCTGGTGTTGTCTTTTTAACAATTAATGTAGAGGATTGTAAAGACCCCGAAATTAAAATTGAAGAAGATAAATTTTCTTTTAAAAGTGTTGGTGGTGTTGAAAAGAAGAAATATGAAGTCACAGTAAATCTATTTAAAGAAATAGACCCAGAAAAATCTGTAAAACATGTTCGCGAACGACACATTGAGTTAGTCCTAAAAAAGAAAGAAGACAAAGCTCCTTACTGGCCACAATTGACGAAAGAAAAGACTAAGCACCATTGGTTAAAAGTGGATTTCAATAAGTGGAAGGATGAAGATGATAGCGAAGATGAAGCCGAAGGACAAGACTCAGATTTTGGTGATCTAATGCGGTCGATGGGTCAAGGAGGCGGTATGGGTGGTATGGGCGGTATGGGTGGTATGGGAGGAATGGGAGGTATGGGTATGGGCGGTATGGGTGGTATGGGAATGGGTGGTTTAGGTGACAAGCCCTCTTTCGAAGGAATGGAAGAAGAAGATTCGGACGACGAAGATTTGCCCGACCTCGAAGAGTAATAGTGTTTTTATTACACCATATTCCATTTCCCTGTTATTGCATAAGGCCTCAGAAGAAGATGAAAAAATTGAAGCTATGAACGGACAGTCAAATCGATCACGCAGTTCACTG
• ... 55,000 more contigs
• Split up into ~1000 files of ~55 contigs each.
• Custom perl script or bioperl routines.
• contig-0.fsa ... contig-997.fsa
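The slides mention a custom perl script or bioperl routines for the split. As an illustration of the same step, here is a minimal Python sketch that writes groups of 55 FASTA records to numbered contig-N.fsa files. It is not the authors' script; the function names and the example input file name are hypothetical.

```python
import os


def read_fasta_records(path):
    """Yield each FASTA record (header line plus sequence lines) as one string."""
    record = []
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            if line.startswith(">") and record:
                yield "".join(record)
                record = []
            record.append(line.rstrip("\n") + "\n")
    if record:
        yield "".join(record)


def split_fasta(path, out_dir, per_file=55):
    """Write consecutive groups of per_file records to contig-<n>.fsa files."""
    os.makedirs(out_dir, exist_ok=True)
    written = []

    def flush(chunk):
        name = os.path.join(out_dir, "contig-%d.fsa" % len(written))
        with open(name, "w") as out:
            out.writelines(chunk)
        written.append(name)

    chunk = []
    for record in read_fasta_records(path):
        chunk.append(record)
        if len(chunk) == per_file:
            flush(chunk)
            chunk = []
    if chunk:
        flush(chunk)  # final, possibly shorter, group
    return written


# Example call (input file name is illustrative only):
# files = split_fasta("all_contigs.fsa", "inputs", per_file=55)
```

Numbering the files from 0 keeps the file names aligned with the parameter values substituted for _PARAM_ in the parametric JDL.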
[Diagram: NGS WMS. UI: ngsui03.ngs.ac.uk; WMS: ngswms01.ngs.ac.uk; Information Server (BDII): bdii.ngs.ac.uk; MyProxy: myproxy.ngs.ac.uk. Sites: RAL-NGS2, Scotgrid-Glasgow, Oxford-OERC, Manchester-NGS2, RAL-LCG, etc. (12,000+ CPUs). Connection labels: gridftp, ssh login (MEG).]
Job types
• Single Job
  – Normal: simple batch job
  – MPICH: parallel jobs
  – Interactive: output streamed back to the client
• Parametric
  – Set of similar jobs whose JDL attributes are parameterised
• Collections
  – Group of jobs without dependencies
• DAG
  – Group of dependent jobs