Getting started with WDL & Cromwell - Blue Waters...
Getting started with WDL & Cromwell
Bioinformatics workflows at any scale
Ruchi Munshi
Data Sciences Platform
Broad Institute
The backdrop: data generation set to explode
Figure: quarterly output (in terabases) of the Genomics Platform, annotated "Story begins here"
Meet Cromwell & WDL
• Execution engine that can
  – Run on any platform (HPC and cloud)
  – Seamlessly scale based on workflow needs
  – Provide maximal flexibility for all use cases
  – https://github.com/broadinstitute/cromwell
• Workflow language that humans can read/write
  – Methods developers and biomedical scientists at large
  – https://github.com/openwdl/wdl/
Use containers for portability & reproducibility
A container encapsulates all the software dependencies associated with running a program
Takes the guesswork out of running workflows on different platforms!
Figure labels: GATK 2.8, Java 7, R 2.5.0; GATK 3.8, Java 8, R 3.0.1; BWA; Picard
Modified from https://www.docker.com/what-container
Use a workflow execution engine that runs anywhere
Figure: Cromwell running on multiple backends: Local, HPC, TES (Funnel), Google, AWS, Alicloud, …
https://github.com/broadinstitute/cromwell
Run using HPC and Cloud resources!
Figure labels: data buckets; compute environment; Cloud; persistent Cromwell server; HPC; Local
Two main ways to run Cromwell
Command Line
• Simple, self-contained command
• Appropriate for independent analysts
• Call caching
Server
• API endpoints
• More scalable, appropriate for production environments
• Call caching
Command-line example (recent Cromwell releases take the inputs file via --inputs):
java -jar cromwell.jar \
  run hello.wdl \
  --inputs hello_inputs.json
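The slide does not show what hello.wdl contains. A minimal sketch of what such a file might look like in WDL 1.0 follows; the workflow name `hello`, the task name `say_hello`, and the input `name` are hypothetical choices made for illustration, not taken from the talk.

```wdl
version 1.0

# Hypothetical contents of hello.wdl: a workflow that calls one task.
workflow hello {
  input {
    String name
  }

  # Pass the workflow-level input through to the task.
  call say_hello { input: name = name }

  output {
    String greeting = say_hello.greeting
  }
}

task say_hello {
  input {
    String name
  }

  # The command section is the shell script the engine runs for this task.
  command <<<
    echo "Hello, ~{name}!"
  >>>

  output {
    # read_string() reads captured stdout and strips the trailing newline.
    String greeting = read_string(stdout())
  }
}
```

Under these assumptions, the matching hello_inputs.json would hold a single fully qualified key, `"hello.name"`, mapped to a string value.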
HPC Backend Configuration
Managing data
- Root directory
- Data handling strategies
- Support for object stores
https://cromwell.readthedocs.io/en/stable/tutorials/HPCIntro/
HPC Backend Configuration
Managing resources
- CPU
- Memory
- Custom attributes
https://cromwell.readthedocs.io/en/stable/tutorials/HPCIntro/
HPC Backend Configuration
Run command
- Built-in variables
- Full flexibility
https://cromwell.readthedocs.io/en/stable/tutorials/HPCIntro/
Plenty of workflow solutions to go around
Randall Munroe, XKCD: https://www.xkcd.com/927/
So of course we decided to create a new one.
Workflow Description Language (WDL)
WDL runtime parameters
resourcing
cost savings!
containers
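The three callouts on this slide map naturally onto a task's runtime section. The sketch below is illustrative only: the task, the image tag, and the resource values are hypothetical, and reading "cost savings" as Cromwell's `preemptible` attribute (honored on the Google backend) is an assumption, since the slide's own code did not survive extraction.

```wdl
version 1.0

# Hypothetical task; the image tag and resource values are placeholders.
task count_lines {
  input {
    File text_file
  }

  command <<<
    wc -l < ~{text_file}
  >>>

  output {
    Int line_count = read_int(stdout())
  }

  runtime {
    docker: "ubuntu:20.04"  # containers: image the command runs in
    cpu: 2                  # resourcing: requested cores
    memory: "4 GB"          # resourcing: requested memory
    preemptible: 3          # assumed cost-savings knob: number of attempts on
                            # preemptible VMs before using a regular one
  }
}
```

How these values are used depends on the backend: on a configured HPC backend, for example, attributes such as `cpu` and `memory` are made available to the submit command defined in the backend configuration.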
Basic WDL plumbing options
LINEAR CHAINING
  call stepA
  call stepB { input: in=stepA.out }
  call stepC { input: in=stepB.out }
MULTI-IN/OUT
  call stepC { input: in1=stepB.out1, in2=stepB.out2 }
SCATTER-GATHER
  Array[File] inputFiles
  scatter(oneFile in inputFiles) {
    call stepA { input: in=oneFile }
  }
  call stepB { input: files=stepA.out }
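The scatter-gather fragment can be filled out into a complete workflow. This is a sketch under stated assumptions rather than the pipeline from the talk: the task bodies (counting lines, then concatenating the counts) are invented for illustration, and the scattered input is named `infile` instead of `in` to avoid any clash with the `in` keyword used by `scatter`.

```wdl
version 1.0

workflow scatter_gather {
  input {
    Array[File] inputFiles
  }

  # Scatter: one stepA job per file; the jobs are independent of each other.
  scatter (oneFile in inputFiles) {
    call stepA { input: infile = oneFile }
  }

  # Gather: outside the scatter block, stepA.out has type Array[File].
  call stepB { input: files = stepA.out }

  output {
    File merged = stepB.out
  }
}

# Hypothetical per-file step: write the file's line count to count.txt.
task stepA {
  input {
    File infile
  }
  command <<<
    wc -l < ~{infile} > count.txt
  >>>
  output {
    File out = "count.txt"
  }
}

# Hypothetical gather step: concatenate all per-file results.
task stepB {
  input {
    Array[File] files
  }
  command <<<
    cat ~{sep=" " files} > merged.txt
  >>>
  output {
    File out = "merged.txt"
  }
}
```

The key point is the type change: inside the scatter block `stepA.out` is a single `File`, while after the block it is an `Array[File]` that can be passed to a downstream call without extra bookkeeping.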
But what about CWL?
Randall Munroe, XKCD: https://www.xkcd.com/1739/
Thanks to our Workflow Object Model (WOM), Cromwell now supports multiple versions of WDL as well as CWL 1.0!
Cromwell has been busy
Cromwell in production at Broad: processed 47.5 million jobs over the last two years
And this is just the tip of the iceberg!
Want to discuss further?
My Email: [email protected]
More Information:
Docs: http://cromwell.readthedocs.io/en/develop/
GitHub: https://www.github.com/broadinstitute/cromwell
WDL: http://www.openwdl.org
Example Pipelines: https://github.com/gatk-workflows