© 2017 Seven Bridges
Persistent Reproducible Reporting
Nan Xiao, Seven Bridges
2017/05/20 @ China R Conference Beijing
DOCUMENT-LEVEL REPRODUCIBILITY
sevenbridges.com © 2017 Seven Bridges
R MARKDOWN + KNITR TO THE RESCUE
[Figure: R Markdown + knitr]
REPRODUCIBILITY
… has always been a concern in both academia & industry.
CANCER GENOMICS CLOUD (CGC)
▪ www.cancergenomicscloud.org
▪ Hundreds of automated analysis workflows for petabyte-scale data from The Cancer Genome Atlas
PRODUCT & ENGINEERING INNOVATIONS IN CGC
Rabix
CHALLENGE: OS-LEVEL REPRODUCIBILITY
How can you ensure your reports are reproducible across time and environments when the data, analysis tools, and operating systems are all evolving?
DOCKER
▪ Docker allows applications and their dependencies to be packaged into discrete runtime environments, called containers. Applications packaged in this way can run on many different infrastructures.
liftr
OS-level reproducibility & persistency for reports.
[Diagram: … + knitr + … = liftr]
DOCKERIZE DOCUMENTS AS EASY AS 1-2-3
liftr extends the R Markdown metadata format, introducing additional options for containerizing and rendering reports.
YAML:
    ---
    liftr:
      sysdeps:
        - gfortran
      cran:
        - glmnet
        - xgboost
    ---
1. R Markdown documents with liftr options in metadata
2. lift("foo.Rmd") → Dockerfile
3. render_docker("foo.Rmd") → rendered HTML/PDF/Docx reports + .docker.yml (containerized report)
▪ By running lift() on the Rmd file, liftr parses the metadata fields that appear in the R Markdown document, then generates the Dockerfile.
▪ By running render_docker(), liftr builds the Docker image, runs the container, and renders the R Markdown document.
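To make the metadata-to-Dockerfile idea concrete, here is a minimal base-R sketch of how a parsed liftr-style metadata list could be turned into Dockerfile lines. It is not liftr's actual template: the helper name sketch_dockerfile(), the default base image rocker/r-base:latest, and the exact RUN commands are assumptions chosen for illustration.

```r
# Metadata as it might look after the YAML header is parsed into an R list
# (same fields as the slide: sysdeps and cran).
meta <- list(
  sysdeps = c("gfortran"),
  cran    = c("glmnet", "xgboost")
)

# Hypothetical helper (NOT part of liftr): metadata list -> Dockerfile lines.
# The base image default is an assumption for this sketch.
sketch_dockerfile <- function(meta, from = "rocker/r-base:latest") {
  lines <- paste("FROM", from)

  # System dependencies become one apt-get layer
  if (length(meta$sysdeps) > 0) {
    lines <- c(lines, paste(
      "RUN apt-get update && apt-get install -y --no-install-recommends",
      paste(meta$sysdeps, collapse = " ")
    ))
  }

  # CRAN packages become one install.packages() layer
  if (length(meta$cran) > 0) {
    pkgs <- paste0("'", meta$cran, "'", collapse = ", ")
    lines <- c(lines, sprintf(
      "RUN Rscript -e \"install.packages(c(%s), repos = 'https://cloud.r-project.org')\"",
      pkgs
    ))
  }

  lines
}

dockerfile <- sketch_dockerfile(meta)
writeLines(dockerfile)
```

Keeping system and package dependencies in separate layers means a change to the package list does not invalidate the cached system layer.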
DOCKERIZE DOCUMENTS AS EASY AS 1-2-3
library("liftr")
input = "demo.Rmd"
lift(input)           # Generate Dockerfile
render_docker(input)  # Render report with Docker
purge_image(input)    # Clean up Docker image
push_image(input)     # Push image to registry (devel)
DEMO: RNA-SEQ DATA ANALYSIS
Example workflow from Bioconductor.org
▪ RNA-Seq: a biotechnology for measuring the expression of genes. It can help identify potential key genes in cancer.
▪ Terabytes of RNA-Seq data are generated, and computational tools and workflows are developed to analyze such data.
▪ We need to ensure the analysis reports stay reproducible over time, while both datasets and analysis tools evolve.
▪ Code available from: bit.ly/liftrdemo
COMPLEXITY IN DEPENDENCY
STEP 1
Add liftr metadata to the R Markdown document:
▪ Base image
▪ System dependencies
▪ Package dependencies
▪ …
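As an illustration of Step 1, the snippet below writes a small R Markdown file whose header carries a liftr block. Only sysdeps and cran appear on the earlier slide; the from field for the base image, the image name, the title, and the file location are assumptions for this sketch and should be checked against the liftr documentation.

```r
# Sketch only: an R Markdown file with liftr options in its YAML header.
# `from` (base image) is an assumed field name; `sysdeps` and `cran`
# are the fields shown on the slide.
rmd <- c(
  '---',
  'title: "RNA-Seq demo report"',
  'output: html_document',
  'liftr:',
  '  from: "rocker/r-base:latest"   # base image (assumed field and image)',
  '  sysdeps:                        # system dependencies',
  '    - gfortran',
  '  cran:                           # CRAN package dependencies',
  '    - glmnet',
  '    - xgboost',
  '---',
  '',
  'Report body goes here.'
)

# Write to a temporary directory so the sketch leaves no files behind
input <- file.path(tempdir(), "demo.Rmd")
writeLines(rmd, input)
cat(readLines(input), sep = "\n")
```

The resulting path in input is the kind of file that would then be handed to lift() in Step 2.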
STEP 2
Generate Dockerfile with liftr::lift
STEP 3
▪ liftr::render_docker will build the Docker image, run the container, and render the document into PDF/HTML/DOCX.
▪ Re-compile: cached Docker image layers are reused to improve speed.
▪ Afterwards, remove the used image or push it to a Docker registry.
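The build, run, and render sequence in Step 3 can be pictured as two docker commands. The sketch below only assembles and prints them; it does not call Docker. The helper names, the demo-report tag, and the /report mount point are hypothetical, and liftr's own implementation may differ. The --no-cache flag is the standard docker build option that disables reuse of cached layers.

```r
# Hypothetical helper (not liftr's implementation): arguments for `docker build`.
# With cache = TRUE Docker reuses cached layers; cache = FALSE adds --no-cache.
docker_build_args <- function(tag, context = ".", cache = TRUE) {
  c("build", if (!cache) "--no-cache", "-t", tag, context)
}

# Hypothetical helper: arguments for `docker run` that mount the report
# directory into the container and render the document there.
# Assumes the image provides Rscript, rmarkdown, and pandoc.
docker_run_args <- function(tag, dir, file) {
  render_call <- sprintf('rmarkdown::render("/report/%s")', file)
  c("run", "--rm", "-v", paste0(dir, ":/report"), tag,
    "Rscript", "-e", shQuote(render_call, type = "sh"))
}

build_cached <- docker_build_args("demo-report")
build_fresh  <- docker_build_args("demo-report", cache = FALSE)
run_args     <- docker_run_args("demo-report", getwd(), "demo.Rmd")

# Print the commands instead of executing them
cat("docker", build_cached, "\n")
cat("docker", build_fresh, "\n")
cat("docker", run_args, "\n")
```

On a machine with Docker installed, the same vectors could be passed to system2("docker", ...) to actually run the steps.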
FUTURE WORKS
We aim to expand the R Markdown toolchain by exploring the next frontier, system-level reproducibility, and to democratize the creation and sharing of reproducible reports.
To achieve this, we need:
▪ Standard renderers + independent YAML configuration file
▪ Better IDE support (RStudio Addins)
▪ Better on-boarding experience: automatic dependency parsing
▪ Cloud-based rendering and containerization services for dynamic documents
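One possible shape for automatic dependency parsing is a scan of the document source for library() and require() calls. The base-R sketch below is an illustration of that idea only, not a planned liftr feature; find_packages() is a hypothetical helper, and it deliberately ignores pkg::fun references and system-level dependencies.

```r
# Hypothetical helper: collect package names attached via library()/require().
find_packages <- function(lines) {
  # Matches library(pkg), library("pkg"), require(pkg), require('pkg')
  pattern <- "(?:library|require)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?"
  calls <- unlist(regmatches(lines, gregexpr(pattern, lines, perl = TRUE)))
  # Keep only the captured package name from each matched call
  pkgs <- sub(pattern, "\\1", calls, perl = TRUE)
  sort(unique(pkgs))
}

# Example input: lines as they might appear in the code chunks of a report
src <- c(
  'library("glmnet")',
  "require(xgboost)",
  "x <- 1",
  "library(glmnet)"
)

pkgs <- find_packages(src)
print(pkgs)
```

The detected names could then be used to pre-fill the cran field of the liftr metadata for the author to review.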