rmar kd ow n

25
Rmarkdown Towards reproducibility Roland Krause | rworkshop | 2021-11-11

Upload: others

Post on 02-Mar-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

RmarkdownTowards reproducibility

Roland Krause | rworkshop | 2021-11-11

Learning objectives

You will learn to:

Use the markdown syntaxCreate Rmarkdown documentsDe�ne the output format you expect to renderUse the interactive RStudio interface to

Create your documentsInsert R codeInsert bibliographyBuild your �nal document

2/25

Typical �ow of data

Source data ➡

Experimental dataExternal data setsManually collected dataand meta data

Intermediate ➡

Derived dataComputatationManual curationTidy data

Analysis ➡

Exploratory analysisStatistical modelsHypothesis testing

Manuscript ➡

Can you reproduce yourwork?

All numbersSummariesImages

One work�ow

No editing of data at any stepAll code needed to reproduce from one ingestions to manuscript coded and repeatable �

3/25

Rmarkdown

Credit: Artwork by Allison Horst

4/25

Rmarkdown

Why using Rmarkdown?

Write detailed reportsEnsure reproducibilityKeep track of your analysesComment/describe each step of your analysisExport a single (Rmd) document to various formats (PDF, HTML...)Text �le that can be managed by a version control system (like git)

Rmarkdown: + +

5/25

MarkdownMarkdown is used to format text.

Markup language

Such as XML, HTMLA coding system used to structure textUses markup tags (e.g. <h1></h1> in HTML)

HTML

Lightweight markup language

Easy to read and write as it uses simple tags (e.g. #)

MD example

<!DOCTYPE html><html><body>

<h1>This is a heading</h1>

<p>This is some text in a paragraph.</p>

</body></html>

# This is a heading

This is some text in a paragraph

6/25

Common text formatting tags

Headers

6 levels are de�ned using #, ##, ### ...From BIG to small

Text style

bold (**This will be bold**)italic (*This will be italic*)

Links and images

http://example.com is auto-linked[description](http://example.com)![](path/to/image.jpg)![desc](path/to/image.jpg) for alternativedescription

Verbatim code

code (`coding stuff` )Triple backticks are delimiting code blocks:

rendered as:

```This is *verbatim* code# Even headers are not interpreted```

This is *verbatim* code# Even headers are not interpreted

7/25

Including code for Rmarkdown

Rmarkdown

Extends markdownPlace R code in chunksChunks will be evaluatedCan also handle bash; python; css; ...

Knitr

Extracts R chunksInterprets themFormats results as markdownReintegrates them into the maindocument (md)

Pandoc

pandoc converts markdown to thedesired document (Pdf, HTML, ...)

8/25

Rmd creation: step 1

9/25

Rmd creation: step 2

10/25

Rmd creation: step 3Generate your �rst HTML �le

Use the knit button in RStudio

11/25

YAML header

To de�ne document wide optionstitle, name, ...

Markdown

Markdown syntax to write yourdescriptions, remarksLiterate programming

Chunks

Code to be interpreted by R

Rmarkdown document: Structure

12/25

Insert a chunk

shortcut: CTRL + Alt + I:

Basic Markup

Delimited by triple backticks tags (``` )Options in curly brackets

engine evaluating the code but also python, bash, ...```{r} is the minimum to de�ne a starting chunkname of chunk (optional)show or hide the source code (echo = TRUE)evaluate it or not (eval = FALSE)�gure size (inches) ...

R code chunks

13/25

Navigation through chunk namesChunk names allow you to quickly navigate code, automatically name �gures, and troubleshoot errors.

Chunk names must be unique. By default a numbered chunk name will be assigned

14/25

Inline R code

Integrate small pieces of R code

Use backticks (` ) followed by the keyword r:

`r <your R code>`

Example

Type in 1 + 1 = `r 1 + 1`

renders as 1 + 1 = 2 .

More useful: `r nrow(swiss)`

renders as 47.

15/25

Popular output formats

HTML

Fast renderingNo need for extra installBy default embeds binaries (pictures,libraries etc.)

PDF

Single �leRequires , have a look at theTinyTeX package for minimal install

Word

Widely usedEasily editableCollaborate with people not usingRmarkdownPrepare scienti�c manuscriptssuitable for submission

LT XA E

16/25

Special chunk named setup

To load packages / functions silently before any other chunk executions

Stop the knitting process gracefully

Do not knit at a speci�c location in your Rmarkdown document fordebugging / fast rendering. Here optionally named early_stop

We like to avoid cache = TRUE

We will see how targets manages dependencies and re-rerun onlyoutdated steps

Working in RMarkdown

```{r setup, include = FALSE}

library(tidyverse)source("R/helpers.R")

```

```{r early_stop}

knitr::knit_exit()

```

```{r caching, cache = TRUE}

# big computations......

```

17/25

Basic tables created from scratch

Dataset # rows Description

swiss 47 Swiss Fertility and Socioeconomic Indicators

datasaurus 1846 Example

Note that numbers have to be manually aligned.

Tables from tibbles withknitr::kable()

Data from tibbles can be formatted for printing with paging etc.

Table: Swiss Fertility and Socioeconomic Indicators (1888)

Province Fertility Agriculture Catholic

Courtelary 80.2 17.0 9.96

Delemont 83.1 45.1 84.84

Franches-Mnt 92.5 39.7 93.40

Moutier 85.8 36.5 33.77

Neuveville 76.9 43.5 5.16

Basic tables

Dataset | # rows| Description---------|-----:|------swiss | 47 | Swiss Indicatorsdatasaurus| `r nrow(datasaurus_dozen)` | Example

sw_db <- as_tibble(swiss, rownames = "Province") %>% select(Province:Catholic, -starts_with("E")) %>% slice_head(n = 5)

swiss_cap <- "Swiss Fertility and Socioeconomic Indicators (188knitr::kable(sw_db, caption = swiss_cap)

18/25

Many table packages

The knitr::kable() functionality is su�cientfor explorative analysis; for production, you wantmore control over tables.

huxtable�extablestargazerpandertablesgt

Not all packages do everything well and the �eld israpidly moving.

Check the blog post of David Hugh-Jones, creator ofHuxtable for a survey and functionalities.

Huxtable

Province Fertility Agriculture Catholic

Courtelary 80.2 17   9.96

Delemont 83.1 45.1 84.8 

Franches-Mnt

92.5 39.7 93.4 

Moutier 85.8 36.5 33.8 

Neuveville 76.9 43.5 5.16

hux t

able

Beautiful tables in Rmarkdown

library(huxtable)hux(sw_db) %>% set_bold(row = 1, col = everywhere) %>% set_width(0.6) %>% set_background_color(evens, everywhere, "lightgreen")

19/25

Scienti�c publishing with Markdown

20/25

How to add equations

Enclose in $ for in-line equations

$a^2+b^2=c^2$ renders as .

Double ($$) for separate equations.

yields

No need for direct interaction with , pandoc istaking care.Numbering of equations is possible but requires bookdownpages and PDF output.

Tables

Complex arrangements can be used asalternatives for builtin tables.

Potential pitfalls

MathJax needs to load, unstable internet connections caninterveneThe Markdown/ mix is sensitive to spaces.

Equations using syntaxLT XA E

a2 + b2 = c2

$$G_{\mu v}=8 \pi G (T_{\mu v} + \rho _\Lambda \ g_

Gμv = 8πG(Tμv + ρΛ gμv)

LT XAE

LT XA E

$$\begin{array}{ccc}x_{11} & x_{12} & x_{13}\\x_{21} & x_{22} & x_{23}\end{array}$$

x11 x12 x13

x21 x22 x23

LT XA E

21/25

Bibliography

Supported formats

Use it with your EndNote or Zotero database:BibLaTeX, BibTeX, EndNote, EndNote XML, MEDLINE, ISI, MODS,RIS, Copac, JSON citeproc

Styles

uses citation style language (csl) �leshave a look at:

https://www.zotero.org/styleshttps://github.com/citation-style-language/styles

22/25

Con�guring your bibliography

Setup in the yaml headerInsert citations using the pandoc syntax: [@citation-key]

Using Zotero as a reference manager

Try the Better Bib(La)TeX plugin and adjust morepreferences

Article templatesThe rticles package creates article templates for journals.

Bibliography

---title: "Sample Document"output: html_documentbibliography: bibliography.bibcsl: nature.csl---

Barnacles are boring[@darwin].

23/25

Lessons learned

Experience

Two manuscripts published, computedrendered using Rmarkdown

Initially, we kept text and analysiscode togetherHard to organize - abstract alreadycontains conclusionsEventually, all code was one bigchunk in the Rmarkdown doc

Our standard setup

Code follows the data life cycle, e.g. usingscripts to

1. Import2. Transform3. Model

Controlled by another script (e.g. usingMake, or runner script)

Proposed organization

Explore your dataset in the context ofan Rmd documentMove production code to a script, e.g.in a directory called R

Better: use thetargetsthat extends theconcept of reproducible work�ows.

24/25

You learned to:

What is Rmarkdown (Rmd)Basic syntax of Markdownknit your Rmd to different output formatsStyling tablesBibliography integration

Further reading �

Rmarkdown, the de�nitive bookRmarkdown website

Thank you for yourattention!

o

Acknowledgments � �

Xie Yihuie (Creator)Alison HillHadley WickhamArtwork by Allison HorstJenny Bryan

Contributors

Eric Koncina (initial content)Veronica Codoni (major overhaul)Roland Krause ( (\LaTeX), bibliography)Aurélien Ginolhac (Chunk options)

Before we stop

25/25